Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Dino Fancellu
…'t get an int back. Thanks.

Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Robineast
GraphX has a Shortest Paths algorithm implementation which will tell you, for all vertices in the graph, the shortest distance to a specific ('landmark') vertex. The returned value is 'a graph where each vertex attribute is a map containing the shortest-path distance to each reachable landmark vertex'.
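
A short sketch of Robin's suggestion (assuming the shell's sc, a graph already loaded, and hypothetical vertex ids src and dst): run ShortestPaths with the destination as the landmark, then look the source up in the returned map. Mind edge direction on directed graphs.

    import org.apache.spark.graphx.lib.ShortestPaths

    val src = 1L                                   // hypothetical vertex ids
    val dst = 9L
    val spGraph = ShortestPaths.run(graph, Seq(dst))
    // each vertex attribute is a Map[VertexId, Int] of distances to the landmarks
    val hops: Option[Int] = spGraph.vertices
      .filter { case (id, _) => id == src }
      .map { case (_, spMap) => spMap.get(dst) }   // Some(hopCount) if reachable
      .first()
    val connected = hops.isDefined                 // and hops.get is the int Dino wanted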

Re: GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Dino Fancellu
e.g. http://gremlindocs.spmallette.documentup.com/#finding-edges-between-vertices

GraphX: How can I tell if 2 nodes are connected?

2015-10-05 Thread Dino Fancellu

Graphx hangs and crashes on EdgeRDD creation

2015-10-05 Thread William Saar
Hi, I am trying to run a GraphX job on 20 million edges with Spark 1.5.1, but the job seems to hang for 30 minutes on a single executor when creating the graph, and it eventually crashes with "IllegalArgumentException: Size exceeds Integer.MAX_VALUE". I suspect this is because of pa…
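
A hedged guess at a mitigation, since "Size exceeds Integer.MAX_VALUE" usually means a single partition's block grew past 2 GB: load the edge list with many more partitions than the default (path and count below are illustrative only, sc from the shell).

    import org.apache.spark.graphx.GraphLoader

    val graph = GraphLoader.edgeListFile(
      sc,
      "hdfs:///data/edges.txt",   // hypothetical path
      numEdgePartitions = 256)    // smaller blocks per partition
    graph.vertices.count()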

Re: GraphX create graph with multiple node attributes

2015-09-26 Thread Nick Peterson
…the 'vertices' object and the 'edges' object. Do you think this is what is causing the issue?

Re: GraphX create graph with multiple node attributes

2015-09-26 Thread JJ
…the 'edges' object. Do you think this is what is causing the issue?

Re: GraphX create graph with multiple node attributes

2015-09-26 Thread Robineast
…to be dropping off. Could you show your full code? g.degrees.count gives 2; as the scaladocs mention, 'The degree of each vertex in the graph. @note Vertices with no edges are not returned in the resulting RDD.' - Robin East, Spark GraphX in Action, Michael Malak and Robin East, Manning…

Re: GraphX create graph with multiple node attributes

2015-09-26 Thread JJ
Robineast wrote: '2) let GraphX supply a null instead: val graph = Graph(vertices, edges) // vertices found in 'edges' but not in 'vertices' will be set to null'. Thank you! This method works. As a follow-up (sorry, I'm new to this; don't know if I…

Re: GraphX create graph with multiple node attributes

2015-09-26 Thread Robineast
…s sense. 2) Let GraphX supply a null instead: val graph = Graph(vertices, edges) // vertices found in 'edges' but not in 'vertices' will be set to null. - Robin East, Spark GraphX in Action, M…

GraphX create graph with multiple node attributes

2015-09-25 Thread JJ
Hi, I am new to Spark and GraphX, so thanks in advance for your patience. I want to create a graph with multiple node attributes. Here is my code: […] But I receive an error: […] Can someone help? Thanks!
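
Putting the thread's resolution together, a minimal sketch (the attribute names and types are assumed, since the original code was elided from the archive): multiple vertex attributes packed in a tuple, with vertices that only appear in edges given a default.

    import org.apache.spark.graphx.{Edge, Graph}

    val vertices = sc.parallelize(Seq(
      (1L, ("alice", 28)),                  // (id, (name, age)) -- assumed attributes
      (2L, ("bob", 33))))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(2L, 3L, "follows")))             // vertex 3 appears only in edges
    val graph = Graph(vertices, edges)      // vertex 3's attribute is null
    // safer: supply an explicit default instead of null
    val graph2 = Graph(vertices, edges, defaultVertexAttr = ("unknown", -1))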

RE: No space left on device when running graphx job

2015-09-24 Thread Jack Yang
Hi all, I resolved the problems. Thanks, folks. Jack

RE: No space left on device when running graphx job

2015-09-24 Thread Jack Yang
Hi, here is the full stack trace: 15/09/25 09:50:14 WARN scheduler.TaskSetManager: Lost task 21088.0 in stage 6.0 (TID 62230, 192.168.70.129): java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes…

RE: No space left on device when running graphx job

2015-09-24 Thread Jack Yang
…Andy: Can you show the complete stack trace? Have you checked there are enough free inodes on the .129 machine? Cheers. On Sep 23, 2015, at 11:43 PM, Andy Huang wrote: Hi Jack, Are you writing out to disk? Or it sounds like Sp…

Re: No space left on device when running graphx job

2015-09-24 Thread Ted Yu
…and it's running out of disk space. Cheers, Andy. On Thu, Sep 24, 2015 at 4:29 PM, Jack Yang wrote: Hi folks, I have an issue with GraphX (Spark 1.4.0 + 4 machines + 4G memory + 4 CPU cores)…

Re: No space left on device when running graphx job

2015-09-23 Thread Andy Huang
Hi Jack, Are you writing out to disk? Or it sounds like Spark is spilling to disk (RAM filled up) and it's running out of disk space. Cheers, Andy. On Thu, Sep 24, 2015 at 4:29 PM, Jack Yang wrote: Hi folks, I have an issue with GraphX (Spark 1.4.0 + 4 machines…

No space left on device when running graphx job

2015-09-23 Thread Jack Yang
Hi folks, I have an issue with GraphX (Spark 1.4.0 + 4 machines + 4G memory + 4 CPU cores). Basically, I load data using the GraphLoader.edgeListFile method and then count the number of nodes using graph.vertices.count(). The problem is: Lost task 11972.0 in stage 6.0 (TID 54585, 192.168.70.129…
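
Jack's actual fix was never posted, so this is only a sketch of the usual remedy: shuffle and spill files land in spark.local.dir, so point it at a volume with enough free space (and free inodes, per Ted's hint) before loading. Paths are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.GraphLoader

    val conf = new SparkConf()
      .setAppName("graphx-count")
      .set("spark.local.dir", "/mnt/bigdisk/spark-tmp")   // hypothetical scratch dir
    val sc = new SparkContext(conf)
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")  // hypothetical path
    println(graph.vertices.count())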

Uneven distribution of tasks among workers in Spark/GraphX 1.5.0

2015-09-22 Thread dmytro

GraphX to work with Streaming

2015-09-18 Thread Rohit Kumar
Hi, I have a setup where edges arrive as a stream, and I want to create a graph in GraphX that updates its structure as each new edge comes in. Is there any way to do this using Spark Streaming and GraphX? Regards, Rohit
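
There is no official streaming GraphX, so this is only a rough sketch of one workaround: accumulate edges from each batch and rebuild the graph (the function name and the Int edge attribute are assumptions).

    import org.apache.spark.SparkContext
    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    def attach(sc: SparkContext, edgeStream: DStream[Edge[Int]]): Unit = {
      var edges: RDD[Edge[Int]] = sc.emptyRDD[Edge[Int]]
      edgeStream.foreachRDD { batch =>
        edges = edges.union(batch).cache()   // lineage grows; checkpoint periodically
        val graph = Graph.fromEdges(edges, defaultValue = 0)
        println(s"vertices so far: ${graph.vertices.count()}")
      }
    }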

GraphX, graph clustering, pattern matching

2015-09-15 Thread Alex Karargyris
I am new to Spark and I was wondering if anyone could point me in the right direction: are there any algorithms/tutorials available for Spark's GraphX for graph clustering and pattern matching? More specifically, I am interested in: a) querying a small graph against a larger grap…

Re: Graphx CompactBuffer help

2015-08-28 Thread Robineast
This should work: coon.filter(x => x.exists(el => Seq(1,15).contains(el))). CompactBuffer is a specialised form of a Scala Iterator. - Robin East, Spark GraphX in Action, Michael Malak and Robin…
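
A hedged reconstruction of the setup (the thread never shows it): CompactBuffers typically appear as the values of a groupByKey, e.g. grouping connected-component members, which is where Robin's filter applies. Assumes the shell's sc and a loaded graph.

    val cc = graph.connectedComponents().vertices          // (vertexId, componentId)
    val coon = cc.map { case (v, comp) => (comp, v) }
      .groupByKey()                                        // values print as CompactBuffer(...)
      .values
    // components containing vertex 1 or 15 (Robin's filter):
    val withEither = coon.filter(x => x.exists(el => Seq(1L, 15L).contains(el)))
    // components containing both 1 and 15:
    val withBoth = coon.filter(x => Seq(1L, 15L).forall(id => x.exists(_ == id)))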

Graphx CompactBuffer help

2015-08-27 Thread smagadi
…get the CompactBuffers which have values, say, 1 and 15. How can I get that? Appreciate the help.

Re: use GraphX with Spark Streaming

2015-08-25 Thread ponkin

NaN in GraphX PageRank answer

2015-08-18 Thread Khaled Ammar
Hi all, I was trying to use GraphX to compute PageRank and found that the PageRank value for several vertices is NaN. I am using Spark 1.3. Any idea how to fix that? Thanks, Khaled

Fwd: Graphx - how to add vertices to a HashSet of vertices ?

2015-08-14 Thread Ranjana Rajendran
Forwarded message from Ranjana Rajendran (Thu, Aug 13, 2015, to d...@spark.apache.org): Hi, sampledVertices is a HashSet of vertices: var sampledVertices: HashSet[VertexId] = HashSet…
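
A minimal sketch of the pattern in the forwarded question, using a mutable HashSet (a val) rather than reassigning an immutable one, and assuming the sampled ids are small enough to collect to the driver:

    import scala.collection.mutable.HashSet
    import org.apache.spark.graphx.VertexId

    val sampledVertices: HashSet[VertexId] = HashSet.empty[VertexId]
    val batch: Array[VertexId] = graph.vertices.keys.take(10)  // graph assumed loaded
    sampledVertices ++= batch    // add many at once
    sampledVertices += 42L       // add a single vertex id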

Re: graphx class not found error

2015-08-13 Thread Ted Yu
…works in local mode). I get the following error: […] Any help appreciated…

Re: graphx class not found error

2015-08-13 Thread dizzy5112
Oh, forgot to note: I'm using the Scala REPL for this.

graphx class not found error

2015-08-13 Thread dizzy5112
The code below works perfectly in local mode, but when I try to create a graph in cluster mode I get the following error: […] Any help appreciated.

pregel graphx job not finishing

2015-08-11 Thread dizzy5112
Hi, I'm currently using a Pregel message-passing function for my graph in Spark and GraphX. The problem I have is that the code runs perfectly on Spark 1.0 and finishes in a couple of minutes, but now that we have upgraded, I'm trying to run the same code on 1.3 and it doesn't finish (left it overnight…

AW: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-08-11 Thread rene.pfitzner
Hi, I'd like to follow up on this, as I am running into very similar issues (with a much bigger data set, though: 10^5 nodes, 10^7 edges). So let me repost the question: any ideas on how to estimate GraphX memory requirements? Cheers!

Re: SparkR -Graphx Connected components

2015-08-11 Thread Robineast
…respectively.

SparkR -Graphx Cliques

2015-08-09 Thread smagadi
How can I find cliques using Spark GraphX? A quick code snippet would be appreciated.

Re: SparkR -Graphx Connected components

2015-08-09 Thread smagadi
…been 6,0; (3,3) OK; (7,7) should have been 7,3; (5,3) OK; (2,0) OK.

Re: SparkR -Graphx Connected components

2015-08-07 Thread Robineast
…are 1 and 2 (the lowest vertices in each component). So vertices 1 and 3 will have vertex data = 1, and vertices 2, 4, 5 and 6 will have vertex data = 2. Robin - Robin East, Spark GraphX in Action, Michael Malak and Robin East…
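
Robin's example, as a runnable sketch (sc from the shell):

    import org.apache.spark.graphx.{Edge, Graph}

    val edges = sc.parallelize(Seq(
      Edge(1L, 3L, 0),                                     // component {1, 3}
      Edge(2L, 4L, 0), Edge(4L, 5L, 0), Edge(5L, 6L, 0)))  // component {2, 4, 5, 6}
    val g = Graph.fromEdges(edges, defaultValue = 0)
    g.connectedComponents().vertices.collect().sorted.foreach(println)
    // prints (1,1), (3,1) and (2,2), (4,2), (5,2), (6,2)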

SparkR -Graphx Connected components

2015-08-07 Thread smagadi
…: Graph[VertexId, Int] = graph.stronglyConnectedComponents(10). Help needed in completing the code: I do not know how to get the strongly connected nodes from here. Please help in completing this code.
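
A hedged sketch of one way to finish the snippet: group vertex ids by their SCC label (each label is the lowest vertex id in that strongly connected component). Assumes a graph with Int edge attributes, as in the snippet.

    import org.apache.spark.graphx.{Graph, VertexId}

    val scc: Graph[VertexId, Int] = graph.stronglyConnectedComponents(numIter = 10)
    val members = scc.vertices
      .map { case (vid, label) => (label, vid) }
      .groupByKey()                            // label -> all vertices in that SCC
    members.collect().foreach { case (label, vs) =>
      println(s"component $label: ${vs.mkString(", ")}")
    }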

Re: SparkR -Graphx

2015-08-06 Thread Shivaram Venkataraman
+Xiangrui. I am not sure exposing the entire GraphX API would make sense, as it contains a lot of low-level functions. However, we could expose some high-level functions like PageRank etc. Xiangrui, who has been working on similar techniques to expose MLlib functions like GLMs, might have more to add…

SparkR -Graphx

2015-08-06 Thread smagadi
Wanted to use GraphX from SparkR; is there a way to do it? I think as of now it is not possible. I was thinking one could write a wrapper in R that calls the Scala GraphX libraries. Any thoughts on this, please?

Implementing algorithms in GraphX pregel

2015-08-05 Thread Krish
Hi, I was recently looking into Spark GraphX as one of the frameworks that could help me solve some graph-related problems. The 'think-like-a-vertex' paradigm is something new to me, and I cannot wrap my head around how to implement simple algorithms like depth-first or breadth-first search or ev…
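
The canonical way to express BFS in this paradigm is GraphX's Pregel operator; a hedged sketch (graph and root are assumed): each vertex holds its hop distance from the root, and messages propagate distance + 1 along edges until no vertex improves.

    import org.apache.spark.graphx._

    val root: VertexId = 1L
    val init = graph.mapVertices((id, _) => if (id == root) 0.0 else Double.PositiveInfinity)
    val bfs = init.pregel(Double.PositiveInfinity)(
      (id, dist, msg) => math.min(dist, msg),            // keep the best distance seen
      triplet =>                                         // relax edges one hop at a time
        if (triplet.srcAttr + 1.0 < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + 1.0))
        else Iterator.empty,
      (a, b) => math.min(a, b))                          // combine incoming messages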

looking for helps in using graphx aggregateMessages

2015-07-31 Thread man june
Dear list, Hi! I am new to Spark and GraphX, and I have a little experience using Scala. I want to use GraphX to calculate some basic statistics on linked open data, which is basically a graph. Suppose the graph only contains one type of edge, directed from individuals to concepts, and the edge…
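
For statistics like "how many individuals point at each concept", aggregateMessages is the natural tool; a hedged sketch over an assumed individuals-to-concepts graph:

    // count incoming individual->concept edges per concept vertex
    val conceptCounts = graph.aggregateMessages[Long](
      ctx => ctx.sendToDst(1L),   // every edge contributes 1 to its destination
      _ + _)                      // sum the contributions per vertex
    conceptCounts.take(10).foreach(println)   // (conceptId, count)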

Re: assertion failed error with GraphX

2015-07-22 Thread Roman Sokolov
…? So now I am trying to understand how it works, and maybe rewrite it. I would like to process big graphs with not so much RAM on each machine. On 20.07.2015 04:27, "Jack Yang" wrote: Hi there, I got an error when running one simple GraphX program. My…

assertion failed error with GraphX

2015-07-19 Thread Jack Yang
Hi there, I got an error when running one simple GraphX program. My setup is: Spark 1.4.0, Hadoop YARN 2.5, Scala 2.10, with four virtual machines. I constructed one small graph (6 nodes, 4 edges) and ran: println("triangleCount: %s".format(hdfs_graph.triangleCount().vert…
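
A hedged guess at the usual cause of that assertion: per the GraphX programming guide, triangleCount requires edges in canonical orientation (srcId < dstId) and a graph partitioned with Graph.partitionBy. A sketch with a hypothetical input path:

    import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

    val g = GraphLoader
      .edgeListFile(sc, "hdfs:///data/edges.txt", canonicalOrientation = true)
      .partitionBy(PartitionStrategy.RandomVertexCut)
    val triCounts = g.triangleCount().vertices   // (vertexId, #triangles containing it)
    triCounts.take(6).foreach(println)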

[SPARK][GRAPHX] 'Executor Deserialize Time' is too big

2015-07-16 Thread Hlib Mykhailenko
Hello, I use Apache GraphX (version 1.1.0), and sometimes the stage which corresponds to this line of code: val graph = GraphLoader.edgeListFile(...) takes too much time. Looking at the EVENT_LOG_1 file, I found out that for some tasks of this stage the 'Executor Deserialize Time' was too…

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-07-10 Thread Roman Sokolov
newgraph". Next will test it on > bigger (several Gb) networks. > > I am using Spark 1.3 and 1.4 but haven't seen this function in > https://spark.apache.org/docs/latest/graphx-programming-guide.html > > Thanks a lot guys! > Am 26.06.2015 13:50 schrieb "Ted Yu&quo

Re: GraphX Synth Benchmark

2015-07-09 Thread Khaled Ammar
…server is used more than others. Please help ASAP. Thank you.

GraphX Synth Benchmark

2015-07-09 Thread AshutoshRaghuvanshi

Is it now possible to incrementally update a graph in GraphX

2015-07-07 Thread Hellen
I found this post back in March 2014: http://apache-spark-user-list.1001560.n3.nabble.com/Incrementally-add-remove-vertices-in-GraphX-td2227.html. I was wondering if there has been any progress on GraphX Streaming/incremental graph updates in GraphX, or is there a place where I can track the progress on…

Question about master memory requirement and GraphX pagerank performance !

2015-07-07 Thread Khaled Ammar
Hi all, I am fairly new to Spark and wonder if you can help me. I am exploring GraphX/Spark by running the PageRank example on a medium-size graph (12 GB) using this command: […] My cluster is 1+16 machines; the master has 15 GB memory and each worker has 30 GB. The master has 2 cores and each…

Re: Spark got stuck with BlockManager after computing connected components using GraphX

2015-07-05 Thread Akhil Das
…finally tried typing the variable name, which actually worked.

Re: Spark got stuck with BlockManager after computing connected components using GraphX

2015-07-05 Thread Hellen
…actually worked.

Spark got stuck with BlockManager after computing connected components using GraphX

2015-07-04 Thread Hellen
I'm computing connected components using Spark GraphX on AWS EC2. I believe the computation was successful, as I saw the type information of the final result. However, it looks like Spark was doing some cleanup: the BlockManager removed a bunch of blocks and got stuck at 15/07/04 21:53:06…

Dataframes to EdgeRDD (GraphX) using Scala api to Spark

2015-06-30 Thread zblanton

Re: GraphX - ConnectedComponents (Pregel) - longer and longer interval between jobs

2015-06-29 Thread Thomas Gerber
…Note that this problem is probably NOT caused directly by GraphX; GraphX reveals it because, as you go further down the iterations, you get further and further away from a shuffle you can rely on. On Thu, Jun 25, 2015 at 7:43 PM, Thomas Gerber wrote: Hello…

Re: GraphX - ConnectedComponents (Pregel) - longer and longer interval between jobs

2015-06-26 Thread Thomas Gerber
Note that this problem is probably NOT caused directly by GraphX, but GraphX reveals it because, as you go further down the iterations, you get further and further away from a shuffle you can rely on. On Thu, Jun 25, 2015 at 7:43 PM, Thomas Gerber wrote: Hello, we r…

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Roman Sokolov
…1.4 but haven't seen this function in https://spark.apache.org/docs/latest/graphx-programming-guide.html. Thanks a lot, guys! On 26.06.2015 13:50, "Ted Yu" wrote: See SPARK-4917, which went into Spark 1.3.0. On Fri, Jun 26, 2015 at 2:27 AM, Robin East wrote:…

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Ted Yu
…((dblCount & 1) == 0). Cheers. On Thu, Jun 25, 2015 at 6:20 AM, Roman Sokolov wrote: Hello! I am trying to compute the number of triangles with GraphX, but get a memory or heap-size error even though the dataset is very sm…

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Robin East
…(vid, _, optCounter: Option[Int]) => val dblCount = optCounter.getOrElse(0); // double count should be even (divisible by two) assert((dblCount & 1) == 0). Cheers. On Thu, Jun 25, 2015 at 6:20 AM, Roman Sokolov wrote:…

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-26 Thread Roman Sokolov
…at 6:20 AM, Roman Sokolov wrote: Hello! I am trying to compute the number of triangles with GraphX, but get a memory or heap-size error even though the dataset is very small (1 GB). I run the code in spark-shell, having a 16 GB RAM machine (also tried with 2 worker…

Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-25 Thread Ted Yu
…& 1) == 0). Cheers. On Thu, Jun 25, 2015 at 6:20 AM, Roman Sokolov wrote: Hello! I am trying to compute the number of triangles with GraphX, but get a memory or heap-size error even though the dataset is very small (1 GB). I run the code in spark-shell, having a 16 GB RAM machine…

Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-25 Thread Roman Sokolov
Hello! I am trying to compute the number of triangles with GraphX, but get a memory or heap-size error even though the dataset is very small (1 GB). I run the code in spark-shell, having a 16 GB RAM machine (also tried with 2 workers on separate machines, 8 GB RAM each). So I have 15x more memory than the…

RE: Machine Learning on GraphX

2015-06-18 Thread Evo Eftimov
What is GraphX:
- It can be viewed as a kind of distributed, parallel graph database
- It can be viewed as a graph data structure (Data Structures 101 from your CS course)
- It features some off-the-shelf algos for graph processing and navigation (Algos and Data…

Re: Machine Learning on GraphX

2015-06-18 Thread andy petrella
…Thanks for the quick answer. I've already followed this tutorial, but it doesn't use GraphX at all. My goal would be to work directly on the graph, and not to extract edges and vertices from the graph as standard RDDs and then work on those with the standard MLli…

Re: Machine Learning on GraphX

2015-06-18 Thread Timothée Rebours
Thanks for the quick answer. I've already followed this tutorial, but it doesn't use GraphX at all. My goal would be to work directly on the graph, and not to extract edges and vertices from the graph as standard RDDs and then work on those with standard MLlib ALS, which…

Re: Machine Learning on GraphX

2015-06-18 Thread Akhil Das
This might give you a good start: http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html (it's a bit old, though). Thanks, Best Regards. On Thu, Jun 18, 2015 at 2:33 PM, texol wrote: Hi, I'm new to GraphX and I'd like to use machine learning a…

Machine Learning on GraphX

2015-06-18 Thread texol
Hi, I'm new to GraphX and I'd like to use machine learning algorithms on top of it. I wanted to write a simple program implementing MLlib's ALS on a bipartite graph (a simple movie recommendation), but didn't succeed. I found an implementation for Spark 1.1.x (https://github…

Re: in GraphX,program with Pregel runs slower and slower after several iterations

2015-06-03 Thread Cheuk Lam
…mapEdges { edge => edge.attr }; clonedGraph.checkpoint; graph = clonedGraph
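
The pattern being discussed, as a hedged sketch: clone the graph every N iterations, checkpoint the clone, and materialize it so the lineage is actually truncated (initialGraph, step, numIterations and the interval of 20 are all assumptions).

    sc.setCheckpointDir("hdfs:///tmp/checkpoints")   // hypothetical directory

    var graph = initialGraph                         // assumed starting graph
    for (i <- 1 to numIterations) {
      graph = step(graph)                            // one Pregel-style iteration (assumed)
      if (i % 20 == 0) {
        val cloned = graph.mapVertices((_, v) => v).mapEdges(e => e.attr)
        cloned.checkpoint()
        cloned.vertices.count()                      // force the checkpoint to happen
        graph = cloned
      }
    }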

Re: in GraphX,program with Pregel runs slower and slower after several iterations

2015-06-02 Thread Cheuk Lam

Re: Incrementally add/remove vertices in GraphX

2015-05-20 Thread vzaychik
Any updates on GraphX Streaming? There was mention of this about a year ago, but nothing much since. Thanks!

Re: Effecient way to fetch all records on a particular node/partition in GraphX

2015-05-17 Thread Ankur Dave
If you know the partition IDs, you can launch a job that runs tasks on only those partitions by calling sc.runJob. For example, we do this in IndexedRDD…
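
A minimal sketch of what Ankur describes, using the Spark 1.x overload (which still carries an allowLocal flag); the helper name is hypothetical.

    import org.apache.spark.rdd.RDD
    import scala.reflect.ClassTag

    def recordsOnPartitions[T: ClassTag](rdd: RDD[T], parts: Seq[Int]): Array[Array[T]] =
      rdd.sparkContext.runJob(rdd, (it: Iterator[T]) => it.toArray, parts, allowLocal = false)

    // e.g. recordsOnPartitions(myRdd, Seq(0, 3)) runs tasks on partitions 0 and 3 only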

Re: Data partitioning and node tracking in Spark-GraphX

2015-05-17 Thread MUHAMMAD AAMIR
…Thanks a lot for the reply. Indeed it is useful, but to be more precise, I have 3D data and want to index it using an octree. Thus I aim to bu…

Effecient way to fetch all records on a particular node/partition in GraphX

2015-05-17 Thread mas
…way to do that?

Re: How does GraphX stores the routing table?

2015-04-22 Thread MUHAMMAD AAMIR
Hi Ankur, Thanks for the answer. However, I still have the following queries. On Wed, Apr 22, 2015 at 8:39 AM, Ankur Dave wrote: On Tue, Apr 21, 2015 at 10:39 AM, mas wrote: How does GraphX store the routing table? Is it stored on the master node, or are chunks of t…

Re: How does GraphX stores the routing table?

2015-04-21 Thread Ankur Dave
On Tue, Apr 21, 2015 at 10:39 AM, mas wrote: How does GraphX store the routing table? Is it stored on the master node, or are chunks of the routing table sent to each partition that maintains the record of vertices and edges at that node? The latter: the routing…

How does GraphX stores the routing table?

2015-04-21 Thread mas
Hi, how does GraphX store the routing table? Is it stored on the master node, or are chunks of the routing table sent to each partition that maintains the record of vertices and edges at that node? If only customized edge partitioning is performed, will the corresponding vertices be sent to the same…

GraphX: unbalanced computation and slow runtime on livejournal network

2015-04-19 Thread Steven Harenberg
Hi all, I have been testing GraphX on the soc-LiveJournal1 network from the SNAP repository. Currently I am running on c3.8xlarge EC2 instances on Amazon. These instances have 32 cores and 60 GB RAM per node, and so far I have run SSSP, PageRank, and WCC on 1-, 4-, and 8-node clusters. The issues…

Re: GraphX: unbalanced computation and slow runtime on livejournal network

2015-04-19 Thread hnahak
Hi Steve, I did Spark 1.3.0 PageRank benchmarking on soc-LiveJournal1 on a 4-node cluster with 16, 16, 8, 8 GB RAM respectively. The cluster has 4 workers including the master, with 4, 4, 2, 2 CPUs. I set executor memory to 3g and the driver to 5g. No. of iterations --> GraphX (mins): 1 -->…

Data frames in GraphX

2015-04-19 Thread hnahak
To Spark-admin: I like the data frames in the 1.3 version; is there any plan to integrate them with GraphX in 1.4 or later? Currently I have huge information in vertex properties; if I could use data frames to hold the properties instead of VertexRDD, that would help me a lot.

RE: Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread Evo Eftimov
…Thanks a lot for the reply. Indeed it is useful, but to be more precise, I have 3D data and want to index it using an octree. Thus I aim to build a two-level indexing…

Re: Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread MUHAMMAD AAMIR
…Evo Eftimov. MUHAMMAD AAMIR wrote: I want to use Spark…

RE: Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread Evo Eftimov
…I want to use Spark functions/APIs to do this task. My basic purpose is to index the data, divide it, and send it to multiple nodes. Then, at access time, I want to reach the right node and data partition. I don't have any clue how to do this. Thanks,…

Re: Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread MUHAMMAD AAMIR
…Subject: Data partitioning and node tracking in Spark-GraphX. I have a big data file; I aim to create an index on the data. I want to partition the data based on a user-defined function in Spark-GraphX (Scala). Further, I want to keep track of the node on which a particula…

RE: Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread Evo Eftimov
…I have a big data file; I aim to create an index on the data. I want to partition the data based on a user-defined function in Spark-GraphX (Scala). Further, I want to keep track of the node on which a particular data partition is sent and being processed, so I can fetch the requir…

Data partitioning and node tracking in Spark-GraphX

2015-04-16 Thread mas
I have a big data file; I aim to create an index on the data. I want to partition the data based on a user-defined function in Spark-GraphX (Scala). Further, I want to keep track of the node on which a particular data partition is sent and being processed, so I can fetch the required data by accessing…

Re: [GraphX] aggregateMessages with active set

2015-04-13 Thread James
…into memory? [1] https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L237-266 Alcaid. 2015-04-10 2:47 GMT+08:00, Ankur Dave: Actually, GraphX doesn't need to scan all the edges, because it maintains a clustered index on th…

Manning looking for a co-author for the GraphX in Action book

2015-04-13 Thread Reynold Xin
Hi all, Manning (the publisher) is looking for a co-author for the GraphX in Action book. The book currently has one author (Michael Malak), but they are looking for a co-author to work closely with Michael, improve the writing, and make it more consumable. The early access page for the book…

Re: [GraphX] aggregateMessages with active set

2015-04-09 Thread Ankur Dave
Actually, GraphX doesn't need to scan all the edges, because it maintains a clustered index on the source vertex id (that is, it sorts the edges by source vertex id and stores the offsets in a hash table). If the activeDirection is appropriately set, it can then jump only to the clusters…

Re: [GraphX] aggregateMessages with active set

2015-04-09 Thread James
…].aggregateMessagesWithActiveSet(...). Ankur. On Tue, Apr 7, 2015 at 2:56 AM, James wrote: Hello, the old GraphX API 'mapReduceTriplets' has an optional parameter 'activeSetOpt: Option[(VertexRDD[_], EdgeDirection)]' that limits the input o…

Re: [GraphX] aggregateMessages with active set

2015-04-07 Thread Ankur Dave
…access it publicly via GraphImpl, though the API isn't guaranteed to be stable: graph.asInstanceOf[GraphImpl[VD, ED]].aggregateMessagesWithActiveSet(...). Ankur. On Tue, Apr 7, 2015 at 2:56 AM, James wrote: Hello, the old GraphX API 'mapReduceTriplets' h…

[GraphX] aggregateMessages with active set

2015-04-07 Thread James
Hello, the old GraphX API 'mapReduceTriplets' has an optional parameter 'activeSetOpt: Option[(VertexRDD[_], EdgeDirection)]' that limits the input of sendMessage. However, in the new API 'aggregateMessages' I could not find this option. Why is it no longer offered? Alcaid

graphx running time

2015-04-06 Thread daze5112
Hi, I'm currently using GraphX for some analysis and have come upon a bit of a hurdle. If I use my test dataset of 20 nodes and about 30 links, it runs really quickly. I have two other data sets I use: one of 10 million links and one of 20 million. When I create my graphs it seems to work okay and I can get…

Re: Quick GraphX gutcheck

2015-04-01 Thread Takeshi Yamamuro
…graph.vertices with a transformed RDD where the keys are vertexIds from the original graph, correct? --John
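
A small sketch of the join being confirmed here: graph.vertices is an RDD keyed by VertexId, so a transformed RDD keyed by the original ids joins straight back on, or can be folded into the graph (the String payload is a stand-in).

    import org.apache.spark.graphx.VertexId
    import org.apache.spark.rdd.RDD

    val extra: RDD[(VertexId, String)] = graph.vertices.mapValues(_ => "label") // stand-in data
    val joined = graph.vertices.join(extra)            // (id, (originalAttr, label))
    val g2 = graph.outerJoinVertices(extra) { (id, attr, opt) =>
      (attr, opt.getOrElse("none"))                    // keep old attr, attach new field
    }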

Quick GraphX gutcheck

2015-04-01 Thread hokiegeek2

Re: Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?

2015-03-30 Thread Franc Carter
One issue is that 'big' becomes 'not so big' reasonably quickly. A couple of terabytes is not that challenging (depending on the algorithm) these days, whereas 5 years ago it was a big challenge. We have a bit over a petabyte (not using Spark), and using a distributed system is the only viable way…

Re: Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?

2015-03-30 Thread Steve Loughran
On 30 Mar 2015, at 13:27, jay vyas wrote: Just as Spark disrupted the Hadoop ecosystem by changing the assumption that "you can't rely on memory in distributed analytics"... now maybe we are challenging the assumption that "big data analytics…

Re: Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?

2015-03-30 Thread jay vyas
…its ability to run a specialized set of common algorithms in "fast-local-mode", just like a compiler optimizer can decide to inline some methods or rewrite a recursive function as a for loop if it's in tail position; I would say that the future of GraphX can be…

Re: Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?

2015-03-30 Thread Steve Loughran
" just like a compiler optimizer can decide to inline some methods, or rewrite a recursive function as a for loop if it's in tail position, I would say that the future of GraphX can be that if a certain algorithm is a well known one (e.g. shortest paths) and can be run locally faster

Pregel API Abstraction for GraphX

2015-03-29 Thread Kenny Bastani
Hi all, I have been working hard to make it easier for developers to make community contributions to the Spark GraphX algorithm library. At the core of this, I found that the Pregel API is a difficult concept to understand, and I think I can help make it better. Can you please review https…

Re: Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?

2015-03-29 Thread Eran Medan
…to inline some methods or rewrite a recursive function as a for loop if it's in tail position; I would say that the future of GraphX can be that if a certain algorithm is a well-known one (e.g. shortest paths) and can be run locally faster than on a distributed set (taking into account bringi…

Custom edge partitioning in graphX

2015-03-28 Thread arpp
Hi all, I am working with Spark 1.0.0, mainly for GraphX, and wish to apply some custom partitioning strategies to the edge list of the graph. I have generated an edge-list file which has the partition number after the source and destination ids on each line. Initially I am loading…
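
A hedged sketch (not the poster's actual code, which was elided): GraphX keeps edges in the partitions of the input RDD unless Graph.partitionBy is called, so a precomputed assignment can be applied by partitioning the edge RDD before building the graph. Assumes lines of the form "srcId dstId partId" and the shell's sc.

    import org.apache.spark.HashPartitioner
    import org.apache.spark.graphx.{Edge, Graph}

    val numParts = 8                                    // assumed
    val edges = sc.textFile("edges_with_parts.txt")     // hypothetical input file
      .map { line =>
        val Array(src, dst, part) = line.split("\\s+")
        (part.toInt, Edge(src.toLong, dst.toLong, 1))
      }
      .partitionBy(new HashPartitioner(numParts))       // key k lands in partition k for 0 <= k < numParts
      .values
    val graph = Graph.fromEdges(edges, defaultValue = 0)
    // note: calling graph.partitionBy(...) afterwards would override this layout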

Re: Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?

2015-03-27 Thread Sean Owen
(I bet the Spark implementation could be improved. I bet GraphX could be optimized.) Not sure about this one, but "in-core" benchmarks often start by assuming that the data is local. In the real world, data is unlikely to be. The benchmark has to include the cost of bringing all the d…
