Re: Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?

2015-03-27 Thread Jörn Franke
machine (for some problems a single thread might be the fastest solution anyway)? Based on my experience, a single machine can already be quite useful for graph algorithms. There are also different graph systems, each for a different purpose. Spark GraphX is more general (can be used in combination with the

Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?

2015-03-27 Thread Eran Medan
Remember that article that went viral on HN, where a guy showed that GraphX / Giraph / GraphLab / Spark perform worse on a 128-node cluster than a single thread on one machine? (If not, here is the article: http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html) Well, as you may

Populating a HashMap from a GraphX connectedComponents graph

2015-03-26 Thread Bob DuCharme
The Scala code below was based on https://www.sics.se/~amir/files/download/dic/answers6.pdf. I extended it by adding a HashMap called componentLists, which I populated with each component's starting node as the key and a ListBuffer of the component's members as the value. As the output below the code sho
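For reference, a plain-Scala sketch (no Spark, local data assumed) of the grouping step. GraphX's connectedComponents() labels every vertex with the lowest vertex ID in its component, so building the component lists amounts to grouping (vertexId, componentId) pairs by the second element. ComponentGrouping and groupByComponent are illustrative names only. On an RDD, the same grouping could be written as cc.vertices.map { case (id, comp) => (comp, id) }.groupByKey().collectAsMap() instead of mutating a driver-side HashMap inside a closure; whether that is what the truncated output above shows cannot be told from this excerpt.

```scala
// Plain Scala sketch (no Spark): group (vertexId, componentId) pairs by component.
object ComponentGrouping {
  type VertexId = Long

  /** Maps each component ID to the sorted IDs of its member vertices. */
  def groupByComponent(labels: Seq[(VertexId, VertexId)]): Map[VertexId, List[VertexId]] =
    labels
      .groupBy { case (_, component) => component }
      .map { case (component, pairs) => component -> pairs.map(_._1).toList.sorted }
}
```

The result is immutable, so it can be built once and passed around without the shared-state questions a mutable HashMap raises.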

Re: Graphx gets slower as the iteration number increases

2015-03-24 Thread Ankur Dave
2015 at 7:12 PM, orangepri...@foxmail.com <orangepri...@foxmail.com> wrote: > I'm working with GraphX to calculate the PageRanks of an extremely large social network with billions of vertices. As the iteration number increases, the speed of each iteration becomes slower a

Graphx gets slower as the iteration number increases

2015-03-24 Thread orangepri...@foxmail.com
I'm working with GraphX to calculate the PageRanks of an extremely large social network with billions of vertices. As the iteration number increases, each iteration becomes slower, to the point of being unacceptable. Is there any reason for this? How can I accelerate the iteration process?

Spark GraphX In Action on documentation page?

2015-03-24 Thread Michael Malak
Can my new book, Spark GraphX In Action, which is currently in MEAP http://manning.com/malak/, be added to https://spark.apache.org/documentation.html and, if appropriate, to https://spark.apache.org/graphx/ ? Michael Malak

GraphX Pregel optimization

2015-03-23 Thread Clare Huang
Hi all, I have been testing Spark GraphX for large sparse matrix multiplication in 3D image reconstruction. I used the Pregel API to forward- and back-project the images based on a graph representation of a large sparse matrix. I was wondering how one can optimize the Pregel operation

Re: GraphX: Get edges for a vertex

2015-03-18 Thread Jeffrey Jedele
Hi Mas, I have never actually worked with GraphX, but here is one idea: as far as I know, you can directly access the vertex and edge RDDs of your Graph object. Why not simply run a .filter() on the edge RDD to get all edges that originate from or end at your vertex? Regards, Jeff 2015-03-18 10:52 GMT+01:00
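The filter suggested above, sketched on a local Seq so it runs without Spark. LocalEdge is a hypothetical stand-in that mirrors the srcId/dstId/attr fields of GraphX's Edge; on a real graph the same predicate would go into graph.edges.filter(...).

```scala
// Local stand-in for GraphX's Edge(srcId, dstId, attr); this is not the Spark class.
final case class LocalEdge[ED](srcId: Long, dstId: Long, attr: ED)

object IncidentEdges {
  /** Edges that start or end at vertexId: the predicate suggested for graph.edges.filter(...). */
  def incident[ED](edges: Seq[LocalEdge[ED]], vertexId: Long): Seq[LocalEdge[ED]] =
    edges.filter(e => e.srcId == vertexId || e.dstId == vertexId)
}
```

On an RDD this is a full scan of the edges, which is fine for an occasional lookup but costly if done once per vertex.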

Re: GraphX: Get edges for a vertex

2015-03-18 Thread mas

Re: IllegalAccessError in GraphX (Spark 1.3.0 LDA)

2015-03-17 Thread Jeffrey Jedele
Hi Xiangrui, thanks a lot for the hint! I just tried on another machine with a clean project and there it worked like a charm. Will retry on the other machine tomorrow. Regards, Jeff 2015-03-17 19:57 GMT+01:00 Xiangrui Meng: > Please check your classpath and make sure you don't have multipl

Re: IllegalAccessError in GraphX (Spark 1.3.0 LDA)

2015-03-17 Thread Xiangrui Meng
Please check your classpath and make sure you don't have multiple Spark versions deployed. If the classpath looks correct, please create a JIRA for this issue. Thanks! -Xiangrui On Tue, Mar 17, 2015 at 2:03 AM, Jeffrey Jedele wrote: > Hi all, > I'm trying to use the new LDA in mllib, but when try

GraphX - Correct path traversal order from an Array[Edge[ED]]

2015-03-17 Thread bertlhf
tId = 9 SrcId = 9, DstId = 10 SrcId = 10, DstId = 11 SrcId = 11, DstId = 12 ... SrcId = 14, DstId = 15 SrcId = 15, DstId = 16 SrcId = 16, DstId = 98 SrcId = 98, DstId = 99 ...
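One way to put such edges back in path order, assuming the Array[Edge[ED]] has been collected to the driver and describes a single simple path (each vertex has at most one outgoing edge, no cycles). PathEdge and orderPath are hypothetical names; the sketch finds the one source vertex that never appears as a destination and follows srcId -> dstId from there.

```scala
// Hypothetical local edge type: only the endpoints matter for ordering.
final case class PathEdge(srcId: Long, dstId: Long)

object PathOrder {
  /** Orders the edges of a single simple path (no branches, no cycles) from its start vertex. */
  def orderPath(edges: Seq[PathEdge]): List[PathEdge] = {
    val bySrc = edges.map(e => e.srcId -> e).toMap
    val dsts = edges.map(_.dstId).toSet
    // The start vertex is the only source that never appears as a destination.
    edges.find(e => !dsts.contains(e.srcId)) match {
      case None => Nil // empty input or a pure cycle: no unique start
      case Some(first) =>
        Iterator
          .iterate(Option(first))(_.flatMap(e => bySrc.get(e.dstId)))
          .takeWhile(_.isDefined)
          .map(_.get)
          .take(edges.size) // guard against accidental loops
          .toList
    }
  }
}
```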

IllegalAccessError in GraphX (Spark 1.3.0 LDA)

2015-03-17 Thread Jeffrey Jedele
Hi all, I'm trying to use the new LDA in MLlib, but when trying to train the model, I'm getting the following error: java.lang.IllegalAccessError: tried to access class org.apache.spark.util.collection.Sorter from class org.apache.spark.graphx.impl.EdgePartitionBuilder at org.apache.spark.graphx.i

Re: Basic GraphX deployment and usage question

2015-03-16 Thread Takeshi Yamamuro
Hi, You're right: GraphX is already included in the default Spark package. As a first step, 'Analytics' seems suitable for your objective: # ./bin/run-example graphx.Analytics pagerank On Tue, Mar 17, 2015 at 2:21 AM, Khaled Ammar wrote: > Hi,

Basic GraphX deployment and usage question

2015-03-16 Thread Khaled Ammar
Hi, I'm very new to Spark and GraphX. I downloaded and configured Spark on a cluster, which uses Hadoop 1.x. The master UI shows all workers. The example command "run-example SparkPi" works fine and completes successfully. I'm interested in GraphX. Although the documentation

Null Pointer Exception due to mapVertices function in GraphX

2015-03-15 Thread James
I got a NullPointerException in aggregateMessages on a graph that is the output of the mapVertices function applied to another graph. I found the problem is that the mapVertices function did not affect all the triplets of the graph. // Initialize the graph, assign a counter to each vertex that contains the ve

Re: [GRAPHX] could not process graph with 230M edges

2015-03-14 Thread Takeshi Yamamuro
Hi, If you have heap problems in Spark/GraphX, it'd be better to split partitions into smaller ones so that each partition fits in memory. On Sat, Mar 14, 2015 at 12:09 AM, Hlib Mykhailenko <hlib.mykhaile...@inria.fr> wrote: > Hello, I cannot process graph with 23

Re: GraphX Snapshot Partitioning

2015-03-14 Thread Takeshi Yamamuro
Large edge partitions could cause java.lang.OutOfMemoryError, and then Spark tasks fail. FWIW, each edge partition can have at most 2^32 edges because 64-bit vertex IDs are mapped into 32-bit ones in each partition. If #edges is over the limit, GraphX could throw ArrayIndexOutOfBoundsException
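A rough illustration of the ID mapping described above, not GraphX's actual EdgePartition code: the distinct 64-bit IDs seen in one partition get dense Int indices, and the edges are stored against those. Because the local indices are 32-bit, one partition can only address a bounded number of entries, which is one reason to spread a large graph over more partitions. LocalIds.remap is a hypothetical helper.

```scala
object LocalIds {
  /**
   * Assigns dense Int indices (0, 1, 2, ...) to the distinct 64-bit vertex IDs that
   * appear in one partition's edges, and rewrites the edges in terms of those indices.
   */
  def remap(edges: Seq[(Long, Long)]): (Map[Long, Int], Seq[(Int, Int)]) = {
    val ids = edges.flatMap { case (s, d) => Seq(s, d) }.distinct // first-seen order
    val index: Map[Long, Int] = ids.zipWithIndex.toMap
    (index, edges.map { case (s, d) => (index(s), index(d)) })
  }
}
```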

[GRAPHX] could not process graph with 230M edges

2015-03-13 Thread Hlib Mykhailenko
Hello, I cannot process a graph with 230M edges. I cloned apache.spark, built it, and then tried it on a cluster. I used a Spark Standalone Cluster:
- 5 machines (each with 12 cores / 32GB RAM)
- 'spark.executor.memory' == 25g
- 'spark.driver.memory' == 3g
The graph has 231359027 edges. And its file weig

Re: GraphX Snapshot Partitioning

2015-03-11 Thread Matthew Bucci
> the snapshots we had was too large to fit into a single partition. Would the snapshot be split over the two partitions equally, for example, and how is a single snapshot spread over multiple partitions? Thank You, Matthew Bucci

Re: GraphX Snapshot Partitioning

2015-03-09 Thread Takeshi Yamamuro
artition. Would the snapshot be split over the two partitions equally, for example, and how is a single snapshot spread over multiple partitions? Thank You, Matthew Bucci

GraphX Snapshot Partitioning

2015-03-09 Thread Matthew Bucci
how is a single snapshot spread over multiple partitions? Thank You, Matthew Bucci

Re: GraphX path traversal

2015-03-04 Thread Robin East
Actually your Pregel code works for me: import org.apache.spark._ import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD val vertexlist = Array((1L,"One"), (2L,"Two"), (3L,"Three"), (4L,"Four"),(5L,"Five"),(6L,"Six")) val edgelist = Array(Edge(6,5,"6 to 5"),Edge(5,4,"5 to 4"),Edge(4,3,

Re: GraphX path traversal

2015-03-03 Thread Robin East
Have you tried EdgeDirection.In? > On 3 Mar 2015, at 16:32, Robin East wrote: > > What about the following which can be run in spark shell: > > import org.apache.spark._ > import org.apache.spark.graphx._ > import org.apache.spark.rdd.RDD > > val vertexlist = Array((1L,"One"), (2L,"Two"), (3L,"

Re: GraphX path traversal

2015-03-03 Thread Madabhattula Rajesh Kumar
Hi, I have tried the program below using the Pregel API, but I'm not able to get the output I need. I'm getting exactly the reverse of the output I'm expecting. // Creating the graph using the edge file mentioned in the mail above val graph: Graph[Int, Int] = GraphLoader.edgeListFile(sc, "/home/rajesh/Downloads/graphdata/da

Re: GraphX path traversal

2015-03-03 Thread Robin East
What about the following, which can be run in the Spark shell: import org.apache.spark._ import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD val vertexlist = Array((1L,"One"), (2L,"Two"), (3L,"Three"), (4L,"Four"),(5L,"Five"),(6L,"Six")) val edgelist = Array(Edge(6,5,"6 to 5"),Edge(5,4,"

Re: GraphX path traversal

2015-03-03 Thread Madabhattula Rajesh Kumar
Hi Robin, Thank you for your response. Please find my question below. I have the following edge file:
Source Vertex   Destination Vertex
1               2
2               3
3               4
4               5
5               6
6               6
In this graph, the 1st vertex is connected to the 2nd vertex, the 2nd vertex is connected to the 3rd vertex, ..., and the 6th vertex is connected to the 6th vertex. S

Re: GraphX path traversal

2015-03-03 Thread Robin East
Rajesh, I'm not sure I can help you; however, I don't even understand the question. Could you restate what you are trying to do? Sent from my iPhone > On 2 Mar 2015, at 11:17, Madabhattula Rajesh Kumar wrote: > Hi, I have a below edge list. How to find the parents path for every ve

Re: GraphX path traversal

2015-03-03 Thread Madabhattula Rajesh Kumar
Hi, Could you please let me know how to do this? Or any suggestions? Regards, Rajesh On Mon, Mar 2, 2015 at 4:47 PM, Madabhattula Rajesh Kumar <mrajaf...@gmail.com> wrote: > Hi, I have a below edge list. How to find the parents path for every vertex? Example: Vertex 1 path : 2, 3,

GraphX path traversal

2015-03-02 Thread Madabhattula Rajesh Kumar
Hi, I have the edge list below. How can I find the parent path for every vertex? Example:
Vertex 1 path: 2, 3, 4, 5, 6
Vertex 2 path: 3, 4, 5, 6
Vertex 3 path: 4, 5, 6
Vertex 4 path: 5, 6
Vertex 5 path: 6
Could you please let me know how to do this? Or any suggestions? Source Vertex Destinati
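A local sketch of the expected output for this edge list, assuming each vertex has at most one outgoing edge (as in the example) and treating the 6 -> 6 self-loop as the end of the path. ParentPaths is a hypothetical helper, useful for checking results on small inputs; the replies in this thread explore doing the same walk with GraphX's Pregel API.

```scala
object ParentPaths {
  /**
   * For each vertex, the vertices reached by repeatedly following its out-edge.
   * Assumes at most one out-edge per vertex (true for a chain); self-loops are ignored.
   */
  def paths(edges: Seq[(Long, Long)]): Map[Long, List[Long]] = {
    val next: Map[Long, Long] = edges.filter { case (s, d) => s != d }.toMap
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct

    @annotation.tailrec
    def walk(cur: Long, acc: List[Long], seen: Set[Long]): List[Long] =
      next.get(cur) match {
        case Some(n) if !seen.contains(n) => walk(n, n :: acc, seen + n) // cycle guard
        case _                           => acc.reverse
      }

    vertices.map(v => v -> walk(v, Nil, Set(v))).toMap
  }
}
```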

Re: documentation - graphx-programming-guide error?

2015-03-02 Thread Sean Owen
duceTriplets example above appears to have the same problem. I think it's worth opening a PR + JIRA for the fix. On Mon, Mar 2, 2015 at 7:12 AM, Deborah Siegel wrote: > Hello, > > I am running through examples given on > http://spark.apache.org/docs/1.2.1/graphx-programming-guid

documentation - graphx-programming-guide error?

2015-03-01 Thread Deborah Siegel
Hello, I am running through examples given on http://spark.apache.org/docs/1.2.1/graphx-programming-guide.html The section for Map Reduce Triplets Transition Guide (Legacy) indicates that one can run the following .aggregateMessages code val graph: Graph[Int, Float] = ... def msgFun(triplet

Re: Learning GraphX Questions

2015-02-19 Thread Takeshi Yamamuro
rition 2? Thank You, Matthew Bucci On Fri, Feb 13, 2015 at 10:58 PM, Ankur Dave wrote: > At 2015-02-13 12:19:46 -0800, Matthew Bucci wrote: > 1) How do you actually run programs in GraphX? At the moment I've been doing everyt

Re: Learning GraphX Questions

2015-02-18 Thread Matthew Bucci
check which vertices belonged to partition 1 and partition 2? Thank You, Matthew Bucci On Fri, Feb 13, 2015 at 10:58 PM, Ankur Dave wrote: > At 2015-02-13 12:19:46 -0800, Matthew Bucci wrote: > 1) How do you actually run programs in GraphX? At the moment I've been doing

Re: [GraphX] Excessive value recalculations during aggregateMessages cycles

2015-02-15 Thread Takeshi Yamamuro
Is this a bug or a feature? > > Kyle > > > > On Sat, Feb 7, 2015 at 11:44 PM, Kyle Ellrott > wrote: > >> I'm trying to setup a simple iterative message/update problem in GraphX >> (spark 1.2.0), but I'm running into issues with the caching and >> re-

Re: failing GraphX application ('GC overhead limit exceeded', 'Lost executor', 'Connection refused', etc.)

2015-02-14 Thread Matthew Cornell
Oops! I forgot to excerpt the errors and warnings from that file: 15/02/12 08:02:03 ERROR TaskSchedulerImpl: Lost executor 4 on compute-0-3.wright: remote Akka client disassociated 15/02/12 08:03:00 WARN TaskSetManager: Lost task 1.0 in stage 28.0 (TID 37, compute-0-1.wright): java.lang.OutOfMe

Re: Learning GraphX Questions

2015-02-13 Thread Ankur Dave
At 2015-02-13 12:19:46 -0800, Matthew Bucci wrote: > 1) How do you actually run programs in GraphX? At the moment I've been doing > everything live through the shell, but I'd obviously like to be able to work > on it by writing and running scripts. You can create your own

Learning GraphX Questions

2015-02-13 Thread Matthew Bucci
Hello, I was looking at GraphX as I believe it can be useful in my research on temporal data and I had a number of questions about the system: 1) How do you actually run programs in GraphX? At the moment I've been doing everything live through the shell, but I'd obviously like to

failing GraphX application ('GC overhead limit exceeded', 'Lost executor', 'Connection refused', etc.)

2015-02-12 Thread Matthew Cornell
Hi Folks, I'm running a five-step path-following algorithm on a movie graph with 120K vertices and 400K edges. The graph has vertices for actors, directors, movies, users, and user ratings, and my Scala code is walking the path "rating > movie > rating > user > rating". There are 75K rating no

Re: [GraphX] Excessive value recalculations during aggregateMessages cycles

2015-02-08 Thread Kyle Ellrott
e iterative message/update problem in GraphX (Spark 1.2.0), but I'm running into issues with the caching and re-calculation of data. I'm trying to follow the example found in the Pregel implementation of materializing and caching messages and graphs and then unpersisting th

[GraphX] Excessive value recalculations during aggregateMessages cycles

2015-02-07 Thread Kyle Ellrott
I'm trying to set up a simple iterative message/update problem in GraphX (Spark 1.2.0), but I'm running into issues with the caching and re-calculation of data. I'm trying to follow the example found in the Pregel implementation of materializing and caching messages and

Reg GraphX APSP

2015-02-06 Thread Deep Pradhan
Hi, Is the implementation of All Pairs Shortest Path in GraphX for directed or undirected graphs? When I use the algorithm on a dataset, it assumes that the graph is undirected. Has anyone come across this before? Thank you

Re: GraphX pregel: getting the current iteration number

2015-02-03 Thread Daniil Osipov
I don't think it's possible to access it. What I've done before is send the current or next iteration index with the message, where the message is a case class. HTH Dan On Tue, Feb 3, 2015 at 10:20 AM, Matthew Cornell wrote: > Hi Folks, > I'm new to GraphX and Scala and m
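A sketch of the workaround described above: carry the iteration index inside a case-class message and increment it each time a message is sent. StepMsg, next and run are hypothetical names, and run is a local loop standing in for Pregel's supersteps, not GraphX code; the inputs sequence plays the role of the per-iteration input list from the question.

```scala
// Hypothetical message type: the sender stamps the superstep index into each message,
// since (per the reply above) Pregel does not hand the iteration number to sendMsg.
final case class StepMsg(iteration: Int, value: Double)

object IterationStamp {
  /** What a sendMsg might do: advance the index and read the input for that superstep. */
  def next(msg: StepMsg, inputs: IndexedSeq[Double]): Option[StepMsg] = {
    val i = msg.iteration + 1
    if (i < inputs.length) Some(StepMsg(i, msg.value + inputs(i))) else None
  }

  /** Local stand-in for the superstep loop; assumes inputs is non-empty. */
  def run(inputs: IndexedSeq[Double]): List[StepMsg] =
    Iterator
      .iterate(Option(StepMsg(0, inputs.head)))(opt => opt.flatMap(m => next(m, inputs)))
      .takeWhile(_.isDefined)
      .map(_.get)
      .toList
}
```

Returning None once the inputs are exhausted mirrors sending no messages, which is how a Pregel run would go quiet.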

GraphX pregel: getting the current iteration number

2015-02-03 Thread Matthew Cornell
Hi Folks, I'm new to GraphX and Scala and my sendMsg function needs to index into an input list to my algorithm based on the pregel()() iteration number, but I don't see a way to access that. I see in https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/sp

Re: GraphX: ShortestPaths does not terminate on a grid graph

2015-02-03 Thread Jay Hutfles
I think this is a separate issue with how the EdgeRDDImpl partitions edges. If you can merge this change in and rebuild, it should work: https://github.com/apache/spark/pull/4136/files If you can't, I just called the Graph.partitionBy() method right after constructing my graph but before perfo

Re: GraphX: ShortestPaths does not terminate on a grid graph

2015-02-02 Thread NicolasC
On 01/29/2015 08:31 PM, Ankur Dave wrote: Thanks for the reminder. I just created a PR: https://github.com/apache/spark/pull/4273 Ankur Hello, Thanks for the patch. I applied it to Pregel.scala (in the Spark 1.2.0 sources) and rebuilt Spark. During execution, at the 25th iteration of Pregel, che

Re: [Graphx & Spark] Error of "Lost executor" and TimeoutException

2015-02-02 Thread Yifan LI
> Is your code hitting frequent garbage collection? Best Regards, Sonal, Founder, Nube Technologies <http://www.nubetech.co/> <http://in.linkedin.com/in/sonalgoyal>

Re: [Graphx & Spark] Error of "Lost executor" and TimeoutException

2015-02-02 Thread Yifan LI
On Fri, Jan 30, 2015 at 7:52 PM, Yifan LI <iamyifa...@gmail.com> wrote: > Hi, I am

Re: [Graphx & Spark] Error of "Lost executor" and TimeoutException

2015-02-02 Thread Sonal Goyal
nt garbage collection? On Fri, Jan 30, 2015 at 7:52 PM, Yifan LI wrote: > Hi,

Re: [Graphx & Spark] Error of "Lost executor" and TimeoutException

2015-01-30 Thread Yifan LI
On Fri, Jan 30, 2015 at 7:52 PM, Yifan LI <iamyifa...@gmail.com> wrote: > Hi, I am running my graphx application on Spark 1.2.0 (11 nodes

Re: [Graphx & Spark] Error of "Lost executor" and TimeoutException

2015-01-30 Thread Sonal Goyal
Is your code hitting frequent garbage collection? Best Regards, Sonal Founder, Nube Technologies <http://www.nubetech.co> <http://in.linkedin.com/in/sonalgoyal> On Fri, Jan 30, 2015 at 7:52 PM, Yifan LI wrote: > > > > Hi, > > I am running my graphx appli

[Graphx & Spark] Error of "Lost executor" and TimeoutException

2015-01-30 Thread Yifan LI
> Hi, I am running my GraphX application on Spark 1.2.0 (11-node cluster) and have requested 30GB of memory per node and 100 cores for an input dataset of around 1GB (a graph with 5 million vertices). But the error below always happens… Is there anyone coul

[Graphx & Spark] Error of "Lost executor" and TimeoutException

2015-01-30 Thread Yifan LI
Hi, I am running my GraphX application on Spark 1.2.0 (11-node cluster) and have requested 30GB of memory per node and 100 cores for an input dataset of around 1GB (a graph with 5 million vertices). But the error below always happens… Could anyone give me some pointers? (BTW, the overall edge/vertex RDDs

Re: GraphX: ShortestPaths does not terminate on a grid graph

2015-01-29 Thread Ankur Dave
Thanks for the reminder. I just created a PR: https://github.com/apache/spark/pull/4273 Ankur On Thu, Jan 29, 2015 at 7:25 AM, Jay Hutfles wrote: > Just curious, is this set to be merged at some point?

Re: GraphX: ShortestPaths does not terminate on a grid graph

2015-01-29 Thread Jay Hutfles
heckpoint the graph periodically, which writes it to > stable storage and interrupts the lineage chain before it grows too long. > > If you're able to recompile Spark, you can do this by applying the patch > to GraphX at the end of this mail, and before running graph algorithms, >

Re: [GraphX] Integration with TinkerPop3/Gremlin

2015-01-26 Thread Nicolas Colson
trouble tracking down the link to the whole discussion. Also see [2] for code. [1] https://www.mail-archive.com/dev@spark.apache.org/msg06231.html [2] https://github.com/kellrott/spark-gremlin On Wed, Jan 7, 2015 at 6:03 AM

Re: GraphX: ShortestPaths does not terminate on a grid graph

2015-01-22 Thread Ankur Dave
o consume increasing amounts of resources for scheduling and task serialization. The workaround is to checkpoint the graph periodically, which writes it to stable storage and interrupts the lineage chain before it grows too long. If you're able to recompile Spark, you can do this by applying
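A toy model (plain Scala, not Spark) of why periodic checkpointing helps: without it, the lineage grows by one step per iteration; with a checkpoint every `interval` iterations, it never exceeds interval - 1. In an actual job, the checkpoint step would involve setting a checkpoint directory once with sc.setCheckpointDir(...) and, in Spark versions whose Graph API provides it, calling graph.checkpoint() followed by an action; verify that against the version you run. CheckpointSchedule is a hypothetical name.

```scala
object CheckpointSchedule {
  /**
   * Toy model of lineage length in an iterative job: each iteration adds one step,
   * and a checkpoint every `interval` iterations truncates it back to zero.
   */
  def lineageDepths(iterations: Int, interval: Int): Seq[Int] = {
    require(interval > 0, "interval must be positive")
    (1 to iterations).scanLeft(0) { (depth, iter) =>
      if (iter % interval == 0) 0 else depth + 1
    }.tail
  }
}
```

The trade-off is I/O: each checkpoint writes the graph to stable storage, so the interval balances that cost against scheduling and serialization overhead from long lineages.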

GraphX: ShortestPaths does not terminate on a grid graph

2015-01-22 Thread NicolasC
Hello, I am trying to execute a simple program that runs the ShortestPaths algorithm (org.apache.spark.graphx.lib.ShortestPaths) on a small grid graph. I use Spark 1.2.0 downloaded from spark.apache.org. The program's code is the following: object GraphXGridSP { def main(args : Array[String])

How does GraphX internally traverse the Graph?

2015-01-14 Thread mas
I want to know how GraphX traverses a graph internally. Is it a vertex- and edge-based traversal or a sequential traversal of RDDs? For example, given a vertex of the graph, I want to fetch only its neighbors, not the neighbors of all the vertices. How will GraphX traverse the graph in this case

RE: GraphX vs GraphLab

2015-01-13 Thread Buttler, David
would be if the AMP Lab or Databricks maintained a set of benchmarks on the web that showed how much each successive version of Spark improved. Dave From: Madabhattula Rajesh Kumar [mailto:mrajaf...@gmail.com] Sent: Monday, January 12, 2015 9:24 PM To: Buttler, David Subject: Re: GraphX vs

GraphX vs GraphLab

2015-01-12 Thread Madabhattula Rajesh Kumar
Hi Team, Has anyone done a comparison study (pros and cons) of GraphX and GraphLab? Could you please point me to any links with such a comparison? Regards, Rajesh

[GraphX] Integration with TinkerPop3/Gremlin

2015-01-07 Thread Nicolas Colson
Hi Spark/GraphX community, I'm wondering if you have TinkerPop3/Gremlin on your radar? (github <https://github.com/tinkerpop/tinkerpop3>, doc <http://www.tinkerpop.com/docs/3.0.0-SNAPSHOT>) They've done amazing work refactoring their stack recently and Gremlin is a very

Re: Using graphx to calculate average distance of a big graph

2015-01-06 Thread James
We are going to estimate the average distance using [HyperAnf]( http://arxiv.org/abs/1011.5599) on a 100 billion edge graph. 2015-01-07 2:18 GMT+08:00 Ankur Dave : > [-dev] > > What size of graph are you hoping to run this on? For small graphs where > materializing the all-pairs shortest path is

Re: Using graphx to calculate average distance of a big graph

2015-01-06 Thread Ankur Dave
[-dev] What size of graph are you hoping to run this on? For small graphs where materializing the all-pairs shortest path is an option, you could simply find the APSP using https://github.com/apache/spark/pull/3619 and then take the average distance (apsp.map(_._2.toDouble).mean). Ankur
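For small graphs, the suggested approach (compute all-pairs shortest paths, then average) can be cross-checked locally. This plain-Scala sketch runs a BFS from every vertex of an unweighted directed graph and averages over ordered pairs (u, v) where v is reachable from u and v != u. It costs O(V·(V+E)), so it is only meant for validating results on toy inputs, not for the 100-billion-edge case mentioned in this thread. AverageDistance is a hypothetical name.

```scala
import scala.collection.immutable.Queue

object AverageDistance {
  /** Unweighted BFS distances from src over a directed adjacency map. */
  def bfs(adj: Map[Long, Seq[Long]], src: Long): Map[Long, Int] = {
    @annotation.tailrec
    def loop(frontier: Queue[Long], dist: Map[Long, Int]): Map[Long, Int] =
      frontier.dequeueOption match {
        case None => dist
        case Some((u, rest)) =>
          // Only vertices not yet assigned a distance are enqueued.
          val fresh = adj.getOrElse(u, Nil).filterNot(dist.contains).distinct
          loop(rest ++ fresh, dist ++ fresh.map(_ -> (dist(u) + 1)))
      }
    loop(Queue(src), Map(src -> 0))
  }

  /** Mean shortest-path length over ordered reachable pairs (u, v) with u != v. */
  def mean(adj: Map[Long, Seq[Long]]): Double = {
    val vertices = (adj.keys ++ adj.values.flatten).toSet
    val ds = vertices.toSeq.flatMap(u => bfs(adj, u).collect { case (v, d) if v != u => d })
    if (ds.isEmpty) 0.0 else ds.sum.toDouble / ds.size
  }
}
```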

Using graphx to calculate average distance of a big graph

2015-01-04 Thread James
We want to use Spark to calculate the average shortest-path distance between each reachable pair of nodes in a very big graph. Has anyone ever tried this? We would like to discuss the problem.

Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-12-28 Thread Harihar Nahak
Yes, I tried that too. I used the pre-built Spark 1.1 release. If there are upcoming changes to the GraphX library, or in Spark 1.2, just let me know and I can try them. --Harihar

Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-12-22 Thread pradhandeep
Did you try running PageRank.scala instead of LiveJournalPageRank.scala?

Re: Spark GraphX question.

2014-12-18 Thread Tae-Hyuk Ahn
transitive reduction algorithm (and get some hints from "TriangleCount.scala" in GraphX), it might have the following steps:
1. Compute the set of neighbors for each vertex.
2. For each edge, compute the intersection of the sets and send the weight to both vertices.
3. For each vertex, mark an edge as
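A local sketch of the neighbor-set idea for the unweighted case: an edge u -> w is dropped when some middle vertex v has u -> v and v -> w, which is the triangle pattern the steps above look for. This only removes shortcuts of length two (a full transitive reduction would also consider longer paths), and the weight handling from the question is not modeled. TwoHopReduction is a hypothetical name.

```scala
object TwoHopReduction {
  /** Drops edge u -> w when some v (distinct from u and w) has u -> v and v -> w. Weights are ignored. */
  def reduce(edges: Set[(Long, Long)]): Set[(Long, Long)] = {
    // Out-neighbor set of every vertex (step 1 above, for out-edges only).
    val out: Map[Long, Set[Long]] = edges.groupBy(_._1).map { case (u, es) => u -> es.map(_._2) }
    edges.filterNot { case (u, w) =>
      out.getOrElse(u, Set.empty[Long]).exists { v =>
        v != u && v != w && out.getOrElse(v, Set.empty[Long]).contains(w)
      }
    }
  }
}
```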

Re: Spark GraphX question.

2014-12-18 Thread Harihar Nahak
> Thanks, Ted

Spark GraphX question.

2014-12-18 Thread Tae-Hyuk Ahn
n" with considering the weight as a maximum spanning tree. Edges: 1 -> 2 (30), 2 -> 3 (30). Do you have a good idea for this? Thanks, Ted

GraphX for large scale PageRank (~4 billion nodes, ~128 billion edges)

2014-12-12 Thread Stephen Merity
Hi! tldr; We're looking at potentially using Spark+GraphX to compute PageRank over a 4 billion node + 128 billion edge graph on a regular (monthly) basis, possibly growing larger in size over time. If anyone has hints / tips / upcoming optimizations I should test use (or wants to contr

GraphX for large scale PageRank (~4 billion nodes, ~128 billion edges)

2014-12-12 Thread Stephen Merity
Hi! tldr; We're looking at potentially using Spark+GraphX to compute PageRank over a 4 billion node + 128 billion edge graph on a regular (monthly) basis, possibly growing larger in size over time. If anyone has hints / tips / upcoming optimizations I should test out (or wants to contr

[Graphx] the communication cost of leftJoin

2014-12-12 Thread Yifan LI
partitioner to vA, what will GraphX (Spark) do to handle this case? For instance: 1) check the partitioner of vB; 2) do the leftJoin operations on each machine separately, for the co-located partitions of vA and vB. Right? But if vB’s partitioner is different, what will happen? How

Re: [Graphx] which way is better to access faraway neighbors?

2014-12-05 Thread Ankur Dave
At 2014-12-05 02:26:52 -0800, Yifan LI wrote: > I have a graph in which each vertex keeps several messages for some faraway neighbours (I mean not only immediate neighbours, but those at most k hops away, e.g. k = 5). Now, I propose to distribute these messages to their corresponding destinatio

[Graphx] which way is better to access faraway neighbors?

2014-12-05 Thread Yifan LI
Hi, I have a graph in which each vertex keeps several messages for some faraway neighbours (I mean not only immediate neighbours, but those at most k hops away, e.g. k = 5). Now, I propose to distribute these messages to their corresponding destinations (say, "faraway neighbours”): - by using the Pregel API
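Whichever API is used, reaching neighbours up to k hops away comes down to k rounds of one-hop expansion, one per Pregel superstep or per join. A local sketch of that expansion over an adjacency map (hypothetical KHop helper, directed edges, plain Scala rather than GraphX):

```scala
object KHop {
  /** Vertices reachable from src in 1..k hops along directed edges (src itself excluded). */
  def within(adj: Map[Long, Set[Long]], src: Long, k: Int): Set[Long] = {
    // Each round expands the frontier by one hop, like one superstep, skipping visited vertices.
    val (_, visited) = (1 to k).foldLeft((Set(src), Set(src))) { case ((frontier, seen), _) =>
      val next = frontier.flatMap(u => adj.getOrElse(u, Set.empty[Long])) -- seen
      (next, seen ++ next)
    }
    visited - src
  }
}
```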

Profiling GraphX codes.

2014-12-05 Thread Deep Pradhan
Is there any tool to profile GraphX code on a cluster? Is there a way to see the messages exchanged among the nodes in a cluster? The web UI does not give all the information. Thank You

Re: GraphX Pregel halting condition

2014-12-04 Thread Ankur Dave
There's no built-in support for doing this, so the best option is to copy and modify Pregel to check the accumulator at the end of each iteration. This is robust and shouldn't be too hard, since the Pregel code is short and only uses public GraphX APIs. Ankur At 2014-12-03 09:37:01
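A plain-Scala sketch of the suggested change: a Pregel-style loop that, after each superstep, also checks a global condition and stops early. Here the condition is that a target vertex has been reached during a BFS; in a modified copy of GraphX's Pregel, the equivalent check would read an accumulator (or run a small action over the vertices) at the end of each iteration. HaltingLoop and bfsUntil are hypothetical names.

```scala
object HaltingLoop {
  /**
   * Local stand-in for a Pregel-style BFS that also stops as soon as a global
   * condition holds (here: the target vertex has been reached).
   * Returns (distances found so far, number of supersteps run).
   */
  def bfsUntil(adj: Map[Long, Set[Long]], src: Long, target: Long, maxIter: Int): (Map[Long, Int], Int) = {
    @annotation.tailrec
    def loop(dist: Map[Long, Int], active: Set[Long], iter: Int): (Map[Long, Int], Int) =
      if (active.isEmpty || iter >= maxIter || dist.contains(target)) (dist, iter) // global check
      else {
        val reached = active.flatMap(u => adj.getOrElse(u, Set.empty[Long])).filterNot(dist.contains)
        loop(dist ++ reached.map(_ -> (iter + 1)), reached, iter + 1)
      }
    loop(Map(src -> 0), Set(src), 0)
  }
}
```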

GraphX Pregel halting condition

2014-12-03 Thread Jay Hutfles
I'm trying to implement a graph algorithm that does a form of path searching. Once a certain criterion is met on any path in the graph, I want to halt the rest of the iterations. But I can't see how to do that with the Pregel API, since no vertex is able to know the state of other arbitrary

Re: Edge List File in GraphX

2014-11-30 Thread Harihar Nahak
ode+s1001560n1972...@n3.nabble.com> wrote: > Hi, > Is it necessary for every vertex to have an attribute when we load a graph > to GraphX? > In other words, if I have an edge list file containing pairs of vertices > i.e., <1 2> means that there is an edge between node 1 and node 2. No

SVD Plus Plus in GraphX

2014-11-27 Thread Deep Pradhan
Hi, I was just going through two algorithms in GraphX, namely SVDPlusPlus and TriangleCount. In the first, I see an RDD as the input to run, i.e., run(edges: RDD[Edge[Double]],...), and in the other I see run(VD:..., ED:...). Can anyone explain the difference between these two? In fact, SVDPlusPlus is the

Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-11-27 Thread Harihar Nahak
14-11-24 19:02:08 -0800, Harihar Nahak wrote: > According to documentation GraphX runs 10x faster than normal Spark. So I run Page Rank algorithm in both the applications:

[graphx] failed to submit an application with java.lang.ClassNotFoundException

2014-11-27 Thread Yifan LI
Hi, I just tried to submit an application from the GraphX examples directory, but it failed: yifan2:bin yifanli$ MASTER=local[*] ./run-example graphx.PPR_hubs java.lang.ClassNotFoundException: org.apache.spark.examples.graphx.PPR_hubs at java.net.URLClassLoader$1.run(URLClassLoader.java:202

Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-11-26 Thread Ankur Dave
At 2014-11-24 19:02:08 -0800, Harihar Nahak wrote: > According to documentation GraphX runs 10x faster than normal Spark. So I > run Page Rank algorithm in both the applications: > [...] > Local Mode (Machine : 8 Core; 16 GB memory; 2.80 Ghz Intel i7; Executor > Memory: 4Gb, No. o

Re: Undirected Graphs in GraphX-Pregel

2014-11-26 Thread Ankur Dave
At 2014-11-26 21:21:17 -0800, Deep Pradhan wrote: > Is it the same in the Pregel abstraction of GraphX too? Do we always have > to input directed graphs to Pregel abstraction or can we also give > undirected graphs? Yes, all graphs in GraphX are directed, including in the Pregel A
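Since GraphX graphs are directed, one common way to represent an undirected graph is to store each edge in both directions; another is a sendMsg that messages both endpoints of every triplet. A minimal local sketch of the first option (hypothetical helper, edges as (src, dst) pairs, plain Scala):

```scala
object Undirected {
  /** Represents an undirected edge list as a directed one by adding each reverse edge. */
  def symmetrize(edges: Seq[(Long, Long)]): Seq[(Long, Long)] =
    (edges ++ edges.map(_.swap)).distinct // distinct keeps self-loops from doubling
}
```

Doubling the edges doubles storage and message traffic, which is the usual cost of this approach.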

Undirected Graphs in GraphX-Pregel

2014-11-26 Thread Deep Pradhan
Hi, I was going through this paper on Pregel titled, "Pregel: A System for Large-Scale Graph Processing". In the second section named Model Of Computation, it says that the input to a Pregel computation is a directed graph. Is it the same in the Pregel abstraction of GraphX too? Do

Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-11-26 Thread Harihar Nahak
Hi Guys, has anyone experienced the same thing as above? --Harihar

Re: how to force graphx to execute transformation

2014-11-26 Thread Ankur Dave
At 2014-11-26 05:25:10 -0800, Hlib Mykhailenko wrote: > I work with GraphX. When I call "graph.partitionBy(..)" nothing happens because, as I understand it, all transformations are lazy and partitionBy is built using transformations. Is there a way to force spa

Re: how to force graphx to execute transformation

2014-11-26 Thread Jörg Schad
Hi, can't you just use graph.partitionBy(..).collect()? Cheers, Joerg On Wed, Nov 26, 2014 at 2:25 PM, Hlib Mykhailenko wrote: > Hello, > > I work with Graphx. When I call "graph.partitionBy(..)" nothing happens, > because, as I understood, that all transformation ar
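A plain-Scala analogy (not Spark) for the laziness described in this thread: a view's map does no work until something forces it, just as partitionBy only builds a plan until an action runs. On a GraphX graph, Graph does not define collect() itself, so the action goes on one of its RDDs, for example graph.edges.count() (often after graph.cache() so the materialized result is reused). LazinessDemo is a hypothetical name.

```scala
object LazinessDemo {
  /** Returns (evaluations before forcing, evaluations after forcing, forced result). */
  def demo(xs: Seq[Int]): (Int, Int, List[Int]) = {
    var evaluations = 0
    val pending = xs.view.map { x => evaluations += 1; x * 2 } // lazy: nothing runs yet
    val before = evaluations
    val forced = pending.toList // the "action" that triggers evaluation
    (before, evaluations, forced)
  }
}
```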

how to force graphx to execute transformation

2014-11-26 Thread Hlib Mykhailenko
Hello, I work with GraphX. When I call "graph.partitionBy(..)" nothing happens because, as I understand it, all transformations are lazy and partitionBy is built using transformations. Is there a way to force Spark to actually execute this transformation and not use

Re: New Codes in GraphX

2014-11-24 Thread Deep Pradhan
Could it be because my edge list file is in the form (1 2), where there is an edge between node 1 and node 2? On Tue, Nov 18, 2014 at 4:13 PM, Ankur Dave wrote: > At 2014-11-18 15:51:52 +0530, Deep Pradhan > wrote: > > Yes the above command works, but there is this problem. Most of the > ti

Edge List File in GraphX

2014-11-24 Thread Deep Pradhan
Hi, Is it necessary for every vertex to have an attribute when we load a graph into GraphX? In other words, if I have an edge list file containing pairs of vertices, i.e., <1 2> means that there is an edge between node 1 and node 2. Now, when I run PageRank on this data, it returns NaN. Can

Is Spark? or GraphX runs fast? a performance comparison on Page Rank

2014-11-24 Thread Harihar Nahak
Hi All, I started exploring Spark two months ago. I'm looking for some concrete features of both Spark and GraphX so that I can decide what to use based on which gives the highest performance. According to the documentation, GraphX runs 10x faster than normal Spark. So I ran

[GraphX] Mining GeoData (OSM)

2014-11-20 Thread andy petrella
Guys, After talking with Ankur, it turned out that sharing the talk we gave at ScalaIO (France) would be worthwhile. So there you go, and don't hesitate to share your thoughts ;-) http://www.slideshare.net/noootsab/machine-learning-and-graphx Greetz, andy

GraphX bug re-opened

2014-11-19 Thread Gary Malouf
We keep running into https://issues.apache.org/jira/browse/SPARK-2823 when trying to use GraphX. The cost of repartitioning the data is really high for us (lots of network traffic) which is killing the job performance. I understand the bug was reverted to stabilize unit tests, but frankly it

GraphX twitter

2014-11-18 Thread tom85
out this error: ERROR BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId($SPARK_MASTER,59331) java.io.IOException: sendMessageReliably failed without being ACK'd Any help would be highly appreciated.

Re: New Codes in GraphX

2014-11-18 Thread Ankur Dave
At 2014-11-18 15:51:52 +0530, Deep Pradhan wrote: > Yes the above command works, but there is this problem. Most of the times, > the total rank is Nan (Not a Number). Why is it so? I've also seen this, but I'm not sure why it happens. If you could find out which vertices are getting the NaN rank

Re: Landmarks in GraphX section of Spark API

2014-11-18 Thread Ankur Dave
At 2014-11-18 15:44:31 +0530, Deep Pradhan wrote: > I meant to ask whether it gives the solution faster than other algorithms. No, it's just that it's much simpler and easier to implement than the others. Section 5.2 of the Pregel paper [1] justifies using it for a graph (a binary tree) with 1

Re: New Codes in GraphX

2014-11-18 Thread Deep Pradhan
At 2014-11-18 14:51:54 +0530, Deep Pradhan wrote: > I am using Spark-1.0.0. There are two GraphX directories that I can see here: 1. spark-1.0.0/examples/src/main/scala/org/apache/sprak/examples/graphx which contains

Re: New Codes in GraphX

2014-11-18 Thread Ankur Dave
At 2014-11-18 15:35:13 +0530, Deep Pradhan wrote: > Now, how do I run the LiveJournalPageRank.scala that is there in 1? I think it should work to use MASTER=local[*] $SPARK_HOME/bin/run-example graphx.LiveJournalPageRank /edge-list-file.txt --numEPart=8 --numIter=10 --partStrategy=EdgePart
