Re: Adjacency List representation in Spark

2014-09-18 Thread Koert Kuipers
We build our own adjacency lists as well. The main motivation for us was that GraphX has some assumptions about everything fitting in memory (it has .cache statements all over the place). However, if my understanding is wrong and GraphX can handle graphs that do not fit in memory, I would be interested to…

Re: Adjacency List representation in Spark

2014-09-18 Thread Harsha HN
Hi Andrew, The only reason I avoided the GraphX approach is that I didn't see any explanation on the Java side, nor any Java API documentation. Do you have any code sample using the GraphX API in Java? Thanks, Harsha On Wed, Sep 17, 2014 at 10:44 PM, Andrew Ash wrote: > Hi Harsha, > > You could look t…

Re: Adjacency List representation in Spark

2014-09-17 Thread Andrew Ash
Hi Harsha, You could look through the GraphX source to see the approach taken there for ideas for your own implementation. I'd recommend starting at https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/Graph.scala#L385 to see the storage technique. Why do you want to avoid u…
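As background on what that storage technique amounts to: GraphX packs each partition's edges into flat, columnar arrays indexed by source vertex, which is essentially a CSR (compressed sparse row) layout. The following is only a rough local illustration of that idea, assuming simple {src, dst} edge pairs; the class and field names are invented for this sketch and are not GraphX's actual classes.

```java
import java.util.*;

// Rough local illustration of a CSR-style edge layout, the general idea
// behind GraphX's flat per-partition edge arrays. Names here are
// illustrative only, not GraphX's actual API.
public class CsrSketch {
    final long[] dst;     // destination of every edge, grouped by source
    final int[] offset;   // offset[v]..offset[v+1] indexes v's edges in dst

    CsrSketch(int numVertices, long[][] edges) { // edges as {src, dst} pairs
        Arrays.sort(edges, Comparator.comparingLong((long[] e) -> e[0]));
        dst = new long[edges.length];
        offset = new int[numVertices + 1];
        for (int i = 0; i < edges.length; i++) {
            dst[i] = edges[i][1];
            offset[(int) edges[i][0] + 1]++;      // count per-source degree
        }
        for (int v = 0; v < numVertices; v++)     // prefix-sum into offsets
            offset[v + 1] += offset[v];
    }

    long[] neighbors(int v) {
        return Arrays.copyOfRange(dst, offset[v], offset[v + 1]);
    }

    public static void main(String[] args) {
        long[][] edges = {{0, 1}, {2, 0}, {0, 2}, {1, 2}};
        CsrSketch g = new CsrSketch(3, edges);
        System.out.println(Arrays.toString(g.neighbors(0))); // [1, 2]
    }
}
```

The payoff of this layout is that a vertex's neighbors are one contiguous slice of a primitive array, with far less per-entry overhead than a HashMap of lists.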

Adjacency List representation in Spark

2014-09-17 Thread Sree Harsha
…in nature? Basically, we are trying to fit a HashMap (adjacency list) into a Spark RDD. Is there any way other than GraphX? Thanks and Regards, Harsha

Adjacency List representation in Spark

2014-09-17 Thread Harsha HN
Hello, We are building an adjacency list to represent a graph. Vertices, edges, and weights for it have been extracted from HDFS files by a Spark job. Further, we expect the size of the adjacency list (HashMap) could grow beyond 20 GB. How can we represent this in an RDD so that it will be distributed in…
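One common way to get a distributed adjacency list without GraphX is to not materialize a single HashMap at all, but keep the edges as (src, edge) pairs and group by source vertex, i.e. a JavaPairRDD<Long, Iterable<Edge>> built with mapToPair(...).groupByKey(), so each vertex's neighbor list lives in some partition rather than on one machine. The sketch below shows the same grouping locally with plain Java collections; the Edge record and method names are illustrative assumptions, not code from this thread.

```java
import java.util.*;
import java.util.stream.*;

// Local sketch of building an adjacency list from (src, dst, weight) edge
// records. In Spark the same shape would be a JavaPairRDD<Long, Iterable<Edge>>
// produced by mapToPair(e -> new Tuple2<>(e.src(), e)).groupByKey(), which
// spreads the per-vertex lists across partitions instead of one HashMap.
public class AdjacencySketch {
    // Edge record; field names are illustrative, not from the thread.
    record Edge(long src, long dst, double weight) {}

    static Map<Long, List<Edge>> buildAdjacency(List<Edge> edges) {
        // groupingBy is the local analogue of groupByKey on an RDD
        return edges.stream().collect(Collectors.groupingBy(Edge::src));
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
            new Edge(1, 2, 0.5),
            new Edge(1, 3, 1.5),
            new Edge(2, 3, 2.0));
        Map<Long, List<Edge>> adj = buildAdjacency(edges);
        System.out.println(adj.get(1L).size()); // two neighbors of vertex 1
    }
}
```

Note that with groupByKey all edges of one vertex must still fit in one partition's memory; for very skewed graphs, aggregating per-vertex summaries with reduceByKey or aggregateByKey instead of collecting full neighbor lists tends to scale better.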