On Jul 30, 2014, at 4:39 PM, Ankur Dave <ankurd...@gmail.com> wrote:
> Jeffrey Picard <jp3...@columbia.edu> writes:
>> I tried unpersisting the edges and vertices of the graph by hand, then
>> persisting the graph with persist(StorageLevel.MEMORY_AND_DISK). I still see
>> the same behavior in connected components, however, and the same thing you
>> described in the storage page.
>
> Unfortunately it's not possible to change the graph's storage level by hand
> without modifying GraphX itself, because internally GraphX will create new
> RDDs, persist them using MEMORY_ONLY, and immediately materialize them, all
> before you get a chance to change the storage level. You can see this
> happening in the storage page: one graph (a VertexRDD and an EdgeRDD) has the
> desired storage level, but new ones are still set to MEMORY_ONLY.
>
>> It seems that the version of GraphX I'm using doesn't have the option for
>> setting the storage level in the GraphLoader.edgeListFile method.
>> https://spark.apache.org/docs/1.0.1/api/scala/index.html#org.apache.spark.graphx.GraphLoader$
>> [...]
>> Would that (newer?) version of GraphX with the storage level settable in
>> edgeListFile possibly solve this, or could there still be something else
>> going on?
>
> Yes, it looks like custom storage levels would solve the problem. That was
> added in apache/spark#946 [1], which will be released as part of Spark 1.1.0.
> Until then, is it possible for you to rebuild Spark from the master branch?
>
> Ankur
>
> [1] https://github.com/apache/spark/pull/946

That worked! The entire thing ran in about an hour and a half, thanks!

Is there by chance an easy way to build Spark apps against the master-branch build of Spark? I've been having to use the spark-shell.
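For anyone following the thread, a minimal sketch of what the fix looks like once apache/spark#946 is available (i.e. a master-branch / 1.1.0-SNAPSHOT build): edgeListFile gains named storage-level parameters, so the graph's internal RDDs are persisted at the desired level from the start instead of defaulting to MEMORY_ONLY. The input path here is illustrative, and `sc` is assumed to be an existing SparkContext (e.g. the one provided by spark-shell); parameter names are as added by that PR.

```scala
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

// Load the edge list, persisting both the EdgeRDD and the VertexRDD
// with MEMORY_AND_DISK rather than the old hard-coded MEMORY_ONLY.
// "hdfs:///data/edges.txt" is a placeholder path.
val graph = GraphLoader.edgeListFile(
  sc,
  "hdfs:///data/edges.txt",
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

// Downstream algorithms (e.g. connected components) now spill to disk
// instead of recomputing or failing when partitions don't fit in memory.
val cc = graph.connectedComponents().vertices
```

Since the storage levels are applied before the RDDs are materialized, there is no window in which MEMORY_ONLY copies get created, which is what made the unpersist-then-repersist approach ineffective.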