On Jul 30, 2014, at 4:39 PM, Ankur Dave <ankurd...@gmail.com> wrote:
> Jeffrey Picard <jp3...@columbia.edu> writes:
>> I tried unpersisting the edges and vertices of the graph by hand, then
>> persisting the graph with persist(StorageLevel.MEMORY_AND_DISK). I still see
>> the same behavior in connected components, however, and the same thing you
>> described in the storage page.
>
> Unfortunately it's not possible to change the graph's storage level by hand
> without modifying GraphX itself, because internally GraphX will create new
> RDDs, persist them using MEMORY_ONLY, and immediately materialize them, all
> before you get a chance to change the storage level. You can see this
> happening in the storage page: one graph (a VertexRDD and an EdgeRDD) has the
> desired storage level, but new ones are still set to MEMORY_ONLY.
>
>> It seems that the version of GraphX I'm using doesn't have the option for
>> setting the storage level in the GraphLoader.edgeListFile method.
>> https://spark.apache.org/docs/1.0.1/api/scala/index.html#org.apache.spark.graphx.GraphLoader$
>> [...]
>> Would that (newer?) version of GraphX with the storage level settable in
>> edgeListFile possibly solve this, or could there still be something else
>> going on?
>
> Yes, it looks like custom storage levels would solve the problem. That was
> added in apache/spark#946 [1], which will be released as part of Spark 1.1.0.
> Until then, is it possible for you to rebuild Spark from the master branch?
>
> Ankur
>
> [1] https://github.com/apache/spark/pull/946

That worked! The entire thing ran in about an hour and a half, thanks!

Is there by chance an easy way to build Spark apps against the master-branch build of Spark? I've been having to use the spark-shell.
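For anyone following the thread, a minimal sketch of what the fix looks like once apache/spark#946 is available (i.e. a master-branch / 1.1.0-SNAPSHOT build): edgeListFile gains named storage-level parameters, so the graph's internal RDDs are persisted at the desired level from the start instead of defaulting to MEMORY_ONLY. The input path here is illustrative, and `sc` is assumed to be an existing SparkContext (e.g. the one provided by spark-shell); parameter names are as added by that PR.

```scala
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

// Load the edge list, persisting both the EdgeRDD and the VertexRDD
// with MEMORY_AND_DISK rather than the old hard-coded MEMORY_ONLY.
// "hdfs:///data/edges.txt" is a placeholder path.
val graph = GraphLoader.edgeListFile(
  sc,
  "hdfs:///data/edges.txt",
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

// Downstream algorithms (e.g. connected components) now spill to disk
// instead of recomputing or failing when partitions don't fit in memory.
val cc = graph.connectedComponents().vertices
```

Since the storage levels are applied before the RDDs are materialized, there is no window in which MEMORY_ONLY copies get created, which is what made the unpersist-then-repersist approach ineffective.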