Hi all,

I am using GraphX in spark-0.9.0-incubating. Our graph can have 100 million
vertices and 1 billion edges, so I have to use my limited memory carefully.
I have a few questions about the GraphX module.

Why do some transformations, such as partitionBy and mapVertices, cache the
new graph, while others, such as outerJoinVertices, do not?
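 
For example, this is what my usage looks like today (a minimal sketch with a
toy graph, not GraphX internals; the replacement-attribute RDD is made up for
illustration):

    import org.apache.spark.SparkContext
    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    val sc = new SparkContext("local", "caching-sketch")
    val edges: RDD[Edge[Double]] =
      sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 1.0)))
    val graph: Graph[Double, Double] = Graph.fromEdges(edges, defaultValue = 0.0)

    // hypothetical new vertex attributes to join in
    val otherAttrs: RDD[(Long, Double)] =
      sc.parallelize(Seq((1L, 5.0), (3L, 7.0)))

    // outerJoinVertices does not cache its result, so I cache explicitly:
    val joined = graph.outerJoinVertices(otherAttrs) { (id, old, opt) =>
      opt.getOrElse(old)                 // keep the old attribute if no new one
    }.cache()

    // ...whereas, as far as I can tell, these cache the new graph internally:
    val remapped = graph.mapVertices((id, attr) => attr + 1.0)
    val repartitioned = graph.partitionBy(PartitionStrategy.EdgePartition2D)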

I use the Pregel API and read only edgeTriplet.srcAttr in sendMsg. After that
I get a new Graph and call graph.mapReduceTriplets, reading both
edgeTriplet.srcAttr and edgeTriplet.dstAttr in sendMsg. I found that, because
of the way ReplicatedVertexView is implemented, Spark recomputes the whole
graph even though it should already have been computed. Can anyone explain
the implementation here?

Why doesn't VertexPartition extend Serializable? It is used inside an RDD.
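 
For context, this is how I configure serialization (a sketch; I am assuming
the stock GraphX Kryo registrator is the intended path for these internal
classes):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("graphx-job")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "org.apache.spark.graphx.GraphKryoRegistrator")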

Could you provide a "spark.default.cache.useDisk" option so that cache uses
DISK_ONLY by default?
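 
As a workaround today I persist the underlying RDDs at DISK_ONLY myself (a
sketch; this only works if the RDDs have not already been assigned another
storage level), but a default setting would avoid repeating it for every
intermediate graph:

    import org.apache.spark.graphx._
    import org.apache.spark.storage.StorageLevel

    // graph: Graph[Double, Double]; set the level before first materialization
    graph.vertices.persist(StorageLevel.DISK_ONLY)
    graph.edges.persist(StorageLevel.DISK_ONLY)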



- Wu Zeming



