Hello all,

I am trying to understand how GraphX works internally.

I created a small program in GraphX:
1. I create a new graph:
val graph: Graph[(String, Double), Int] = Graph(vertexRDD, edgeRDD)
2. To see how my vertices were created, I call toDebugString:
scala> graph.vertices.toDebugString
res11: String =
(48) VertexRDDImpl[11] at RDD at VertexRDD.scala:57 []
 |   VertexRDD, VertexRDD ZippedPartitionsRDD2[9] at zipPartitions at VertexRDD.scala:322 []
 |       CachedPartitions: 48; MemorySize: 328.0 KB; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B
 |   ShuffledRDD[5] at partitionBy at VertexRDD.scala:319 []
 +-(48) ParallelCollectionRDD[0] at parallelize at <console>:45 []
 |   MapPartitionsRDD[8] at mapPartitions at VertexRDD.scala:361 []
 |   ShuffledRDD[7] at partitionBy at VertexRDD.scala:361 []
 +-(48) VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[6] at mapPartitions at VertexRDD.scala:356 []
    |   EdgeRDD, EdgeRDD MapPartitionsRDD[2] at mapPartitionsWithIndex at EdgeRDD.scala:105 []
    |   ParallelCollectionRDD[1] at parallelize at <cons...
scala>
But this doesn't give me the whole picture: as you can see, the output is
clipped (I guess the default is 10 lines).
(a) Is there an option to increase this limit so that I can see the whole
output?
(b) I know that indentation indicates a shuffle boundary and that the numbers
in parentheses indicate the parallelism at each step of this physical plan.
Does this mean the above can be drawn as a picture like this:
RDD A (VertexRDD.cre..) [48 partitions]
                                          \
                                           --- RDD C (VertexRDD, VertexRDD Zipped...) [48 partitions]
                                          /
RDD B (ParallelCollecti..) [48 partitions]
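
To make the reading in (b) concrete, here is a minimal sketch in plain Scala
(no Spark needed) that turns a toDebugString dump into a list of entries. It
assumes only the three line shapes visible in the output above: "(N) ..." for
the first RDD, " | ..." for an RDD in the same stage, and " +-(N) ..." for an
RDD that is read through a shuffle. The names DebugStringSketch, LineageEntry
and parse are hypothetical helpers of mine, not part of Spark or GraphX.

```scala
// Sketch only: assumes the "(N) desc", " |  desc" and " +-(N) desc" line shapes
// seen in the output above. All names below are hypothetical helpers.
object DebugStringSketch {

  // One line of the dump: the width of its prefix, the partition count if the
  // line carries one, and whether the line carries the "+-" marker.
  final case class LineageEntry(
      indent: Int,
      partitions: Option[Int],
      readThroughShuffle: Boolean,
      description: String)

  // Matches "(48) Foo[1] at ..." and " +-(48) Foo[1] at ..."
  private val Header = """^([\s|]*)(\+-)?\((\d+)\)\s+(.*)$""".r

  // Matches " |   Foo[1] at ..." (and storage-info lines such as "CachedPartitions: ...")
  private val Continuation = """^([\s|]*)\|\s+(.*)$""".r

  // Lines that match neither shape (for example wrapped or clipped lines) are skipped.
  def parse(debug: String): Seq[LineageEntry] =
    debug.split("\n").toSeq.collect {
      case Header(prefix, marker, n, desc) =>
        LineageEntry(prefix.length, Some(n.toInt), marker != null, desc)
      case Continuation(prefix, desc) =>
        LineageEntry(prefix.length, None, readThroughShuffle = false, desc)
    }
}
```

For example, DebugStringSketch.parse(graph.vertices.toDebugString)
.filter(_.readThroughShuffle) would list the RDDs that sit on the far side of
a shuffle, each with its partition count. Regarding (a): as far as I can
tell, the clipping comes from the REPL shortening the result it echoes, so
println(graph.vertices.toDebugString) should show the full string.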

I am fairly new to Spark, so please feel free to correct me!

Thanks
Anirudh
