Hello all, I am trying to understand how GraphX works internally.
I created a small program in GraphX:

1. I create a new graph:

   val graph: Graph[(String, Double), Int] = Graph(vertexRDD, edgeRDD)

2. Now I want to see how my vertices were created, so I use:

   scala> graph.vertices.toDebugString
   res11: String =
   (48) VertexRDDImpl[11] at RDD at VertexRDD.scala:57 []
    |   VertexRDD, VertexRDD ZippedPartitionsRDD2[9] at zipPartitions at VertexRDD.scala:322 []
    |       CachedPartitions: 48; MemorySize: 328.0 KB; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B
    |   ShuffledRDD[5] at partitionBy at VertexRDD.scala:319 []
    +-(48) ParallelCollectionRDD[0] at parallelize at <console>:45 []
    |   MapPartitionsRDD[8] at mapPartitions at VertexRDD.scala:361 []
    |   ShuffledRDD[7] at partitionBy at VertexRDD.scala:361 []
    +-(48) VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[6] at mapPartitions at VertexRDD.scala:356 []
       |   EdgeRDD, EdgeRDD MapPartitionsRDD[2] at mapPartitionsWithIndex at EdgeRDD.scala:105 []
       |   ParallelCollectionRDD[1] at parallelize at <cons...
   scala>

But this doesn't give me the whole picture: as you can see, the output is clipped (I guess 10 lines is the default).

(a) Is there an option to increase this number so that I can see the whole output?

(b) I know that indentation indicates a shuffle boundary and that the number in parentheses indicates the parallelism at each step of this physical plan. Does this mean the above can be put into a picture like this:

   RDD A (VertexRDD.cre..) [48 partitions]
                                           \
                                            --- RDD C (VertexRDD, VertexRDD Zipped...)[48 partitions]
                                           /
   RDD B (ParallelCollecti..) [48 partitions]

I am fairly new to Spark, so please feel free to correct me!

Thanks,
Anirudh
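
For anyone who wants to reproduce this, here is a minimal sketch of the kind of setup described above. The vertex and edge data, the app name, and the local master are assumptions made up for illustration; the actual vertexRDD and edgeRDD are not shown in this post. It also prints the lineage with println instead of relying on the REPL echo, since toDebugString returns a plain String and println should write all of it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: local mode and a made-up app name, purely for illustration.
// getOrCreate reuses an existing context (e.g. the one in spark-shell).
val conf = new SparkConf().setAppName("lineage-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical vertices of the form (id, (name, value)),
// chosen only to match the type Graph[(String, Double), Int].
val vertexRDD: RDD[(VertexId, (String, Double))] = sc.parallelize(Seq(
  (1L, ("a", 1.0)),
  (2L, ("b", 2.0)),
  (3L, ("c", 3.0))
))

// Hypothetical edges carrying Int attributes.
val edgeRDD: RDD[Edge[Int]] = sc.parallelize(Seq(
  Edge(1L, 2L, 10),
  Edge(2L, 3L, 20)
))

val graph: Graph[(String, Double), Int] = Graph(vertexRDD, edgeRDD)

// toDebugString returns an ordinary String. The REPL may shorten the
// "resN: String = ..." echo, but println writes the whole string.
val lineage: String = graph.vertices.toDebugString
println(lineage)
```

The partition counts in the printed lineage depend on the default parallelism of the context, so they will not necessarily be 48 as in the output above.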