GitHub user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1039#issuecomment-45836980
  
    Hey @GregOwen, sorry to keep changing the spec on you, but after thinking
    about it a bit more I think we might want to change things for more
    complicated graphs. I looked at how some similar tools work, and I was
    thinking of something like this:
    
    ```
    scala> var a = rdd.map(x => (x, x)).groupByKey(5).map{ case (x, y) => y.sum}.map(x => (1, x))
    scala> var b = sc.makeRDD(1 to 1000, 100).map(x => (x, x))
    scala> a.join(b, 3).toDebugString
    (3) FlatMappedValuesRDD[49] at join at <console>:19
    |   MappedValuesRDD[48] at join at <console>:19       
    |   CoGroupedRDD[47] at join at <console>:19          
    +-(5) MappedRDD[38] at map at <console>:14                  
    | |   MappedRDD[37] at map at <console>:14                 
    | |   MappedValuesRDD[36] at groupByKey at <console>:14    
    | |   MapPartitionsRDD[35] at groupByKey at <console>:14   
    | |   ShuffledRDD[34] at groupByKey at <console>:14        
    | +-(10) MappedRDD[33] at map at <console>:14                   
    |    |   ParallelCollectionRDD[0] at makeRDD at <console>:12
    |
    +-(100) MappedRDD[40] at map at <console>:12
             ParallelCollectionRDD[39] at makeRDD at <console>:12   
    ```
    
    For comparison, this is how it looks now:
    
    ```
    res15: String = 
    (3) + FlatMappedValuesRDD[63] at join at <console>:19
        | MappedValuesRDD[62] at join at <console>:19
        | CoGroupedRDD[61] at join at <console>:19
       (5) + MappedRDD[55] at map at <console>:14
           | MappedRDD[54] at map at <console>:14
           | MappedValuesRDD[53] at groupByKey at <console>:14
           | MapPartitionsRDD[52] at groupByKey at <console>:14
           | ShuffledRDD[51] at groupByKey at <console>:14
          (10) + MappedRDD[50] at map at <console>:14
               | ParallelCollectionRDD[0] at makeRDD at <console>:12
       (100) + MappedRDD[57] at map at <console>:12
             | ParallelCollectionRDD[56] at makeRDD at <console>:12
    ```
    
    The benefit of the newer structure is that if two siblings are far apart
    (because one or both have nested dependencies), you can still match them up
    visually by following the `|` connectors down the left edge. I borrowed this
    format mostly from the output of `sbt/sbt dependency-tree`.
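
    To make the layout rule concrete, here is a rough sketch of how a renderer
    for this kind of tree could work. It is not the code in this pull request:
    `Stage`, `render`, and `example` are hypothetical names, a `Stage` is just a
    stand-in for a run of RDDs with the same partition count, and the spacing is
    only approximately what I drew by hand above.

    ```scala
    object LineageTreeSketch {
      // Hypothetical model: one group of RDDs sharing a partition count, plus the
      // groups it depends on across a shuffle. Assumes `rdds` is non-empty.
      final case class Stage(partitions: Int, rdds: List[String], parents: List[Stage] = Nil)

      def render(stage: Stage): List[String] = {
        val marker = s"(${stage.partitions}) "
        // Continuation lines start with '|' and are padded to the label column.
        val continuation = "|" + " " * (marker.length - 1)
        val own = (marker + stage.rdds.head) :: stage.rdds.tail.map(continuation + _)

        val nested = stage.parents.zipWithIndex.flatMap { case (parent, i) =>
          val isLast = i == stage.parents.length - 1
          val lines = render(parent)
          // A non-last sibling keeps a '|' in the left column so the eye can
          // follow it down to the next "+-" at the same depth.
          val rest = if (isLast) "  " else "| "
          ("+-" + lines.head) :: lines.tail.map(rest + _)
        }
        own ++ nested
      }

      // A cut-down version of the join example above (labels shortened).
      val example: Stage = Stage(3, List("FlatMappedValuesRDD[49]", "CoGroupedRDD[47]"), List(
        Stage(5, List("MappedRDD[38]", "ShuffledRDD[34]"), List(
          Stage(10, List("MappedRDD[33]", "ParallelCollectionRDD[0]")))),
        Stage(100, List("MappedRDD[40]", "ParallelCollectionRDD[39]"))))

      def main(args: Array[String]): Unit =
        render(example).foreach(println)
    }
    ```

    The only real design choice is the `rest` prefix: every line under a
    non-last sibling carries a `|` in that sibling's column, which is what lets
    you trace from the `(5)` branch down to the `(100)` branch even when the
    first one has a deep subtree.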
    



