[GitHub] spark issue #21096: [SPARK-24011][CORE][WIP] cache rdd's immediate parent Sh...

squito Mon, 23 Apr 2018 13:56:21 -0700

GitHub user squito commented on the issue:

    https://github.com/apache/spark/pull/21096
  
    It's not a bad idea, but as @markhamstra mentions, we can't have an 
`rddToImmediateShuffleDependency` data structure that keeps growing.  You 
could keep it local to one job submission, although that would also slightly 
diminish its utility.
    
    Did you observe this as a bottleneck in profiling?  Otherwise, I'm 
inclined to say it's not worth the complexity right now.  I'd normally expect to 
walk through only a very small number of RDDs, so the traversal should be quick.
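
    To make the trade-off concrete, here is a minimal sketch of a lookup
cache scoped to a single job submission. It is a toy model in Python, not
Spark code: the `Rdd` class and the `immediate_shuffle_parents` and
`submit_job` functions are hypothetical names invented for illustration. The
walk follows narrow parents only and stops at the first shuffle edge on each
path, which is why it normally touches only a handful of RDDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(eq=False)
class Rdd:
    """Toy stand-in for an RDD (hypothetical, not the Spark class)."""
    name: str
    narrow_parents: List["Rdd"] = field(default_factory=list)
    shuffle_parents: List["Rdd"] = field(default_factory=list)


def immediate_shuffle_parents(rdd: Rdd, cache: Dict[int, Set[str]]) -> Set[str]:
    """Return the names of RDDs reached through the nearest shuffle edges.

    Narrow parents are traversed; shuffle parents are recorded but not
    traversed, so ancestors beyond the first shuffle boundary are ignored.
    """
    key = id(rdd)
    if key in cache:
        return cache[key]

    found: Set[str] = set()
    visited: Set[int] = set()
    stack = [rdd]
    while stack:
        current = stack.pop()
        if id(current) in visited:
            continue
        visited.add(id(current))
        found.update(parent.name for parent in current.shuffle_parents)
        stack.extend(current.narrow_parents)

    cache[key] = found
    return found


def submit_job(final_rdd: Rdd) -> Set[str]:
    """Create the cache per submission so it cannot grow across jobs."""
    job_local_cache: Dict[int, Set[str]] = {}
    return immediate_shuffle_parents(final_rdd, job_local_cache)
```

    Because the cache dies with the submission, it stays bounded, but a later
job over the same RDDs has to repeat the walk. That is the reduced utility
mentioned above.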



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
