Github user squito commented on the issue:

    https://github.com/apache/spark/pull/21096

    It's not a bad idea, but as @markhamstra mentions, we can't have an `rddToImmediateShuffleDependency` data structure that keeps growing. You could keep it local to one job submission, though that would also slightly diminish its utility.

    Did you observe this as a bottleneck in profiling? Otherwise I'm inclined to say it's not worth the complexity right now. I'd normally expect to walk through only a very small number of RDDs, so it'll be quick.
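
To make the trade-off concrete, here is a minimal sketch of the "local to one job submission" variant, written against a toy lineage model rather than Spark itself. `ToyRDD`, `NarrowDep`, `ShuffleDep`, `immediate_shuffle_deps`, and `submit_job` are hypothetical names invented for illustration; only the idea of caching an RDD's immediate shuffle dependencies comes from the comment above. The walk stops at the first shuffle boundary on each path, and the cache is an ordinary dict that is created per submission and dropped afterwards, so it cannot grow across jobs.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be dict keys
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name plus its dependencies."""
    name: str
    deps: list = field(default_factory=list)


@dataclass(eq=False)
class NarrowDep:
    """Hypothetical narrow dependency: parent lives in the same stage."""
    rdd: ToyRDD


@dataclass(eq=False)
class ShuffleDep:
    """Hypothetical shuffle dependency: parent lives in an earlier stage."""
    rdd: ToyRDD


def immediate_shuffle_deps(rdd, cache=None):
    """Return the shuffle dependencies reachable from `rdd` through narrow
    dependencies only. The walk does not continue past a shuffle boundary."""
    if cache is not None and rdd in cache:
        return cache[rdd]

    found = []
    visited = set()
    stack = [rdd]  # iterative walk, so deep lineages do not hit recursion limits
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        for dep in current.deps:
            if isinstance(dep, ShuffleDep):
                if dep not in found:
                    found.append(dep)  # record it, but do not walk past it
            else:
                stack.append(dep.rdd)

    if cache is not None:
        cache[rdd] = found
    return found


def submit_job(final_rdd):
    """Assumed shape of a job submission: the cache is created here and
    becomes garbage once the call returns, so it is bounded by one job."""
    cache = {}
    return immediate_shuffle_deps(final_rdd, cache)
```

In this sketch a repeated lookup for the same RDD within one submission is a dict hit, while a second job starts from an empty cache. That is the reduced utility mentioned above: nothing is reused across jobs. If the walk typically touches only a handful of RDDs, the uncached version (`cache=None`) does the same work with less machinery.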