Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/21698

@jiangxb1987 Any closure that is sensitive to iteration order [1] is affected by this, under this set of circumstances.

If we cannot solve it in a principled manner (by making shuffle repeatable, which I believe you have investigated and found to be difficult?), the next best thing, until we have a performant solution, would be to expose it to users and have them deal with it (which is what I did, for example), with hints on how to accomplish that.

The proposed solution will cause cascading failures for non-trivial applications (chains of shuffles), will introduce high cost, and can unfortunately cause application failures and unpredictable SLAs.

[1] I gave the examples of zip* and sampling, but really any user-defined closure is affected, and we cannot special-case all of them.
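
To make the concern about order-sensitive closures concrete, here is a minimal sketch in plain Python. It is not Spark code: the two hard-coded lists stand in for the same shuffle output arriving in a different order on a retried task, and `assign_ids` / `assign_ids_sorted` are hypothetical names chosen for illustration. It shows why a closure in the style of zipWithIndex gives different results across attempts, and how a user could work around it by imposing a total order first, at the cost of an extra sort.

```python
def assign_ids(records):
    """Order-sensitive closure: each record's id is its position in the input."""
    return {rec: idx for idx, rec in enumerate(records)}


def assign_ids_sorted(records):
    """User-side workaround (assumption): sort first so the ids are repeatable."""
    return assign_ids(sorted(records))


# Same records, different arrival order: a stand-in for a shuffle fetch
# on the first attempt versus a retry after a fetch failure.
first_attempt = ["a", "b", "c", "d", "e"]
retry_attempt = ["c", "a", "e", "b", "d"]

# The order-sensitive closure disagrees between the two attempts ...
ids_first = assign_ids(first_attempt)
ids_retry = assign_ids(retry_attempt)
print(ids_first == ids_retry)

# ... while the sorted variant produces the same ids on both attempts.
stable_first = assign_ids_sorted(first_attempt)
stable_retry = assign_ids_sorted(retry_attempt)
print(stable_first == stable_retry)
```

The sort is only one possible hint to give users; it requires the records to have a total order and adds cost to every task, which is why it is a trade-off rather than a general fix.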