GitHub user mridulm commented on the issue:
https://github.com/apache/spark/pull/21698
@jiangxb1987 Any closure that is sensitive to iteration order [1] is affected
by this, under the circumstances described.
If we cannot solve it in a principled manner (making shuffle repeatable, which
I believe you have investigated and found to be difficult?), the next best
thing, until we have a performant solution, would be to expose the issue to
users and have them deal with it (which is what I did, for example), with hints
on how to accomplish that.
The proposed solution will cause cascading failures for non-trivial
applications (chains of shuffles), will also introduce a high cost, and can
unfortunately cause application failures and unpredictable SLAs.
[1] I gave zip* and sampling as examples, but really any user-defined closure
is affected, and we cannot special-case all of them.
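To make the footnote concrete, here is a minimal plain-Python sketch (not Spark code). It assumes that a retried reduce task receives the same records in a different order. The record lists and the helpers `zip_with_index`, `take_first`, `count_records` and `sort_within_partition` are hypothetical illustrations. Order-sensitive closures, such as a zip or a take-first-N sample, give different results on the two attempts, while an order-insensitive one does not. Sorting within the partition is included only as one possible user-side workaround, assuming the records are totally ordered and the extra sort cost is acceptable. It is not necessarily the approach referred to above.

```python
# Hypothetical records seen by one reduce task. Assumption: after a fetch
# failure and retry, the same records arrive in a different order.
first_attempt = ["a", "b", "c", "d", "e"]
retry_attempt = ["c", "a", "e", "b", "d"]


def zip_with_index(partition):
    # Order-sensitive: each record's index depends on its arrival position.
    return list(zip(partition, range(len(partition))))


def take_first(partition, n):
    # Order-sensitive "sampling": keeps whichever n records happen to arrive first.
    return partition[:n]


def count_records(partition):
    # Order-insensitive: the result is the same for any arrival order.
    return len(partition)


def sort_within_partition(partition):
    # One possible user-side workaround (assumes records are totally ordered):
    # impose a deterministic order before applying an order-sensitive closure.
    return sorted(partition)


if __name__ == "__main__":
    print(zip_with_index(first_attempt) == zip_with_index(retry_attempt))  # False
    print(take_first(first_attempt, 2), take_first(retry_attempt, 2))  # ['a', 'b'] ['c', 'a']
    print(count_records(first_attempt) == count_records(retry_attempt))  # True
    print(
        zip_with_index(sort_within_partition(first_attempt))
        == zip_with_index(sort_within_partition(retry_attempt))
    )  # True
```

The workaround only helps when the user knows a closure is order-sensitive, which is why it has to be surfaced to users rather than special-cased.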