Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
    BTW, I think a cleaner fix is to make shuffle files reliable (e.g., put them
    on HDFS), so that Spark never retries a task from a finished shuffle map
    stage. Then all the problems go away: the randomness is materialized in the
    shuffle files, and we will not hit correctness issues. This is a big project,
    though, and maybe we can consider it for Spark 3.0.
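    For context, here is a rough sketch (my own illustration, not code from this
    PR) of the kind of non-determinism that makes retried tasks dangerous today;
    the random shuffle inside `mapPartitions` just simulates an unstable row order:

    ```scala
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("repartition-retry-sketch").getOrCreate()
    val sc = spark.sparkContext

    // A source whose row order within a partition is not stable across
    // re-computations (simulated here by explicitly shuffling the iterator).
    val source = sc.parallelize(1 to 1000000, numSlices = 10).mapPartitions { iter =>
      scala.util.Random.shuffle(iter.toSeq).iterator
    }

    // repartition() assigns rows to output partitions round-robin, based on each
    // row's position inside its input partition. If a map task is re-run and emits
    // rows in a different order while only some reduce tasks are retried, rows can
    // be duplicated or lost.
    val repartitioned = source.repartition(20)
    println(repartitioned.count())
    ```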
    For now (2.4), I think failing and asking users to checkpoint is better than
    just documenting that `repartition`/`zip` may return wrong results. We also
    have a plan to reduce how often we need to fail later on, by marking RDD
    actions as "repeatable".