Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22112
  
    BTW, I think a cleaner fix is to make shuffle files reliable (e.g. put them 
on HDFS), so that Spark never retries a task from a finished shuffle map stage. 
Then all the problems go away: the randomness is materialized in the shuffle 
files and we will not hit correctness issues. This is a big project, though, and 
maybe we can consider it for Spark 3.0.
    
    For now (2.4), I think failing and asking users to checkpoint is better 
than just documenting that `repartition`/`zip` may return wrong results. We 
also have a plan to reduce the likelihood of failing later, by marking RDD 
actions as "repeatable".

