[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

mridulm Fri, 13 Jul 2018 01:07:05 -0700

Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    @jiangxb1987 data loss occurs because a re-execution of zip might generate a
    key whose corresponding reducer has already finished.
    Hence re-executing the stage does not cause the child stage's already-finished
    reducer partition to be re-executed, resulting in data loss.
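    A minimal, hypothetical simulation of the scenario described above, in plain
    Python rather than Spark's API. It assumes a simplified round-robin placement
    (the i-th record goes to reducer i % N, so placement depends on input order,
    not content) and assumes the re-executed map task sees its input in a
    different order, as an order-sensitive upstream operation like zip may.
    Reducers that already finished keep the first attempt's output; only the
    unfinished reducer reads the re-executed one. The helper names are
    illustrative only.

```python
from collections import Counter


def round_robin_partition(records, num_reducers):
    """Hypothetical stand-in for a round-robin shuffle write: record i goes to
    reducer i % num_reducers, so placement depends on input ORDER."""
    buckets = {r: [] for r in range(num_reducers)}
    for i, rec in enumerate(records):
        buckets[i % num_reducers].append(rec)
    return buckets


def simulate(records, rerun_order, num_reducers, finished_reducers):
    """Reducers in finished_reducers keep what they fetched from the first map
    attempt and are NOT re-run; the rest fetch from the re-executed map attempt,
    whose input arrives in a different order (assumption for illustration)."""
    first = round_robin_partition(records, num_reducers)
    second = round_robin_partition(rerun_order, num_reducers)
    final = []
    for r in range(num_reducers):
        source = first if r in finished_reducers else second
        final.extend(source[r])
    return final


records = ["a", "b", "c", "d", "e", "f"]
rerun_order = list(reversed(records))  # same data, different order on re-execution
out = simulate(records, rerun_order, 3, finished_reducers={0, 1})

missing = Counter(records) - Counter(out)     # records no reducer received
duplicated = Counter(out) - Counter(records)  # records received twice
print(sorted(missing))     # ['c', 'f']
print(sorted(duplicated))  # ['a', 'd']
```

    Reducer 2 is the only one re-run, and under the new ordering it receives
    keys that reducers 0 and 1 already consumed, while the records it originally
    owned ("c" and "f") were routed to reducers that will never run again. If no
    reducer had finished before the re-execution, every record would still be
    delivered exactly once.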
