Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    More thoughts: what if the last step of the job is writing data out? Do we 
need to improve the `OutputCoordinator` to support canceling all the writing 
tasks? Shall we simplify the logic and just retry the entire job?
    
    Or shall we go in the other direction and think about how to make RDD 
output deterministic, e.g. by sorting or checkpointing?
    
    Or shall we just deprecate these problematic operations like `repartition` 
and `zip`?
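
    The determinism issue behind these questions can be sketched outside 
Spark. The following toy round-robin partitioner (a hypothetical sketch, not 
Spark's actual `repartition` implementation) shows why output depends on 
arrival order: if an upstream retry replays the same records in a different 
order, the partition contents change, while sorting first makes them 
deterministic again.
    
    ```python
    import random
    
    def round_robin_partition(records, num_partitions):
        # Round-robin assignment: the target partition depends on a
        # record's arrival ORDER, not its value. A retry that replays
        # the same multiset of records in a different order can
        # therefore produce different partition contents.
        parts = [[] for _ in range(num_partitions)]
        for i, rec in enumerate(records):
            parts[i % num_partitions].append(rec)
        return parts
    
    records = list(range(10))
    
    # First attempt.
    attempt1 = round_robin_partition(records, 3)
    
    # Simulated retry: same records, different arrival order.
    replayed = records[:]
    random.Random(42).shuffle(replayed)
    attempt2 = round_robin_partition(replayed, 3)
    
    # Sorting before partitioning removes the order dependence, so
    # both attempts produce identical partitions.
    det1 = round_robin_partition(sorted(records), 3)
    det2 = round_robin_partition(sorted(replayed), 3)
    assert det1 == det2
    ```
    
    The same idea applies to `zip`, whose result pairs elements by position 
and is therefore also sensitive to upstream ordering.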


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
