[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

cloud-fan Mon, 20 Aug 2018 19:52:32 -0700

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22112
  
    So there are 2 options:
    
    1. Require the RDD closure to be idempotent. I'm not sure whether that is OK for MLlib, 
cc @mengxr @WeichenXu123 @yanboliang 
    
    2. Require the output committer to be able to overwrite a committed task. Note 
that the output committer here is the `FileCommitProtocol` interface in Spark, 
not the Hadoop output committer. We don't have to make this work for all Hadoop 
output committers.
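
    To make option 1 concrete, here is a minimal sketch in plain Python rather 
than Spark code. The function names and the seeding scheme are hypothetical, and 
"idempotent" is read here as "a retried task reproduces the same output for the 
same input partition", which is an assumption about the intended meaning.

```python
import random


def nondeterministic_closure(partition_index, rows):
    # Uses the shared, unseeded global generator: a retried task may keep
    # a different subset of rows than the first attempt did.
    return [r for r in rows if random.random() < 0.5]


def idempotent_closure(partition_index, rows, seed=42):
    # The generator is seeded only from stable inputs (a fixed seed and the
    # partition index), so re-running the task reproduces the same output.
    rng = random.Random(seed * 1_000_003 + partition_index)
    return [r for r in rows if rng.random() < 0.5]


rows = list(range(100))
first_attempt = idempotent_closure(3, rows)
retry = idempotent_closure(3, rows)
```

    With the second form, a recomputed partition matches the first attempt, so 
downstream tasks that already consumed part of it stay consistent with the retry.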



