Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
> 2. ask the output committer to be able to overwrite a committed task.
> Note that, the output committer here is the FileCommitProtocol interface in
> Spark, not the hadoop output committer. We don't have to make all the hadoop
> output committers work.
I disagree with this. Spark works with any Hadoop output committer via the
RDD API. Spark writing to HBase is a perfect example of this: you can't do
file moves in HBase. PairRDDFunctions.saveAsHadoopDataset can be used with
HBase, and it goes through SparkHadoopWriter.write, which uses the
FileCommitProtocol in Spark. If that path assumes moves are possible for all
output committers, then in my opinion it's a bug.
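To make the HBase case concrete, here is a sketch of the saveAsHadoopDataset path described above, using HBase's old-mapred-API TableOutputFormat. This is illustrative only (it assumes a running HBase cluster and the spark-core plus hbase-mapreduce dependencies on the classpath); the table name "my_table" and the column family/qualifier "cf"/"col" are made up for the example:

```scala
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkContext

def writeToHBase(sc: SparkContext): Unit = {
  val conf = new JobConf(sc.hadoopConfiguration)
  conf.setOutputFormat(classOf[TableOutputFormat])
  conf.set(TableOutputFormat.OUTPUT_TABLE, "my_table") // hypothetical table

  // Each record becomes an HBase Put. TableOutputFormat writes directly
  // to the region servers -- there is no temp directory and no file move
  // at commit time, which is the point being made above.
  val puts = sc.parallelize(Seq("a" -> "1", "b" -> "2")).map { case (k, v) =>
    val put = new Put(Bytes.toBytes(k))
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(v))
    (new ImmutableBytesWritable, put)
  }
  puts.saveAsHadoopDataset(conf)
}
```

This call goes through SparkHadoopWriter.write and therefore through FileCommitProtocol, even though the underlying committer has nothing to move.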
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]