dongjoon-hyun edited a comment on pull request #29895: URL: https://github.com/apache/spark/pull/29895#issuecomment-700821012
Hi, @steveloughran and @tgravescs . No matter what happens in future Hadoop releases, they cannot change history (Apache Hadoop 3.2.0 and all existing Hadoop 3.x versions). And, for now, Apache Spark 3.1 will be stuck on Apache Hadoop 3.2.0 due to the Guava issue. That is why we need to do this right now on the Spark side.

Regarding the following, @steveloughran : as I wrote in the PR description, this PR doesn't override an explicit user-given config. It only sets `v1` when there is no explicit setting.

> V2 is used in places where people have hit the scale limits with v1, and they are happy with the risk of failures.

Eventually, I believe we can depend on `hadoop-client-runtime` only, in order to remove the Guava dependency (#29843), and consume @steveloughran 's new Hadoop release in the future. Until then, Apache Spark 3.1 should provide a migration with no known correctness regressions. If the Apache Spark 3.1 default distribution is unsafe because of a third-party dependency (in this case, Hadoop), how can we recommend it to users?
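For illustration, here is a minimal sketch (not part of this PR) of how a user who has hit the scale limits of v1, and accepts the risk, can still opt back in to v2 explicitly. The app name is hypothetical; the `spark.hadoop.` prefix is Spark's standard way to pass a Hadoop property through to the Hadoop `Configuration`:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical opt-in: because the PR only applies the v1 default when the
// property is unset, an explicit user setting like this one still wins.
val spark = SparkSession.builder()
  .appName("committer-v2-opt-in") // hypothetical app name
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  .getOrCreate()
```

The same opt-in can also be passed on the command line, e.g. `spark-submit --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 ...`.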
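As a sketch of the `hadoop-client-runtime` direction above (my assumption of how such a dependency could look in an sbt build; the version is illustrative), the shaded client artifacts relocate Hadoop's third-party dependencies, including Guava, so they no longer clash with Spark's:

```scala
// Hypothetical sbt sketch: replace individual Hadoop modules with the shaded
// client artifacts. hadoop-client-api carries the public Hadoop classes;
// hadoop-client-runtime carries the relocated third-party dependencies.
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-client-api"     % "3.2.0",
  "org.apache.hadoop" % "hadoop-client-runtime" % "3.2.0" % Runtime
)
```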
