Github user megaserg commented on the issue:
    Thank you @dongjoon-hyun! This was also affecting our Spark job performance!
    We're using `mapreduce.fileoutputcommitter.algorithm.version=2` in our 
Spark job config, as commonly recommended. We're running with a 
user-provided Hadoop 2.9.0.
    However, since this 2.6.5 JAR was in spark/jars, it took priority on the 
classpath over the Hadoop-distributed 2.9.0 JAR. The 2.6.5 JAR silently 
ignored the `mapreduce.fileoutputcommitter.algorithm.version` setting and used 
the default, slow algorithm (I believe hadoop-mapreduce-client-core only had 
the one, slow algorithm until 2.7.0).
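    For reference, here's a sketch of how we pass the setting (job class and 
JAR names below are hypothetical). The `spark.hadoop.` prefix forwards the 
property into the Hadoop `Configuration` that the output committer reads, so 
it only takes effect if the Hadoop JARs on the classpath actually support it:

```shell
# Forward the committer setting into the Hadoop Configuration via the
# spark.hadoop. prefix; requires hadoop-mapreduce-client-core >= 2.7.0
# to be the version actually resolved on the classpath.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  --class com.example.MyJob \
  my-job.jar
```

    You can double-check which value won at runtime with 
`sc.hadoopConfiguration.get("mapreduce.fileoutputcommitter.algorithm.version")`.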
    I believe this affects everyone who uses any mapreduce settings with Spark 
2.3.0. Great job!
    Can we double-check that this JAR is not present in the "without-hadoop" 
Spark distribution anymore?

