Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/19487

@HyukjinKwon That was exactly the initial solution I tested locally when we saw the problem with Phoenix. The reason to expand it was twofold:
a) This change preserves existing behavior from 2.1.
b) There are three cases where `new Path` can throw an exception, and I would be special-casing only two of them (null and empty string).

My understanding is that Hadoop does not require creating a Path out of `mapred.output.dir`/`mapreduce.output.fileoutputformat.outputdir`; it is up to the individual OutputFormat/Committer to handle it appropriately (unless I am missing something here - @steveloughran would be able to opine better on this).

The expectation that `mapred.output.dir`/`mapreduce.output.fileoutputformat.outputdir` is a valid path is a behavior change we introduced in 2.2 in `HadoopMapReduceCommitProtocol`: we now directly handle promotion of files to their final location for some cases in `commitJob`, or cleanup in `abortJob`, in addition to what is done by the committer (which is invoked before our code). For committers which are not `Path` based, this promotion of output by Spark does not apply (there is no path to promote!) - which is what @szhem's patch was fixing, except it was handling only null.

Having said all this - if the expectation in Hadoop is that `mapred.output.dir`/`mapreduce.output.fileoutputformat.outputdir`, when specified, should be a valid `Path`, I will happily change it to special-casing only `null` and `""`. Unless we get that clarity, IMO we should preserve behavior and be defensive about when we try manual promotion.

Please note that if an invalid value is provided for `mapred.output.dir`/`mapreduce.output.fileoutputformat.outputdir`, the corresponding output format or committer will throw an appropriate exception (as in the case of MR, Pig, or Hive).
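For reference, the defensive approach argued for above could be sketched roughly as follows. This is a minimal illustration, not the actual patch: it assumes `hadoop-common` is on the classpath, and the helper name `outputPathOption` is made up for this sketch.

```scala
import scala.util.Try
import org.apache.hadoop.fs.Path

// Hedged sketch: instead of special-casing only null and "", treat *any*
// failure of `new Path(...)` as "no usable output path", and skip Spark's
// manual promotion of files in that case. `outputPathOption` is a
// hypothetical helper, not an actual Spark method.
def outputPathOption(outputDir: String): Option[Path] =
  Try(new Path(outputDir)).toOption
```

With this shape, `new Path(null)` and `new Path("")` (both of which throw `IllegalArgumentException`), as well as the third case of a syntactically invalid URI, all collapse to `None`, so manual promotion in `commitJob`/`abortJob` would only be attempted when a path is actually available.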