Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19497
I guess one aspect of `saveAsNewAPIHadoopFile` is that it calls `jobConfiguration.set("mapreduce.output.fileoutputformat.outputdir", path)`, and `Configuration.set(String key, String value)` rejects a null key or value.
If handling of paths is to be done in the committer, `saveAsNewAPIHadoopFile` should really be looking at the path and calling `jobConfiguration.unset("mapreduce.output.fileoutputformat.outputdir")` when `path == null`.
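For illustration only, here's a minimal sketch of that guard using the Hadoop `Configuration` API; the class and method names are made up for the example, not taken from the Spark code:
```java
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch of the suggested guard; names are hypothetical.
public class OutputDirGuard {
  private static final String OUTDIR = "mapreduce.output.fileoutputformat.outputdir";

  static void setOutputDir(Configuration jobConfiguration, String path) {
    if (path == null) {
      // Configuration.set() rejects null values, so clear the property instead.
      jobConfiguration.unset(OUTDIR);
    } else {
      jobConfiguration.set(OUTDIR, path);
    }
  }
}
```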
Looking at how Hadoop's FileOutputFormat implementations work, they can
handle a null/undefined output dir property, *but not an empty one*.
```java
// From Hadoop's org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
public static Path getOutputPath(JobContext job) {
  String name = job.getConfiguration().get(FileOutputFormat.OUTDIR);
  return name == null ? null : new Path(name);
}
```
Which implies that `saveAsNewAPIHadoopFile("")` might want to unset the config option too, offloading the question of what happens on an empty path to the committer. Though I'd recommend checking what meaningful exceptions actually get raised in this situation when the committer is the normal FileOutputFormat/FileOutputCommitter setup.
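A throwaway probe along these lines (against the stock `TextOutputFormat`; purely illustrative, not part of this PR) would show which exception surfaces when the output dir is left unset:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Probe: run checkOutputSpecs with no output dir configured and print
// whatever the stock FileOutputFormat implementation throws.
public class UnsetOutputDirProbe {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration());
    job.getConfiguration().unset("mapreduce.output.fileoutputformat.outputdir");
    try {
      new TextOutputFormat<Object, Object>().checkOutputSpecs(job);
      System.out.println("checkOutputSpecs passed with no output dir set");
    } catch (Exception e) {
      System.out.println(e.getClass().getName() + ": " + e.getMessage());
    }
  }
}
```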