Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19497
I guess one aspect of `saveAsNewAPIHadoopFile` is that it calls `jobConfiguration.set("mapreduce.output.fileoutputformat.outputdir", path)`, and `Configuration.set(String key, String value)` rejects a null key or value.
If handling of paths is to be done in the committer, `saveAsNewAPIHadoopFile` should really be looking at the path and calling `jobConfiguration.unset("mapreduce.output.fileoutputformat.outputdir")` when `path == null`.
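For illustration only, here's a minimal sketch of that guard using the Hadoop `Configuration` API; the class and method names are made up for the example, not taken from the Spark code:
```java
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch of the suggested guard; names are hypothetical.
public class OutputDirGuard {
  private static final String OUTDIR = "mapreduce.output.fileoutputformat.outputdir";

  static void setOutputDir(Configuration jobConfiguration, String path) {
    if (path == null) {
      // Configuration.set() rejects null values, so clear the property instead.
      jobConfiguration.unset(OUTDIR);
    } else {
      jobConfiguration.set(OUTDIR, path);
    }
  }
}
```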
Looking at how Hadoop's FileOutputFormat implementations work, they can
handle a null/undefined output dir property, *but not an empty one*.
```java
// From Hadoop's org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
public static Path getOutputPath(JobContext job) {
  String name = job.getConfiguration().get(FileOutputFormat.OUTDIR);
  return name == null ? null : new Path(name);
}
```
Which implies that `saveAsNewAPIHadoopFile("")` might want to unset the config option too, offloading the question of what happens on an empty path to the committer. Though I'd recommend checking what meaningful exceptions actually get raised in this situation when the committer is the normal FileOutputFormat/FileOutputCommitter setup.
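A throwaway probe along these lines (against the stock `TextOutputFormat`; purely illustrative, not part of this PR) would show which exception surfaces when the output dir is left unset:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Probe: run checkOutputSpecs with no output dir configured and print
// whatever the stock FileOutputFormat implementation throws.
public class UnsetOutputDirProbe {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration());
    job.getConfiguration().unset("mapreduce.output.fileoutputformat.outputdir");
    try {
      new TextOutputFormat<Object, Object>().checkOutputSpecs(job);
      System.out.println("checkOutputSpecs passed with no output dir set");
    } catch (Exception e) {
      System.out.println(e.getClass().getName() + ": " + e.getMessage());
    }
  }
}
```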