Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19294#discussion_r140188088
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -130,17 +135,21 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)
     val filesToMove = taskCommits.map(_.obj.asInstanceOf[Map[String, String]])
       .foldLeft(Map[String, String]())(_ ++ _)
     logDebug(s"Committing files staged for absolute locations $filesToMove")
-    val fs = absPathStagingDir.getFileSystem(jobContext.getConfiguration)
-    for ((src, dst) <- filesToMove) {
-      fs.rename(new Path(src), new Path(dst))
+    if (hasAbsPathFiles) {
+      val fs = absPathStagingDir.getFileSystem(jobContext.getConfiguration)
+      for ((src, dst) <- filesToMove) {
+        fs.rename(new Path(src), new Path(dst))
+      }
+      fs.delete(absPathStagingDir, true)
     }
-    fs.delete(absPathStagingDir, true)
--- End diff ---
Given the changes being made here, this seems a good place to adopt the suggestion of [SPARK-20045](https://issues.apache.org/jira/browse/SPARK-20045) and make that abort() call resilient to failures, by performing the delete of the staging directory even if the Hadoop committer raised an IOException.
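
For illustration, a minimal sketch of the cleanup pattern being suggested (not the actual Spark code): run the staging-directory delete in a `finally` block so it happens even when the committer throws, and swallow cleanup failures so they do not mask the original exception. The names `commitAction` and `deleteStagingDir` are hypothetical stand-ins for the Hadoop committer call and the `fs.delete(absPathStagingDir, true)` step.

```scala
import java.io.IOException

object ResilientCleanup {
  // Invoke the committer action, then always attempt the staging-dir delete,
  // even if the committer raised an IOException (in the spirit of SPARK-20045).
  def commitThenCleanup(commitAction: () => Unit, deleteStagingDir: () => Unit): Unit = {
    try {
      commitAction() // may throw IOException from the underlying Hadoop committer
    } finally {
      // Best-effort delete: a failure here is swallowed so it does not
      // mask the original exception propagating from the committer.
      try deleteStagingDir()
      catch { case _: IOException => () }
    }
  }
}
```

Applied to abortJob(), this shape would guarantee the absolute-path staging directory is removed whether or not the committer's own abort succeeded.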
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]