Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19294#discussion_r140188088
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -130,17 +135,21 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)
     val filesToMove = taskCommits.map(_.obj.asInstanceOf[Map[String, String]])
       .foldLeft(Map[String, String]())(_ ++ _)
     logDebug(s"Committing files staged for absolute locations $filesToMove")
-    val fs = absPathStagingDir.getFileSystem(jobContext.getConfiguration)
-    for ((src, dst) <- filesToMove) {
-      fs.rename(new Path(src), new Path(dst))
+    if (hasAbsPathFiles) {
+      val fs = absPathStagingDir.getFileSystem(jobContext.getConfiguration)
+      for ((src, dst) <- filesToMove) {
+        fs.rename(new Path(src), new Path(dst))
+      }
+      fs.delete(absPathStagingDir, true)
     }
-    fs.delete(absPathStagingDir, true)
--- End diff ---
Given the changes being made here, this seems a good place to adopt the suggestion of [SPARK-20045](https://issues.apache.org/jira/browse/SPARK-20045) and make that abort() call resilient to failures, by performing the delete of the staging directory even if the Hadoop committer raised an IOException.
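
For illustration, a minimal sketch of the cleanup pattern being suggested (not the actual Spark code): run the staging-directory delete in a `finally` block so it happens even when the committer throws, and swallow cleanup failures so they do not mask the original exception. The names `commitAction` and `deleteStagingDir` are hypothetical stand-ins for the Hadoop committer call and the `fs.delete(absPathStagingDir, true)` step.

```scala
import java.io.IOException

object ResilientCleanup {
  // Invoke the committer action, then always attempt the staging-dir delete,
  // even if the committer raised an IOException (in the spirit of SPARK-20045).
  def commitThenCleanup(commitAction: () => Unit, deleteStagingDir: () => Unit): Unit = {
    try {
      commitAction() // may throw IOException from the underlying Hadoop committer
    } finally {
      // Best-effort delete: a failure here is swallowed so it does not
      // mask the original exception propagating from the committer.
      try deleteStagingDir()
      catch { case _: IOException => () }
    }
  }
}
```

Applied to abortJob(), this shape would guarantee the absolute-path staging directory is removed whether or not the committer's own abort succeeded.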
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]