Github user zheh12 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21257#discussion_r189422931
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
---
@@ -163,6 +170,15 @@ class HadoopMapReduceCommitProtocol(
}
override def commitJob(jobContext: JobContext, taskCommits:
Seq[TaskCommitMessage]): Unit = {
+ // First, delete the paths that were registered for deletion.
+ for (fs <- pathsToDelete.keys) {
+ for (path <- pathsToDelete(fs)) {
+ if (fs.exists(path)) {
+ fs.delete(path, true)
--- End diff ---
I think we should not delete the data when the task is aborted. The
semantics of `descriptionWithJob` should be to delete the data when the
`Job` is committed.
I have changed the code to handle exceptions.
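As a rough sketch of that intent (not the actual patch): the paths are only
removed at job commit, nothing is deleted on task abort, and delete failures
are collected and rethrown instead of silently ignored. The class and method
names (`DeferredDeleteSketch`, `registerDelete`, `deleteRegisteredPaths`,
`onTaskAbort`) are hypothetical; only the shape of `pathsToDelete` is taken
from the diff above.

```scala
import scala.collection.mutable

import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: deferred deletion performed at job commit time.
class DeferredDeleteSketch {
  // FileSystem -> paths to delete when the job commits
  // (same shape as the pathsToDelete map in the diff above).
  private val pathsToDelete = mutable.Map[FileSystem, mutable.Set[Path]]()

  // Hypothetical registration hook, called while the job is running.
  def registerDelete(fs: FileSystem, path: Path): Unit = {
    pathsToDelete.getOrElseUpdate(fs, mutable.Set[Path]()) += path
  }

  // Called from commitJob: perform the deferred deletes, collecting any
  // failures and rethrowing so the caller sees the commit is incomplete.
  def deleteRegisteredPaths(): Unit = {
    val failures = mutable.ArrayBuffer[(Path, Throwable)]()
    for ((fs, paths) <- pathsToDelete; path <- paths) {
      try {
        if (fs.exists(path)) {
          fs.delete(path, true) // recursive delete
        }
      } catch {
        case e: Exception => failures += path -> e
      }
    }
    if (failures.nonEmpty) {
      throw new java.io.IOException(
        s"Failed to delete ${failures.size} path(s) at job commit: " +
          failures.map(_._1).mkString(", "),
        failures.head._2)
    }
  }

  // On task abort nothing is deleted; registered paths stay untouched
  // until (and unless) the job itself commits.
  def onTaskAbort(): Unit = ()
}
```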
---