Github user zheh12 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21257#discussion_r189422931
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
    @@ -163,6 +170,15 @@ class HadoopMapReduceCommitProtocol(
       }
     
       override def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit = {
    +    // First, delete the files that were marked for deletion
    +    for (fs <- pathsToDelete.keys) {
    +      for (path <- pathsToDelete(fs)) {
    +        if (fs.exists(path)) {
    +          fs.delete(path, true)
    --- End diff --
    
    I think we should not delete the data when the task is aborted. The
    semantics of `descriptionWithJob` should be to delete the data only when
    the `Job` is committed. I will change the code to handle exceptions.
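
    Since the quoted diff cuts off before the exception handling, here is a
    minimal sketch of what I mean; the `mutable.Map[FileSystem, mutable.Set[Path]]`
    shape of `pathsToDelete` and the helper name `deleteMarkedPaths` are only
    assumptions for illustration, not the actual patch:

    ```scala
    import java.io.IOException

    import scala.collection.mutable

    import org.apache.hadoop.fs.{FileSystem, Path}

    object DeleteOnCommitSketch {
      // Paths registered for deletion, keyed by the FileSystem they live on.
      // (Assumed shape of pathsToDelete, based on the quoted diff.)
      val pathsToDelete: mutable.Map[FileSystem, mutable.Set[Path]] =
        mutable.Map.empty

      // Delete every registered path only at job commit time, and wrap any
      // failure so a commit-time deletion error is reported clearly instead
      // of being swallowed.
      def deleteMarkedPaths(): Unit = {
        for ((fs, paths) <- pathsToDelete; path <- paths) {
          try {
            if (fs.exists(path)) {
              fs.delete(path, true) // recursive delete
            }
          } catch {
            case e: IOException =>
              throw new IOException(s"Failed to delete $path while committing the job", e)
          }
        }
      }
    }
    ```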


---
