Github user zheh12 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21257#discussion_r189422931
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
---
@@ -163,6 +170,15 @@ class HadoopMapReduceCommitProtocol(
}
override def commitJob(jobContext: JobContext, taskCommits:
Seq[TaskCommitMessage]): Unit = {
+ // First, delete the paths that were registered for deletion.
+ for (fs <- pathsToDelete.keys) {
+ for (path <- pathsToDelete(fs)) {
+ if (fs.exists(path)) {
+ fs.delete(path, true)
--- End diff ---
I think we should not delete the data when the task is aborted. The
semantics of `descriptionWithJob` should be to delete the data when the
`Job` is committed.
I have changed the code to handle exceptions.
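As a rough sketch of that intent (not the actual patch): the paths are only
removed at job commit, nothing is deleted on task abort, and delete failures
are collected and rethrown instead of silently ignored. The class and method
names (`DeferredDeleteSketch`, `registerDelete`, `deleteRegisteredPaths`,
`onTaskAbort`) are hypothetical; only the shape of `pathsToDelete` is taken
from the diff above.

```scala
import scala.collection.mutable

import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: deferred deletion performed at job commit time.
class DeferredDeleteSketch {
  // FileSystem -> paths to delete when the job commits
  // (same shape as the pathsToDelete map in the diff above).
  private val pathsToDelete = mutable.Map[FileSystem, mutable.Set[Path]]()

  // Hypothetical registration hook, called while the job is running.
  def registerDelete(fs: FileSystem, path: Path): Unit = {
    pathsToDelete.getOrElseUpdate(fs, mutable.Set[Path]()) += path
  }

  // Called from commitJob: perform the deferred deletes, collecting any
  // failures and rethrowing so the caller sees the commit is incomplete.
  def deleteRegisteredPaths(): Unit = {
    val failures = mutable.ArrayBuffer[(Path, Throwable)]()
    for ((fs, paths) <- pathsToDelete; path <- paths) {
      try {
        if (fs.exists(path)) {
          fs.delete(path, true) // recursive delete
        }
      } catch {
        case e: Exception => failures += path -> e
      }
    }
    if (failures.nonEmpty) {
      throw new java.io.IOException(
        s"Failed to delete ${failures.size} path(s) at job commit: " +
          failures.map(_._1).mkString(", "),
        failures.head._2)
    }
  }

  // On task abort nothing is deleted; registered paths stay untouched
  // until (and unless) the job itself commits.
  def onTaskAbort(): Unit = ()
}
```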
---