GitHub user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4155#discussion_r23478509
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala ---
    @@ -106,18 +107,25 @@ class SparkHadoopWriter(@transient jobConf: JobConf)
         val taCtxt = getTaskContext()
         val cmtr = getOutputCommitter()
         if (cmtr.needsTaskCommit(taCtxt)) {
    -      try {
    -        cmtr.commitTask(taCtxt)
    -        logInfo (taID + ": Committed")
    -      } catch {
    -        case e: IOException => {
    -          logError("Error committing the output of task: " + taID.value, e)
    -          cmtr.abortTask(taCtxt)
    -          throw e
    +      val outputCommitCoordinator = SparkEnv.get.outputCommitCoordinator
    +      val conf = SparkEnv.get.conf
    +      val canCommit: Boolean = outputCommitCoordinator.canCommit(jobID, splitID, attemptID)
    +      if (canCommit) {
    --- End diff --
    
    Hmm. I wonder if this can be a problem. Given the following timeline:
    
        
        Time ->
    
        (1)--------(2)--------(3)
        
        (4)--------------(5)
        
    1. task 1 starts
    2. task 1 asks for permission to commit; it's granted
    3. task 1 fails to commit
    4. task 2 starts (doing the same work as task 1)
    5. task 2 asks for permission to commit; it's denied
    
    Wouldn't this code force a new task to be run to recompute everything?
    Also, wouldn't task 2 actually report itself as successful and break things,
    since there would then be a successful task for that particular split whose
    output was never committed?
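
To make the timeline concrete, here is a rough sketch of the scenario. It is not the coordinator from this PR: `ToyCommitCoordinator` and `Timeline` are hypothetical names, and the sketch assumes that the first attempt to ask for a split is granted permission and that the grant is never revoked when that attempt later fails.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the coordinator in the diff; not Spark's actual class.
// Assumption: the first attempt to ask for a split wins, and the grant is never revoked.
class ToyCommitCoordinator {
  // (jobID, splitID) -> attemptID that was authorized to commit
  private val authorized = mutable.Map.empty[(Int, Int), Int]

  def canCommit(jobID: Int, splitID: Int, attemptID: Int): Boolean = synchronized {
    authorized.getOrElseUpdate((jobID, splitID), attemptID) == attemptID
  }
}

object Timeline {
  // Returns (attempt 1 granted, attempt 2 granted, split output committed).
  def run(): (Boolean, Boolean, Boolean) = {
    val coordinator = new ToyCommitCoordinator

    // (2) task 1 asks for permission to commit; it's granted
    val granted1 = coordinator.canCommit(jobID = 0, splitID = 0, attemptID = 1)

    // (3) task 1 fails to commit (assumed here, e.g. an IOException in commitTask)
    val commit1Succeeded = false

    // (5) task 2 asks for permission to commit; it's denied
    val granted2 = coordinator.canCommit(jobID = 0, splitID = 0, attemptID = 2)

    // Output exists only if an authorized attempt actually committed.
    // Task 2 is assumed to commit successfully whenever it is allowed to.
    val committed = (granted1 && commit1Succeeded) || granted2
    (granted1, granted2, committed)
  }
}
```

Under those assumptions, `Timeline.run()` ends with attempt 1 granted, attempt 2 denied, and nothing committed for the split, which is the state I'm worried about if task 2 still reports success.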

