Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19848#discussion_r154157813
  
    --- Diff: core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala ---
    @@ -70,7 +79,14 @@ object SparkHadoopMapRedUtil extends Logging {
           if (shouldCoordinateWithDriver) {
             val outputCommitCoordinator = SparkEnv.get.outputCommitCoordinator
             val taskAttemptNumber = TaskContext.get().attemptNumber()
    -        val canCommit = outputCommitCoordinator.canCommit(jobId, splitId, taskAttemptNumber)
    +        var canCommit: Boolean = true
    +        // This checks whether the commitTask provided by stageId, which if not the canCommit
    +        // will use jobId as stageId to decide whether the commit should be possible
    +        if (stageId != -1) {
    --- End diff --
    
    In which case would this happen? Would it be hard to change the API so that the stage id is always provided to `commitTask`?
    
    Mridul suggested in the previous PR using the MR job configuration to propagate this (which you can access through the `mrTaskContext` parameter above). Any reason why you didn't go that route?
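
    For illustration, a minimal sketch of what propagating the stage id through the job configuration could look like. This is only a sketch under assumptions: the object name and the key `spark.sketch.commit.stageId` are hypothetical and not existing Spark or Hadoop settings, and the way the PR would actually wire this into `commitTask` may differ. It relies only on Hadoop's `Configuration.setInt`/`getInt` and `TaskAttemptContext.getConfiguration`.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{TaskAttemptContext, TaskAttemptID}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

object StageIdPropagationSketch {
  // Hypothetical key name, chosen for this sketch only.
  val StageIdKey = "spark.sketch.commit.stageId"

  // Setup side: record the stage id in the job configuration
  // before the task attempt context is created from it.
  def setStageId(conf: Configuration, stageId: Int): Unit =
    conf.setInt(StageIdKey, stageId)

  // Commit side: read the stage id back from the task attempt context.
  // Returns None when the key was never set, so the caller can decide
  // how to fall back instead of relying on a -1 sentinel.
  def stageIdOf(ctx: TaskAttemptContext): Option[Int] = {
    val id = ctx.getConfiguration.getInt(StageIdKey, -1)
    if (id >= 0) Some(id) else None
  }

  def main(args: Array[String]): Unit = {
    val conf = new Configuration(false) // skip loading default resources
    setStageId(conf, 3)
    val ctx = new TaskAttemptContextImpl(conf, new TaskAttemptID())
    println(stageIdOf(ctx))
  }
}
```

    With something along these lines, `commitTask` could look up the stage id from `mrTaskContext` itself, so its signature would not need an extra parameter.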

