GitHub user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22209#discussion_r212772176
  
    --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
    @@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
             val e = it.next()
             if (job.stageIds.contains(e.getKey()._1)) {
               val stage = e.getValue()
    -          stage.status = v1.StageStatus.SKIPPED
    -          job.skippedStages += stage.info.stageId
    -          job.skippedTasks += stage.info.numTasks
    -          it.remove()
    -          update(stage, now)
    +          // Only update the stage if it has not finished already
    +          if (v1.StageStatus.ACTIVE.equals(stage.status) ||
    +              v1.StageStatus.PENDING.equals(stage.status)) {
    +            stage.status = v1.StageStatus.SKIPPED
    +            job.skippedStages += stage.info.stageId
    +            job.skippedTasks += stage.info.numTasks
    +            job.activeStages -= 1
    +            it.remove()
    --- End diff --
    
    To clarify, by 'this' I meant a job end event being received before the stage end event (for a stage that is part of the job).
    
    I was not referring to task end events (those can arrive after the stage end or job end events).
    
    Thanks for clarifying, @vanzin. Given that the snippet is not trying to recover from dropped events, I wonder why non-skipped stages would even be in the list: I would expect all of them to be skipped?
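
    To make the question concrete, here is a minimal sketch of the guarded update from the diff. It uses hypothetical stand-in types (`LiveStage`, `LiveJob`, `JobEndSketch`), not the real `AppStatusListener` classes, and it assumes a finished stage can still be in the live list when the job end event arrives, which is exactly the case I am asking about.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
object StageStatus extends Enumeration {
  val ACTIVE, PENDING, COMPLETE, FAILED, SKIPPED = Value
}

final class LiveStage(val stageId: Int, val numTasks: Int, var status: StageStatus.Value)

final class LiveJob(val stageIds: Set[Int]) {
  var skippedStages: Set[Int] = Set.empty
  var skippedTasks: Int = 0
}

object JobEndSketch {
  // On job end, mark as SKIPPED only those tracked stages of the job
  // that have not finished already (ACTIVE or PENDING).
  def onJobEnd(job: LiveJob, liveStages: mutable.Map[Int, LiveStage]): Unit = {
    // Collect first, so the map is not modified while it is being traversed.
    val unfinished = liveStages.values.filter { s =>
      job.stageIds.contains(s.stageId) &&
        (s.status == StageStatus.ACTIVE || s.status == StageStatus.PENDING)
    }.toList

    unfinished.foreach { s =>
      s.status = StageStatus.SKIPPED
      job.skippedStages += s.stageId
      job.skippedTasks += s.numTasks
      liveStages.remove(s.stageId)
    }
  }

  def main(args: Array[String]): Unit = {
    val job = new LiveJob(Set(1, 2))
    val stages = mutable.Map(
      1 -> new LiveStage(1, 4, StageStatus.PENDING),
      // Assumed scenario: a stage that already completed but is still listed.
      2 -> new LiveStage(2, 8, StageStatus.COMPLETE)
    )
    onJobEnd(job, stages)
    println(s"skipped=${job.skippedStages} tasks=${job.skippedTasks} remaining=${stages.keySet}")
  }
}
```

    With the guard, only the pending stage is counted as skipped; without it, the completed stage would be re-marked as skipped too. Whether that second case can actually occur without dropped events is what I would like to understand.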

