GitHub user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10951#discussion_r55767105
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1144,13 +1144,12 @@ class DAGScheduler(
             null
           }
     
    -    // The success case is dealt with separately below.
    -    // TODO: Why post it only for failed tasks in cancelled stages? Clarify semantics here.
    -    if (event.reason != Success) {
    -      val attemptId = task.stageAttemptId
    -      listenerBus.post(SparkListenerTaskEnd(
    -        stageId, attemptId, taskType, event.reason, event.taskInfo, taskMetrics))
    -    }
    +    // Note: this stage may already have been canceled, in which case this task end event
    +    // maybe posted after the stage completed event. There's not much we can do here without
    +    // introducing additional complexity in the scheduler to wait for all the task end events
    +    // before posting the stage completed event.
    --- End diff --
    
    I feel like this comment does not fully explain why you want this. Also,
    the part at the end makes it sound like a larger refactoring would be
    better, so that there was no stage end until we heard back from all the
    tasks. I don't think we'd even want that, especially with speculation.
    How about something like:

    The stage may have already finished when we get this event, e.g. because
    it was a speculative task. It's important that we send the TaskEnd event
    in any case, so listeners know how many tasks the executors are currently
    running. In particular, DynamicAllocation needs it to know whether an
    executor is busy, and the UI needs it to update correctly. See
    SPARK-11701 / SPARK-13054.
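
To make the bookkeeping concern concrete, here is a minimal sketch of a listener that infers executor busyness purely from task start/end events. It is not Spark's actual listener code: `TaskStart`, `TaskEnd`, and `RunningTaskTracker` are hypothetical, simplified stand-ins invented for illustration, and only the executor id is modelled.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for task start/end listener events.
sealed trait Event
final case class TaskStart(executorId: String) extends Event
final case class TaskEnd(executorId: String) extends Event

// Tracks how many tasks each executor is running, using only the events it sees.
final class RunningTaskTracker {
  private val running = mutable.Map.empty[String, Int].withDefaultValue(0)

  def onEvent(event: Event): Unit = event match {
    case TaskStart(exec) =>
      running(exec) += 1
    case TaskEnd(exec) =>
      val left = running(exec) - 1
      // Drop the entry once the executor has no running tasks left.
      if (left <= 0) running -= exec else running(exec) = left
  }

  // An executor counts as busy while at least one start has no matching end.
  def isBusy(executorId: String): Boolean = running(executorId) > 0
}
```

With a tracker like this, a start event that never gets its matching end event (for example, because the task succeeded after its stage had already finished) leaves the executor looking busy indefinitely, which is the failure mode the suggested comment describes.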

