GitHub user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1566#discussion_r15330105
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -753,11 +752,14 @@ class DAGScheduler(
           null
         }
     
    -    // must be run listener before possible NotSerializableException
    -    // should be "StageSubmitted" first and then "JobEnded"
    -    listenerBus.post(SparkListenerStageSubmitted(stageToInfos(stage), properties))
    -
         if (tasks.size > 0) {
    +      runningStages += stage
    +      // SparkListenerStageSubmitted should be posted before testing whether tasks are
    +      // serializable. If tasks are not serializable, a SparkListenerStageCompleted event
    +      // will be posted, which should always come after a corresponding SparkListenerStageSubmitted
    +      // event.
    +      listenerBus.post(SparkListenerStageSubmitted(stageToInfos(stage), properties))
    --- End diff --
    
    I also moved this event inside the `tasks.size > 0` check, because we shouldn't tell the UI/listeners about a stage that has no tasks and therefore will never run.
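    A minimal Scala sketch of the ordering argument, under heavy simplification: `StageEvent`, `StageSubmitted`, `StageCompleted`, `RecordingBus`, `SubmitSketch`, and `isSerializable` are hypothetical stand-ins, not Spark's real classes or APIs, and the real DAGScheduler does much more. It shows the two properties discussed here: a stage with no tasks is never announced, and because the "submitted" event is posted before the serializability check, any failure "completed" event for that stage always comes after it.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark's listener events (not Spark's real classes).
sealed trait StageEvent
final case class StageSubmitted(stageId: Int) extends StageEvent
final case class StageCompleted(stageId: Int, failed: Boolean) extends StageEvent

// Hypothetical listener bus that just records events in the order they are posted.
final class RecordingBus {
  val events: mutable.ArrayBuffer[StageEvent] = mutable.ArrayBuffer.empty[StageEvent]
  def post(event: StageEvent): Unit = events += event
}

object SubmitSketch {
  // Mirrors the control flow under review: the stage is only marked running and
  // announced when it has tasks, and the announcement happens before the
  // serializability check, so a failure event can never precede it.
  def submitMissingTasks(
      stageId: Int,
      tasks: Seq[String],
      isSerializable: String => Boolean, // hypothetical stand-in for the serialization test
      bus: RecordingBus,
      runningStages: mutable.Set[Int]): Unit = {
    if (tasks.nonEmpty) {
      runningStages += stageId
      bus.post(StageSubmitted(stageId))
      if (!tasks.forall(isSerializable)) {
        // Abort: the stage stops running and a completion (failure) event follows.
        runningStages -= stageId
        bus.post(StageCompleted(stageId, failed = true))
      }
      // Otherwise the tasks would be handed to the task scheduler here (omitted).
    }
    // With no tasks the stage will not run, so listeners are never told about it.
  }

  // Checks the ordering invariant: each StageCompleted is preceded by a
  // StageSubmitted for the same stage.
  def completedFollowsSubmitted(events: Seq[StageEvent]): Boolean = {
    val submitted = mutable.Set.empty[Int]
    events.forall {
      case StageSubmitted(id) =>
        submitted += id
        true
      case StageCompleted(id, _) =>
        submitted.contains(id)
    }
  }
}
```

    Posting the event only inside the non-empty branch keeps the listener view consistent with `runningStages`: a stage is announced exactly when it is marked running.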
