Kay Ousterhout created SPARK-19538:
--------------------------------------

             Summary: DAGScheduler and TaskSetManager can have an inconsistent view of whether a stage is complete.
                 Key: SPARK-19538
                 URL: https://issues.apache.org/jira/browse/SPARK-19538
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 2.1.0
            Reporter: Kay Ousterhout
            Assignee: Kay Ousterhout


The pendingPartitions variable in Stage tracks the partitions that still need 
to be computed, and the DAGScheduler uses it to decide when to mark the stage 
as complete.  In most cases, this variable is exactly consistent with the tasks 
that are still pending in the TaskSetManager for the current attempt of the 
stage.  However, as discussed in SPARK-19263, the two can become inconsistent 
when a ShuffleMapTask from an earlier attempt of the stage completes: the 
DAGScheduler may then think the stage has finished while the TaskSetManager is 
still waiting for some tasks to complete (see the description in this pull 
request: https://github.com/apache/spark/pull/16620).  This leads to bugs like 
SPARK-19263.  Another problem with this behavior is that listeners can receive 
two StageCompleted messages: one when the DAGScheduler thinks the stage is 
complete, and a second when the TaskSetManager later decides the stage is 
complete.  We should fix this.
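
To make the mismatch concrete, here is a minimal toy model in Python.  It is 
not Spark's code: ToyStage, ToyTaskSetManager, handle_task_completion and 
simulate are hypothetical names invented for this sketch, and the rule that 
only tasks from the current attempt count toward the task set is a simplifying 
assumption based on the description above.

```python
class ToyStage:
    """Hypothetical stand-in for the scheduler's per-stage bookkeeping."""

    def __init__(self, num_partitions):
        # Partitions that the stage-side view still considers uncomputed.
        self.pending_partitions = set(range(num_partitions))


class ToyTaskSetManager:
    """Hypothetical stand-in for the task set of a single stage attempt."""

    def __init__(self, stage_attempt, partitions):
        self.stage_attempt = stage_attempt
        # Tasks of this attempt that have not finished yet.
        self.pending_tasks = set(partitions)

    def is_finished(self):
        return not self.pending_tasks


def handle_task_completion(stage, current_tsm, partition, task_attempt):
    """Record one finished ShuffleMapTask in both views."""
    # Stage-side view: a result from any attempt clears the partition.
    stage.pending_partitions.discard(partition)
    # Task-set view (assumption): only the current attempt's tasks count.
    if task_attempt == current_tsm.stage_attempt:
        current_tsm.pending_tasks.discard(partition)


def simulate():
    """Replay the scenario; return (stage view done, task set done)."""
    stage = ToyStage(num_partitions=2)
    # Attempt 1 has been submitted for both partitions, while attempt 0's
    # task for partition 1 is assumed to still be running somewhere.
    tsm = ToyTaskSetManager(stage_attempt=1, partitions=[0, 1])

    handle_task_completion(stage, tsm, partition=0, task_attempt=1)
    # A late task from the earlier attempt finishes partition 1.
    handle_task_completion(stage, tsm, partition=1, task_attempt=0)

    stage_view_done = not stage.pending_partitions
    return stage_view_done, tsm.is_finished()


if __name__ == "__main__":
    print(simulate())
```

Running the sketch prints (True, False): the stage-side view has no pending 
partitions left, while the task set for the current attempt is still waiting 
on partition 1.  In this model, that gap is where the two StageCompleted 
messages would come from.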



