Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16620#discussion_r99616688
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
    @@ -1191,8 +1191,29 @@ class DAGScheduler(
                 } else {
                   shuffleStage.addOutputLoc(smt.partitionId, status)
                 }
    -
                 if (runningStages.contains(shuffleStage) && 
shuffleStage.pendingPartitions.isEmpty) {
    +              // Check if there is active TaskSetManager.
    +              val activeTaskSetManagerOpt = 
Option(taskScheduler.rootPool).flatMap { rootPool =>
    +                rootPool.getSortedTaskSetQueue.find { tsm =>
    +                  tsm.stageId == stageId && !tsm.isZombie
    +                }
    +              }
    +              activeTaskSetManagerOpt.foreach { activeTsm =>
    +                // The scheduler thinks we don't need any more partitions 
for this stage, but there
    +                // is still an active taskset for the stage.  This can 
happen when there are stage
    +                // retries, and we get late task completions from earlier 
stages.  Note that all of
    +                // the map output may or may not be available -- some of 
those map outputs may have
    +                // been lost.  But the most consistent way to make that 
determination is to end
    +                // the running taskset, and mark the stage as finished.  
The DAGScheduler will
    +                // automatically determine whether there are still 
partitions missing that need to
    +                // be resubmitted.
    +                // NOTE: this will get a lock on the TaskScheduler
    --- End diff --
    
    Sorry this is my fault -- I gave you a bad comment here.  Can we reword 
that line to
    
    // We need a lock on the taskScheduler because tsm is not thread-safe, it 
assumes that all interactions have a lock on the taskScheduler, even just 
setting isZombie.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to