Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16620#discussion_r99616688

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1191,8 +1191,29 @@ class DAGScheduler(
               } else {
                 shuffleStage.addOutputLoc(smt.partitionId, status)
               }
    -          if (runningStages.contains(shuffleStage) && shuffleStage.pendingPartitions.isEmpty) {
    +          // Check if there is an active TaskSetManager.
    +          val activeTaskSetManagerOpt = Option(taskScheduler.rootPool).flatMap { rootPool =>
    +            rootPool.getSortedTaskSetQueue.find { tsm =>
    +              tsm.stageId == stageId && !tsm.isZombie
    +            }
    +          }
    +          activeTaskSetManagerOpt.foreach { activeTsm =>
    +            // The scheduler thinks we don't need any more partitions for this stage, but
    +            // there is still an active taskset for the stage. This can happen when there
    +            // are stage retries and we get late task completions from earlier stages.
    +            // Note that all of the map output may or may not be available -- some of
    +            // those map outputs may have been lost. The most consistent way to make that
    +            // determination is to end the running taskset and mark the stage as finished;
    +            // the DAGScheduler will then automatically determine whether there are still
    +            // missing partitions that need to be resubmitted.
    +            // NOTE: this will get a lock on the TaskScheduler
    --- End diff --

Sorry, this is my fault -- I gave you a bad comment here. Can we reword that line to:

    // We need a lock on the taskScheduler because the TSM is not thread-safe: it assumes
    // that all interactions, even just setting isZombie, hold a lock on the taskScheduler.
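To illustrate the locking convention under discussion, here is a minimal, hypothetical sketch (simplified stand-ins, not Spark's actual `TaskSchedulerImpl` or `TaskSetManager` classes): the manager's mutable state, including `isZombie`, is only ever touched while holding the scheduler's monitor lock, which is what the suggested comment wording is getting at.

```scala
import scala.collection.mutable

// Hypothetical simplified TaskSetManager: its fields are not thread-safe on
// their own; callers are assumed to hold the scheduler's lock when mutating
// them, even for something as small as flipping isZombie.
class TaskSetManager(val stageId: Int) {
  var isZombie: Boolean = false
}

// Hypothetical simplified scheduler: every interaction with a TaskSetManager
// goes through a synchronized method, so all TSM mutation happens under the
// scheduler's monitor lock.
class TaskScheduler {
  private val taskSets = mutable.Buffer.empty[TaskSetManager]

  def submit(tsm: TaskSetManager): Unit = synchronized {
    taskSets += tsm
  }

  // Find the active (non-zombie) manager for a stage and mark it zombie,
  // all while holding the scheduler lock. Returns true if one was found.
  def markStageFinished(stageId: Int): Boolean = synchronized {
    taskSets.find(t => t.stageId == stageId && !t.isZombie) match {
      case Some(tsm) =>
        tsm.isZombie = true
        true
      case None =>
        false
    }
  }
}

// Usage: submit a task set for stage 3, then finish it. The first call finds
// and zombies the active manager; a second call finds no active manager.
```

The design point mirrors the review comment: rather than making `TaskSetManager` internally thread-safe, the convention is that the scheduler's lock guards all of it, so even a one-field write like `isZombie = true` must be performed from code that holds that lock.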