yabola commented on code in PR #49270:
URL: https://github.com/apache/spark/pull/49270#discussion_r1898419220


##########
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##########
@@ -2248,7 +2250,7 @@ class DAGSchedulerSuite extends SparkFunSuite with 
TempLocalSparkContext with Ti
     // original result task 1.0 succeed
     runEvent(makeCompletionEvent(taskSets(1).tasks(1), Success, 42))
     sc.listenerBus.waitUntilEmpty()
-    assert(completedStage === List(0, 1, 1, 0))
+    assert(completedStage === List(0, 1, 1, 0, 1))

Review Comment:
   Let me explain the timeline of the last event in this UT:
   1. The map stage is running, and the result stage is waiting (result task 1.0 is running).
   2. In the UT, result task 1.0 succeeds (no running tasks remain in the result stage).
   3. In `handleTaskCompletion`, the result stage goes through `markStageAsFinished`, which clears the result stage's `failedAttemptIds`.
   4. `cancelRunningIndependentStages` cancels the map stage (it is in the running set). The result stage is waiting but has no `failedAttemptIds`, so it won't be killed in `cancelRunningIndependentStages` (and it has no running tasks either).
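
   The skip condition in step 4 can be sketched as follows. This is a hypothetical, simplified model (the `Stage` case class and `shouldCancel` helper are illustrative, not the real `DAGScheduler` code): a waiting stage is only a cancellation candidate when it has failed attempts or still-running tasks.

   ```scala
   import scala.collection.mutable

   object CancelSketch {
     // Minimal stand-in for the scheduler's per-stage bookkeeping.
     case class Stage(
         id: Int,
         failedAttemptIds: mutable.Set[Int],
         hasRunningTasks: Boolean)

     // Sketch of the skip logic: a waiting stage with no failed attempts
     // and no running tasks is left alone.
     def shouldCancel(stage: Stage): Boolean =
       stage.failedAttemptIds.nonEmpty || stage.hasRunningTasks
   }
   ```

   In the timeline above, step 3 empties `failedAttemptIds` and step 2 leaves no running tasks, so `shouldCancel` is false for the result stage.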
   
   So in this UT, there is really no need to kill the last result stage.
   
   In addition, the result stage will always kill all of its tasks upon success, so we don't have to worry about this. Please see here:
   
https://github.com/apache/spark/blob/939129ec01af7f7b6dbec737f2d4149d2fc0d9a3/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1960



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

