squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late 
zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r250354836
 
 

 ##########
 File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
 ##########
 @@ -2851,6 +2862,40 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     }
   }
 
+  test("SPARK-25250: Late zombie task completions handled correctly even before" +
+    " new taskset launched") {
+    val shuffleMapRdd = new MyRDD(sc, 4, Nil)
+    val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(4))
+    val reduceRdd = new MyRDD(sc, 4, List(shuffleDep), tracker = mapOutputTracker)
+    submit(reduceRdd, Array(0, 1, 2, 3))
+
+    completeShuffleMapStageSuccessfully(0, 0, numShufflePartitions = 4)
+
+    // Fail Stage 1 Attempt 0 with Fetch Failure
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(0),
+      FetchFailed(makeBlockManagerId("hostA"), shuffleDep.shuffleId, 0, 0, "ignored"),
+      null))
+
+    // this will trigger a resubmission of stage 0, since we've lost some of its
+    // map output, for the next iteration through the loop
+    scheduler.resubmitFailedStages()
+    completeShuffleMapStageSuccessfully(0, 1, numShufflePartitions = 4)
+
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(3), Success, Nil, Nil))
+    assert(completedPartitions.get(taskSets(3).stageId).get.contains(
+      taskSets(3).tasks(1).partitionId) == false, "Corresponding partition id for" +
+      " stage 1 attempt 1 is not complete yet")
+
+    // this will mark partition id 1 of stage 1 attempt 0 as complete. So we expect the status
+    // of that partition id to be reflected for stage 1 attempt 1 as well.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(1), Success, Nil, Nil))
+    assert(completedPartitions.get(taskSets(3).stageId).get.contains(
+      taskSets(3).tasks(1).partitionId) == true)
 
 Review comment:
   I was originally thinking that we'd get rid of the test from
   https://github.com/apache/spark/pull/21131 in TaskSchedulerImplSuite, but now
   I see it tests a bunch of stuff in TaskSchedulerImpl and TaskSetManager (TSM)
   that is mocked here, so I guess we need to keep it. It might be worthwhile to
   at least add a comment here mentioning that this goes along with "Completions
   in zombie tasksets update status of non-zombie taskset" in
   TaskSchedulerImplSuite.
   
   This isn't really checking that the update happens inside the event loop ...
   but I guess that's OK; I don't know if it's really worth trying to test that
   exactly.
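
   For anyone reading the test in isolation, the property it asserts can be
   modelled outside Spark. The following is only a minimal sketch under stated
   assumptions: `Attempt`, `CompletionTracker` and `ZombieCompletionDemo` are
   hypothetical names invented for illustration and are not part of
   DAGScheduler, TaskSchedulerImpl or TaskSetManager. It assumes completed
   partitions are recorded per stage id rather than per attempt, which is what
   the `completedPartitions` assertions in the diff appear to rely on.

```scala
import scala.collection.mutable

// Hypothetical, simplified model. This is NOT Spark's scheduler code.
final case class Attempt(stageId: Int, attemptId: Int, zombie: Boolean = false)

final class CompletionTracker {
  // Completed partition ids keyed by stage id, shared by every attempt of that stage.
  private val completed = mutable.Map.empty[Int, mutable.Set[Int]]

  // Record a successful task, whichever attempt (zombie or not) it came from.
  def markCompleted(attempt: Attempt, partitionId: Int): Unit = {
    val partitions = completed.getOrElseUpdate(attempt.stageId, mutable.Set.empty[Int])
    partitions += partitionId
  }

  def isCompleted(stageId: Int, partitionId: Int): Boolean =
    completed.get(stageId).exists(_.contains(partitionId))
}

object ZombieCompletionDemo {
  def main(args: Array[String]): Unit = {
    val tracker = new CompletionTracker
    // Attempt 0 became a zombie after the fetch failure; attempt 1 is its retry.
    val attempt0 = Attempt(stageId = 1, attemptId = 0, zombie = true)
    val attempt1 = Attempt(stageId = 1, attemptId = 1)

    // A late success for partition 3 says nothing about partition 1.
    tracker.markCompleted(attempt0, partitionId = 3)
    assert(!tracker.isCompleted(attempt1.stageId, partitionId = 1))

    // A late success for partition 1 from the zombie attempt is visible to attempt 1.
    tracker.markCompleted(attempt0, partitionId = 1)
    assert(tracker.isCompleted(attempt1.stageId, partitionId = 1))
  }
}
```

   Like the test in the diff, this sketch only checks the resulting state; it
   makes no claim about which thread or event loop performs the update.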
