squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r251912823
 
 

 ##########
 File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
 ##########
 @@ -2849,6 +2862,53 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     }
   }
 
+  // This test is similar to, and goes along with, "Completions in zombie tasksets update
+  // status of non-zombie taskset" in TaskSchedulerImplSuite.scala.
+  test("SPARK-25250: Late zombie task completions handled correctly even before" +
+    " new taskset launched") {
+    val shuffleMapRdd = new MyRDD(sc, 4, Nil)
+    val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(4))
+    val reduceRdd = new MyRDD(sc, 4, List(shuffleDep), tracker = mapOutputTracker)
+    submit(reduceRdd, Array(0, 1, 2, 3))
+
+    completeShuffleMapStageSuccessfully(0, 0, numShufflePartitions = 4)
+
+    // Fail stage 1 attempt 0 with a fetch failure.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(0),
+      FetchFailed(makeBlockManagerId("hostA"), shuffleDep.shuffleId, 0, 0, "ignored"),
+      null))
+
+    // This will trigger a resubmission of stage 0, since we've lost some of its
+    // map output, for the next iteration through the loop.
+    scheduler.resubmitFailedStages()
+    completeShuffleMapStageSuccessfully(0, 1, numShufflePartitions = 4)
+
+    // Tasksets 1 & 3 should be two different attempts of our reduce stage -- let's
+    // double-check the test setup.
+    val reduceStage = taskSets(1).stageId
+    assert(taskSets(3).stageId === reduceStage)
+
+    // Complete one task from the original taskset and make sure we update the taskSchedulerImpl
+    // so it can notify all taskSetManagers. Some of that is mocked here; just check that
+    // the right event is there.
+    val taskToComplete = taskSets(1).tasks(3)
+
+    runEvent(makeCompletionEvent(taskToComplete, Success, Nil, Nil))
+    assert(completedPartitions.getOrElse(reduceStage, Set()) === Set(taskToComplete.partitionId))
+
+    assert(completedPartitions.get(taskSets(3).stageId).get.contains(
+      taskSets(3).tasks(1).partitionId) == false, "Corresponding partition id for" +
+      " stage 1 attempt 1 is not complete yet")
+
+    // This will mark partition id 1 of stage 1 attempt 0 as complete, so we expect the status
+    // of that partition id to be reflected for stage 1 attempt 1 as well.
+    runEvent(makeCompletionEvent(
+      taskSets(1).tasks(1), Success, Nil, Nil))
+    assert(completedPartitions.get(taskSets(3).stageId).get.contains(
+      taskSets(3).tasks(1).partitionId) == true)
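
For context on what the assertions above consult: judging from the diff, `completedPartitions` is keyed by stage id and holds the partition ids reported complete for that stage, with the TaskSchedulerImpl side mocked out in this suite. Below is a minimal sketch of that kind of bookkeeping; the object and method names are hypothetical stand-ins, not the suite's actual fields.

```scala
// Hypothetical sketch of the bookkeeping that completedPartitions stands for in the
// test above: a per-stage record of partitions reported complete, regardless of which
// taskset attempt (zombie or not) produced the completion. Illustrative only.
import scala.collection.mutable

object CompletedPartitionsSketch {
  private val completedPartitions = mutable.HashMap[Int, Set[Int]]()

  // Record a completion for a stage; illustrative helper, not a Spark API.
  def markPartitionCompleted(stageId: Int, partitionId: Int): Unit = {
    completedPartitions(stageId) =
      completedPartitions.getOrElse(stageId, Set()) + partitionId
  }

  def main(args: Array[String]): Unit = {
    // Late completions from stage 1 attempt 0 (the zombie taskset) still count
    // toward stage 1, so attempt 1 never needs to rerun those partitions.
    markPartitionCompleted(stageId = 1, partitionId = 3)
    markPartitionCompleted(stageId = 1, partitionId = 1)
    assert(completedPartitions(1) == Set(1, 3))
    println(s"completed for stage 1: ${completedPartitions(1)}")
  }
}
```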
 
 Review comment:
   again, I'd just check the full Set here, rather than the existence of one particular partition.  You've already checked that the stage ids are the same for both taskSets, so this can be:

   ```scala
   assert(completedPartitions(reduceStage) === Set(taskSets(1).tasks(1).partitionId))
   ```

   I know these changes might make it seem like we're not testing the interaction between the taskSets, but really that was the case before as well; this just makes it easier to follow.  That's kinda what bugs me about how the whole TaskSchedulerImpl part gets mocked out here, but anyway we can set that aside for now.
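
For illustration, here is how the full-Set assertion style suggested above could look as a standalone sketch. It uses stand-in values (plain locals instead of the suite's taskSets/completedPartitions state, and plain `==` instead of ScalaTest's `===`), so it shows the assertion style, not a drop-in replacement for the test.

```scala
// Standalone sketch: "check the full Set" vs. probing one partition's membership.
// The values are assumptions standing in for the test's state after the two
// Success events; they are not taken from the suite.
object FullSetAssertionSketch {
  def main(args: Array[String]): Unit = {
    val reduceStage = 1
    // Partitions completed for the reduce stage (the partition ids of tasks(3)
    // and tasks(1) in the real test are assumed to be 3 and 1 here).
    val completedForReduce: Set[Int] = Set(3, 1)

    // Membership-style check, roughly what the diff does today:
    assert(completedForReduce.contains(1))

    // Full-Set check, the style suggested in the review: one equality that pins
    // down exactly which partitions are complete, so an unexpected extra or
    // missing completion fails loudly.
    assert(completedForReduce == Set(3, 1), s"unexpected completions for stage $reduceStage")
  }
}
```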
