Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/16901#discussion_r100886876
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2161,6 +2161,48 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou
}
}
+  test("After fetching failed, success of old attempt of stage should be taken as valid.") {
+    val rddA = new MyRDD(sc, 2, Nil)
+    val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(2))
+    val shuffleIdA = shuffleDepA.shuffleId
+
+    val rddB = new MyRDD(sc, 2, List(shuffleDepA))
+    val shuffleDepB = new ShuffleDependency(rddB, new HashPartitioner(2))
+
+    val rddC = new MyRDD(sc, 2, List(shuffleDepB))
+
+    submit(rddC, Array(0, 1))
+    assert(taskSets(0).stageId === 0 && taskSets(0).stageAttemptId === 0)
+
+    complete(taskSets(0), Seq(
+      (Success, makeMapStatus("hostA", 2)),
+      (Success, makeMapStatus("hostA", 2))))
+
+    // Fetch failed on hostA for task(partitionId=0) and success on hostB for task(partitionId=1)
+    complete(taskSets(1), Seq(
+      (FetchFailed(makeBlockManagerId("hostA"), shuffleIdA, 0, 0,
+        "Fetch failure of task: stageId=1, stageAttempt=0, partitionId=0"), null),
+      (Success, makeMapStatus("hostB", 2))))
+
+    scheduler.resubmitFailedStages()
+    assert(taskSets(2).stageId === 0 && taskSets(2).stageAttemptId === 1)
+    complete(taskSets(2), Seq(
+      (Success, makeMapStatus("hostB", 2)),
+      (Success, makeMapStatus("hostB", 2))))
+
+    assert(taskSets(3).stageId === 1 && taskSets(2).stageAttemptId === 1)
+    runEvent(makeCompletionEvent(
+      taskSets(3).tasks(0), Success, makeMapStatus("hostB", 2)))
+
+    // Thanks to the success from old attempt of stage(stageId=1), there's no pending
--- End diff --
It looks like the success above is from the newer attempt of the stage
(since you're taking the task from taskSets(3), not taskSets(1)), which is
inconsistent with the comment. I think perhaps the intention here was to *not*
finish one of the tasks from taskSets(1) the first time around (i.e.,
eliminate the Success on line 2185) and then move that success here (instead
of completing the task from the more recent task set)?
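
A rough sketch of that restructuring, reusing only helpers already visible in
the diff (`runEvent`, `makeCompletionEvent`, `makeMapStatus`,
`makeBlockManagerId`, `taskSets`). It is a fragment meant to sit inside the
test body above, not a standalone program. It assumes that
`makeCompletionEvent` accepts a `FetchFailed` reason with a `null` result the
same way `complete` does, and that `tasks(0)` and `tasks(1)` correspond to
partitionId=0 and partitionId=1:

```scala
// First time around: report only the fetch failure for partitionId=0.
// The task for partitionId=1 in taskSets(1) is deliberately left running.
runEvent(makeCompletionEvent(
  taskSets(1).tasks(0),
  FetchFailed(makeBlockManagerId("hostA"), shuffleIdA, 0, 0,
    "Fetch failure of task: stageId=1, stageAttempt=0, partitionId=0"),
  null))

// (Resubmit the failed stages and complete taskSets(2) as in the diff.)

// Later, where the diff completes taskSets(3).tasks(0): deliver the success
// from the old attempt, taskSets(1), so that it matches the comment.
runEvent(makeCompletionEvent(
  taskSets(1).tasks(1), Success, makeMapStatus("hostB", 2)))
```

With this shape, the success that arrives last really does belong to stage
attempt 0, which is what the test name and the trailing comment describe.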