GitHub user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21976#discussion_r213056190

    --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
    @@ -2474,19 +2478,21 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
         runEvent(makeCompletionEvent(
           taskSets(3).tasks(0), Success, makeMapStatus("hostB", 2)))

    -    // There should be no new attempt of stage submitted,
    -    // because task(stageId=1, stageAttempt=1, partitionId=1) is still running in
    -    // the current attempt (and hasn't completed successfully in any earlier attempts).
    -    assert(taskSets.size === 4)
    +    // At this point there should be no active task set for stageId=1 and we need
    +    // to resubmit because the output from (stageId=1, stageAttemptId=0, partitionId=1)
    +    // was ignored due to executor failure
    +    assert(taskSets.size === 5)
    +    assert(taskSets(4).stageId === 1 && taskSets(4).stageAttemptId === 2
    +      && taskSets(4).tasks.size === 1)

    -    // Complete task(stageId=1, stageAttempt=1, partitionId=1) successfully.
    +    // Complete task(stageId=1, stageAttempt=2, partitionId=1) successfully.
         runEvent(makeCompletionEvent(
    -      taskSets(3).tasks(1), Success, makeMapStatus("hostB", 2)))
    +      taskSets(4).tasks(0), Success, makeMapStatus("hostB", 2)))
    --- End diff --

    Yes, it will. Marking either of these tasks successful will work, but the assumption on line 2469 is that the task was already marked completed there by the TaskSetManager. So we don't want to send a success for taskSets(3).tasks(1), because it should already have been marked successful.

    Unfortunately, you can't test those interactions in this unit test. That is why I'm working on another scheduler integration test, which I was going to do under a separate JIRA.