Ngone51 commented on a change in pull request #27017: [SPARK-30359][CORE] Don't clear executorsPendingToRemove at the beginning of CoarseGrainedSchedulerBackend.reset
URL: https://github.com/apache/spark/pull/27017#discussion_r362463687
##########
File path: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
##########
@@ -1894,4 +1903,59 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
manager.handleFailedTask(offerResult.get.taskId, TaskState.FAILED, reason)
assert(sched.taskSetsFailed.contains(taskSet.id))
}
+
+ test("SPARK-30359: don't clean executorsPendingToRemove " +
+ "at the beginning of CoarseGrainedSchedulerBackend.reset") {
+ val conf = new SparkConf()
+ // use local-cluster mode in order to get CoarseGrainedSchedulerBackend
+ .setMaster("local-cluster[2, 1, 2048]")
+ // allow at most two executors
+ .set("spark.cores.max", "2")
+ .setAppName("CoarseGrainedSchedulerBackend.reset")
+ sc = new SparkContext(conf)
+ val sched = sc.taskScheduler
+ val backend = sc.schedulerBackend.asInstanceOf[CoarseGrainedSchedulerBackend]
+
+ TestUtils.waitUntilExecutorsUp(sc, 2, 60000)
+ val Seq(exec0, exec1) = backend.getExecutorIds()
+
+ val taskSet = FakeTask.createTaskSet(2)
+ val stageId = taskSet.stageId
+ val stageAttemptId = taskSet.stageAttemptId
+ sched.submitTasks(taskSet)
+ val taskSetManagers = PrivateMethod[mutable.HashMap[Int, mutable.HashMap[Int, TaskSetManager]]](
+ Symbol("taskSetsByStageIdAndAttempt"))
+ // get the TaskSetManager
+ val manager = sched.invokePrivate(taskSetManagers()).get(stageId).get(stageAttemptId)
+
+ val task0 = manager.resourceOffer(exec0, "localhost", TaskLocality.NO_PREF)
+ val task1 = manager.resourceOffer(exec1, "localhost", TaskLocality.NO_PREF)
+ assert(task0.isDefined && task1.isDefined)
+ val (taskId0, index0) = (task0.get.taskId, task0.get.index)
+ val (taskId1, index1) = (task1.get.taskId, task1.get.index)
+ // set up two running tasks
+ assert(manager.taskInfos(taskId0).running)
+ assert(manager.taskInfos(taskId0).executorId === exec0)
+ assert(manager.taskInfos(taskId1).running)
+ assert(manager.taskInfos(taskId1).executorId === exec1)
+
+ val numFailures = PrivateMethod[Array[Int]](Symbol("numFailures"))
+ // no task failures yet
+ assert(manager.invokePrivate(numFailures())(index0) === 0)
+ assert(manager.invokePrivate(numFailures())(index1) === 0)
+
+ // task0 on exec0 should not count failures
+ backend.executorsPendingToRemove(exec0) = true
+ // task1 on exec1 should count failures
Review comment:
Here, `executorsPendingToRemove(exec0) = true` while
`executorsPendingToRemove(exec1) = false`. `false` means the executor's crash
may be related to bad tasks running on it, so those tasks should have their
failures counted. `true` means the executor was killed by the driver, so its
loss has nothing to do with the tasks running on it and their failure counts
should not be incremented.
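
To make the distinction concrete, here is a minimal sketch of how the flag
feeds into failure accounting. This is my own condensed illustration, not the
actual Spark source: `FailureAccountingSketch`, `lossReason`, and
`countsTowardsTaskFailures` are made-up names, though the semantics follow the
code under test.

```scala
import scala.collection.mutable

sealed trait ExecutorLossReason
case object ExecutorKilled extends ExecutorLossReason                   // driver asked to kill it
case class ExecutorProcessLost(msg: String) extends ExecutorLossReason // crashed or lost

object FailureAccountingSketch {
  // executor ID -> "killed by driver, so don't blame its tasks"
  val executorsPendingToRemove = mutable.HashMap[String, Boolean]()

  // When an executor disappears, resolve the loss reason from the flag.
  def lossReason(execId: String): ExecutorLossReason = {
    val killedByDriver = executorsPendingToRemove.remove(execId).getOrElse(false)
    if (killedByDriver) ExecutorKilled else ExecutorProcessLost("worker lost")
  }

  // Only app-caused losses should bump numFailures for the
  // executor's running tasks.
  def countsTowardsTaskFailures(reason: ExecutorLossReason): Boolean = reason match {
    case ExecutorKilled => false // driver's decision; tasks are blameless
    case _              => true  // a crash may be the tasks' fault
  }
}
```

With exec0 flagged `true`, its loss resolves to `ExecutorKilled`, so task0's
`numFailures` should stay 0; exec1 is unflagged, so its loss counts and
task1's `numFailures` should become 1. Preserving that distinction is why
`reset` must not clear `executorsPendingToRemove` up front.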