[GitHub] [spark] squito commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

GitBox Tue, 02 Jul 2019 11:35:27 -0700

squito commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : 
Optimize Spark Scheduler to dequeue speculative tasks…
URL: https://github.com/apache/spark/pull/23677#discussion_r299626736


 ##########
 File path: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
 ##########
 @@ -1655,4 +1657,81 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
     // get removed inside TaskSchedulerImpl later.
     assert(availableResources(GPU) sameElements Array("0", "1", "2", "3"))
   }
+
+  test("SPARK-26755 Ensure that a speculative task is submitted only once for 
execution") {
+    sc = new SparkContext("local", "test")
+    sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+    val taskSet = FakeTask.createTaskSet(4)
+    // Set the speculation multiplier to be 0 so speculative tasks are 
launched immediately
+    sc.conf.set(config.SPECULATION_MULTIPLIER, 0.0)
+    sc.conf.set(config.SPECULATION_ENABLED, true)
+    sc.conf.set(config.SPECULATION_QUANTILE, 0.5)
+    val clock = new ManualClock()
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES, clock 
= clock)
+    val accumUpdatesByTask: Array[Seq[AccumulatorV2[_, _]]] = 
taskSet.tasks.map { task =>
+      task.metrics.internalAccums
+    }
+    // Offer resources for 4 tasks to start
+    for ((k, v) <- List(
+      "exec1" -> "host1",
+      "exec1" -> "host1",
+      "exec2" -> "host2",
+      "exec2" -> "host2")) {
+      val taskOption = manager.resourceOffer(k, v, NO_PREF)
+      assert(taskOption.isDefined)
+      val task = taskOption.get
+      assert(task.executorId === k)
+    }
+    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+    clock.advance(1)
+    // Complete the first 2 tasks and leave the other 2 tasks in running
+    for (id <- Set(0, 1)) {
+      manager.handleSuccessfulTask(id, createTaskResult(id, 
accumUpdatesByTask(id)))
+      assert(sched.endedTasks(id) === Success)
+    }
+    // checkSpeculatableTasks checks that the task runtime is greater than the 
threshold for
+    // speculating. Since we use a threshold of 0 for speculation, tasks need 
to be running for
+    // > 0ms, so advance the clock by 1ms here.
+    clock.advance(1)
+    assert(manager.checkSpeculatableTasks(0))
+    assert(sched.speculativeTasks.toSet === Set(2, 3))
+    assert(manager.pendingSpeculatableTasks.forExecutor.size === 0)
+    assert(manager.pendingSpeculatableTasks.forHost.size === 0)
+    assert(manager.pendingSpeculatableTasks.forRack.size === 0)
+    assert(manager.pendingSpeculatableTasks.anyPrefs.size === 2)
+    assert(manager.pendingSpeculatableTasks.noPrefs.size === 2)
+
+    // Offer resource to start the speculative attempt for the running task
+    val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF)
+    val taskOption6 = manager.resourceOffer("exec1", "host1", NO_PREF)
+    assert(taskOption5.isDefined)
+    val task5 = taskOption5.get
+    assert(task5.index === 2)
+    assert(task5.taskId === 4)
+    assert(task5.executorId === "exec1")
+    assert(task5.attemptNumber === 1)
+    assert(taskOption6.isDefined)
+    val task6 = taskOption6.get
+    assert(task6.index === 3)
+    assert(task6.taskId === 5)
+    assert(task6.executorId === "exec1")
+    assert(task6.attemptNumber === 1)
+    sched.initialize(new FakeSchedulerBackend() {
+      override def killTask(
+        taskId: Long,
+        executorId: String,
+        interruptThread: Boolean,
+        reason: String): Unit = {}
+    })
+    clock.advance(1)
+    // Running checkSpeculatableTasks again should return false
+    assert(!manager.checkSpeculatableTasks(0))
+    assert(manager.pendingSpeculatableTasks.forExecutor.size === 0)
+    assert(manager.pendingSpeculatableTasks.forHost.size === 0)
+    assert(manager.pendingSpeculatableTasks.forRack.size === 0)
+    // allPendingSpeculativeTasks will still have two pending tasks but
+    // pendingSpeculatableTasksWithNoPrefs should have none
+    assert(manager.pendingSpeculatableTasks.anyPrefs.size === 2)
+    assert(manager.pendingSpeculatableTasks.noPrefs.size === 0)
 
 Review comment:
   these structures are actually an implementation detail, in particular since 
they leave some "stale" state and are meant to lazily correct themselves.  So I 
wouldn't test their size at all.
   
   However, here you *should* test what happens if you have another resource 
offer, with `ANY` locality, and make sure you don't schedule two tasks on them. 
 Furthermore, you should probably originally give your tasks some locality 
preferences, and then make offers with some meeting the locality preferences, 
and some not meeting the locality constraints.  Since you'll have the 
speculatable task listed in multiple places inside `pendingSpeculatableTasks`, 
we want to make sure we don't run multiple copies.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] squito commented on a change in pull request #23677: [SPARK-26755][SCHEDULER] : Optimize Spark Scheduler to dequeue speculative tasks…

Reply via email to