Ngone51 commented on a change in pull request #31249:
URL: https://github.com/apache/spark/pull/31249#discussion_r563517094



##########
File path: core/src/main/scala/org/apache/spark/scheduler/HealthTracker.scala
##########
@@ -184,8 +191,14 @@ private[scheduler] class HealthTracker (
         case Some(a) =>
           logInfo(s"Killing all executors on excluded host $node " +
             s"since ${config.EXCLUDE_ON_FAILURE_KILL_ENABLED.key} is set.")
-          if (a.killExecutorsOnHost(node) == false) {
-            logError(s"Killing executors on node $node failed.")
+          if (decommission) {
+            if (a.decommissionExecutorsOnHost(node) == false) {
+              logError(s"Decommissioning executors on $node failed.")
+            }
+          } else {
+            if (a.killExecutorsOnHost(node) == false) {

Review comment:
       ditto

##########
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -176,11 +181,21 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
         }
 
       case KillExecutorsOnHost(host) =>
-        scheduler.getExecutorsAliveOnHost(host).foreach { exec =>
-          killExecutors(exec.toSeq, adjustTargetNumExecutors = false, countFailures = false,
+        scheduler.getExecutorsAliveOnHost(host).foreach { execs =>
+          killExecutors(execs.toSeq, adjustTargetNumExecutors = false, countFailures = false,
             force = true)
         }
 
+      case DecommissionExecutorsOnHost(host) =>
+        val reason = ExecutorDecommissionInfo(s"Decommissioning all executors on $host.")
+        scheduler.getExecutorsAliveOnHost(host).foreach { execs =>
+          val execsWithReasons = execs.map { exec =>
+            (exec, reason)
+          }.toArray

Review comment:
       nit:
   ```suggestion
             val execsWithReasons = execs.map(exec => (exec, reason)).toArray
   
   ```

##########
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -506,6 +521,20 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
       }
     }
 
+    conf.get(EXECUTOR_DECOMMISSION_CLEANUP_INTERVAL).map { cleanupInterval =>
+      val cleanupTask = new Runnable() {
+        override def run(): Unit = Utils.tryLogNonFatalError {
+          val stragglers = executorsToDecommission.filter(executorsPendingDecommission.contains(_))

Review comment:
       nit:
   ```suggestion
          val stragglers = executorsToDecommission.filter(executorsPendingDecommission.contains)
   ```
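   
   (Passing `contains` directly works here since Scala eta-expands the method into a function value, so the explicit `(_)` placeholder is redundant.)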

##########
File path: core/src/main/scala/org/apache/spark/scheduler/HealthTracker.scala
##########
@@ -184,8 +191,14 @@ private[scheduler] class HealthTracker (
         case Some(a) =>
           logInfo(s"Killing all executors on excluded host $node " +
             s"since ${config.EXCLUDE_ON_FAILURE_KILL_ENABLED.key} is set.")
-          if (a.killExecutorsOnHost(node) == false) {
-            logError(s"Killing executors on node $node failed.")
+          if (decommission) {
+            if (a.decommissionExecutorsOnHost(node) == false) {

Review comment:
       nit: 
   ```suggestion
               if (!a.decommissionExecutorsOnHost(node)) {
   ```

##########
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -506,6 +521,20 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
       }
     }
 
+    conf.get(EXECUTOR_DECOMMISSION_CLEANUP_INTERVAL).map { cleanupInterval =>
+      val cleanupTask = new Runnable() {
+        override def run(): Unit = Utils.tryLogNonFatalError {
+          val stragglers = executorsToDecommission.filter(executorsPendingDecommission.contains(_))
+          if (stragglers.nonEmpty) {
+            logInfo(
+              s"${stragglers.toList} failed to decommission in ${cleanupInterval}, killing.")

Review comment:
       nit: unnecessary brackets for `cleanupInterval`.

##########
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -839,7 +869,6 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
        Future.successful (if (killSuccessful) executorsToKill else Seq.empty[String])
       )(ThreadUtils.sameThread)
     }
-

Review comment:
       Could you revert this unrelated change?

##########
File path: core/src/main/scala/org/apache/spark/scheduler/HealthTracker.scala
##########
@@ -40,6 +40,7 @@ import org.apache.spark.util.{Clock, SystemClock, Utils}
  *      stage, but still many failures over the entire application
 *  * "flaky" executors -- they don't fail every task, but are still faulty enough to merit
  *      excluding
+ *  * missing shuffle files -- may trigger fetch failures on health executors.

Review comment:
       nit: "healthy"?

##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -809,6 +809,12 @@ package object config {
       .booleanConf
       .createWithDefault(false)
 
+  private[spark] val EXCLUDE_ON_FAILURE_DECOMMISSION_ENABLED =
+    ConfigBuilder("spark.excludeOnFailure.decommissionExcludedExecutors")

Review comment:
       Seems like this only takes effect when `spark.excludeOnFailure.killExcludedExecutors=true`. If so, I think it's more reasonable to put this conf under the same namespace, e.g., `spark.excludeOnFailure.killExcludedExecutors.decommissionEnabled` or something else.
   
   And with the current implementation, the executor may not get killed if `EXECUTOR_DECOMMISSION_CLEANUP_INTERVAL` is not set, which then doesn't really respect `spark.excludeOnFailure.killExcludedExecutors=true`.
   
   Actually, I'm thinking it might be better to have these two confs at the same level, e.g.,
   
   ```scala
   if (EXCLUDE_ON_FAILURE_KILL_ENABLED) {
    // do kill
   } else if (EXCLUDE_ON_FAILURE_DECOMMISSION_ENABLED) {
    // do decommission
   }
   // else do nothing
   ```
   
   WDYT?
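   
   For illustration only (reusing the name suggested above, not meant as final code), the namespaced variant could look like:
   ```scala
   private[spark] val EXCLUDE_ON_FAILURE_DECOMMISSION_ENABLED =
     ConfigBuilder("spark.excludeOnFailure.killExcludedExecutors.decommissionEnabled")
       .booleanConf
       .createWithDefault(false)
   ```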

##########
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -850,6 +882,22 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
   protected def doKillExecutors(executorIds: Seq[String]): Future[Boolean] =
     Future.successful(false)
 
+  /**
+   * Request that the cluster manager decommission all executors on a given host.
+   * @return whether the decommission request is acknowledged.
+   */
+  final override def decommissionExecutorsOnHost(host: String): Boolean = {
+    logInfo(s"Requesting to kill any and all executors on host ${host}")

Review comment:
       also, "kill any executors" or "kill all executors" should be enough?
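   
   e.g., just as a sketch of the wording (not necessarily the final message):
   ```scala
   logInfo(s"Requesting to kill all executors on host $host")
   ```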

##########
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -506,6 +521,20 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
       }
     }
 
+    conf.get(EXECUTOR_DECOMMISSION_CLEANUP_INTERVAL).map { cleanupInterval =>
+      val cleanupTask = new Runnable() {
+        override def run(): Unit = Utils.tryLogNonFatalError {
+          val stragglers = executorsToDecommission.filter(executorsPendingDecommission.contains(_))
+          if (stragglers.nonEmpty) {
+            logInfo(
+              s"${stragglers.toList} failed to decommission in ${cleanupInterval}, killing.")

Review comment:
       also, this can fit in one line
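   
   e.g., together with dropping the brackets around `cleanupInterval`:
   ```scala
   logInfo(s"${stragglers.toList} failed to decommission in $cleanupInterval, killing.")
   ```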

##########
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -506,6 +521,20 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
       }
     }
 
+    conf.get(EXECUTOR_DECOMMISSION_CLEANUP_INTERVAL).map { cleanupInterval =>
+      val cleanupTask = new Runnable() {
+        override def run(): Unit = Utils.tryLogNonFatalError {
+          val stragglers = executorsToDecommission.filter(executorsPendingDecommission.contains(_))

Review comment:
       Also, the task is executed in a separate thread, so we should ensure thread-safety for `executorsPendingDecommission`.
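   
   A minimal sketch of what I mean, assuming `executorsPendingDecommission` is only mutated while holding the backend's lock:
   ```scala
   val stragglers = CoarseGrainedSchedulerBackend.this.synchronized {
     executorsToDecommission.filter(executorsPendingDecommission.contains)
   }
   ```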

##########
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -506,6 +521,20 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
       }
     }
 
+    conf.get(EXECUTOR_DECOMMISSION_CLEANUP_INTERVAL).map { cleanupInterval =>

Review comment:
       IIUC, the decommissioned executor is supposed to shut down at the end, either by the external services or by itself (once decommissioning is considered done), right?
   
   I understand this cleanup task is a guarantee for executor shutdown. But I think it's really hard for users to pick a reasonable value for this conf, since they don't know what happens during decommissioning or roughly how long it would take.
   
   Besides, when the decommission comes from the external services, I think we'd better respect `spark.executor.decommission.killInterval` here. That would prevent us from killing the executor prematurely when `spark.executor.decommission.cleanupInterval` is shorter than `spark.executor.decommission.killInterval`.
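   
   Just to illustrate the idea (the names below are only a sketch; `cleanupService` stands for whatever scheduler runs `cleanupTask`): the cleanup delay could be clamped to at least the kill interval before scheduling, e.g.
   ```scala
   // sketch: never kill stragglers earlier than the configured killInterval
   val killIntervalSec = conf.get(EXECUTOR_DECOMMISSION_KILL_INTERVAL).getOrElse(0L)
   val delaySec = math.max(cleanupInterval, killIntervalSec)
   cleanupService.foreach(_.schedule(cleanupTask, delaySec, TimeUnit.SECONDS))
   ```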
   
   

##########
File path: core/src/test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala
##########
@@ -554,6 +554,51 @@ class HealthTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with Mock
     verify(allocationClientMock).killExecutorsOnHost("hostA")
   }
 
+  test("excluding decommission and kills executors when enabled") {
+    val allocationClientMock = mock[ExecutorAllocationClient]
+    healthTracker = new HealthTracker(listenerBusMock, conf, Some(allocationClientMock), clock)
+
+    // verify we decommission when configured
+    conf.set(config.EXCLUDE_ON_FAILURE_KILL_ENABLED, true)
+    conf.set(config.DECOMMISSION_ENABLED.key, "true")

Review comment:
       this conf is not required?

##########
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -850,6 +882,22 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
   protected def doKillExecutors(executorIds: Seq[String]): Future[Boolean] =
     Future.successful(false)
 
+  /**
+   * Request that the cluster manager decommission all executors on a given host.
+   * @return whether the decommission request is acknowledged.
+   */
+  final override def decommissionExecutorsOnHost(host: String): Boolean = {
+    logInfo(s"Requesting to kill any and all executors on host ${host}")

Review comment:
       +1 to change both

##########
File path: core/src/test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala
##########
@@ -554,6 +554,51 @@ class HealthTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with Mock
     verify(allocationClientMock).killExecutorsOnHost("hostA")
   }
 
+  test("excluding decommission and kills executors when enabled") {
+    val allocationClientMock = mock[ExecutorAllocationClient]
+    healthTracker = new HealthTracker(listenerBusMock, conf, Some(allocationClientMock), clock)

Review comment:
       unused initialization?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



