holdenk commented on a change in pull request #29452:
URL: https://github.com/apache/spark/pull/29452#discussion_r474985897



##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -1123,14 +1127,6 @@ private[spark] class TaskSetManager(
 
   def executorDecommission(execId: String): Unit = {
     recomputeLocality()
-    if (speculationEnabled) {

Review comment:
       Do we have a test that this is being speculated?

##########
File path: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -926,18 +926,21 @@ private[spark] class TaskSchedulerImpl(
         // and some of those can have isHostDecommissioned false. We merge 
them such that
         // if we heard isHostDecommissioned ever true, then we keep that one 
since it is
         // most likely coming from the cluster manager and thus authoritative
-        val oldDecomInfo = executorsPendingDecommission.get(executorId)
-        if (!oldDecomInfo.exists(_.isHostDecommissioned)) {
-          executorsPendingDecommission(executorId) = decommissionInfo
+        val oldDecomState = executorsPendingDecommission.get(executorId)
+        if (!oldDecomState.exists(_.isHostDecommissioned)) {
+          executorsPendingDecommission(executorId) = ExecutorDecommissionState(
+            decommissionInfo.message,
+            oldDecomState.map(_.startTime).getOrElse(clock.getTimeMillis()),

Review comment:
       Add a comment on this logic.

##########
File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorDecommissionInfo.scala
##########
@@ -18,11 +18,22 @@
 package org.apache.spark.scheduler
 
 /**
- * Provides more detail when an executor is being decommissioned.
+ * Message providing more detail when an executor is being decommissioned.
  * @param message Human readable reason for why the decommissioning is 
happening.
  * @param isHostDecommissioned Whether the host (aka the `node` or `worker` in 
other places) is
  *                             being decommissioned too. Used to infer if the 
shuffle data might
  *                             be lost even if the external shuffle service is 
enabled.
  */
 private[spark]
 case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: 
Boolean)
+
+/**
+ * State related to decommissioning that is kept by the TaskSchedulerImpl. 
This state is derived
+ * from the info message above but it is kept distinct to allow the state to 
evolve independently
+ * from the message.
+ */
+case class ExecutorDecommissionState(
+    message: String,

Review comment:
       Do we need the message in the state one?

##########
File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorDecommissionInfo.scala
##########
@@ -18,11 +18,22 @@
 package org.apache.spark.scheduler
 
 /**
- * Provides more detail when an executor is being decommissioned.
+ * Message providing more detail when an executor is being decommissioned.
  * @param message Human readable reason for why the decommissioning is 
happening.
  * @param isHostDecommissioned Whether the host (aka the `node` or `worker` in 
other places) is
  *                             being decommissioned too. Used to infer if the 
shuffle data might
  *                             be lost even if the external shuffle service is 
enabled.
  */
 private[spark]
 case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: 
Boolean)
+
+/**
+ * State related to decommissioning that is kept by the TaskSchedulerImpl. 
This state is derived
+ * from the info message above but it is kept distinct to allow the state to 
evolve independently
+ * from the message.
+ */
+case class ExecutorDecommissionState(
+    message: String,
+    // Timestamp the decommissioning commenced in millis since epoch of the 
driver's clock

Review comment:
       Maybe mention what it's used for? I know on my first skim through I 
thought this was for tracking expirery but it looks like it's being used for 
more than that. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to