Ngone51 commented on code in PR #37924:
URL: https://github.com/apache/spark/pull/37924#discussion_r990925620


##########
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -2159,6 +2176,26 @@ private[spark] class DAGScheduler(
     }
   }
 
+  /**
+   * Whether the executor is decommissioning or has been decommissioned.
+   * Returns true when the executor is:
+   *  1. Waiting for decommissioning to start
+   *  2. In the middle of decommissioning
+   * Returns false when the executor has:
+   *  1. Stopped or terminated after finishing decommissioning
+   *  2. Been removed by the driver for other reasons while decommissioning
+   */
+  private[scheduler] def isExecutorDecommissioningOrDecommissioned(
+      taskScheduler: TaskScheduler, bmAddress: BlockManagerId): Boolean = {
+    if (bmAddress != null) {
+      taskScheduler
+        .getExecutorDecommissionState(bmAddress.executorId)

Review Comment:
   Hi @warrenzhu25 The general idea looks good to me. I just have one concern 
here: I think `getExecutorDecommissionState` could return empty (which means 
the executor's decommission has finished) in most cases where we hit shuffle 
fetch failures. Shuffle data is not cleaned up during executor decommission; 
it is merely duplicated to another executor. So, normally, the reduce stage 
should still be able to fetch shuffle data from the decommissioning executor 
successfully. Fetch failures should only happen when the executor exits on its 
own due to a network connection interruption. In that case, the executor may 
already have been removed from `TaskSchedulerImpl.executorsPendingDecommission` 
by the executor-lost event, which would result in 
`getExecutorDecommissionState` returning empty.
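
   The concern can be illustrated with a minimal standalone sketch (a mock, 
not Spark's actual classes — `MockTaskScheduler` and this simplified 
`ExecutorDecommissionState` are hypothetical stand-ins): once the 
executor-lost event removes the entry from the pending-decommission map, the 
state lookup returns `None`, so a check based on it would no longer see the 
executor as decommissioning.

   ```scala
   // Hypothetical sketch of the reviewer's concern; not Spark source code.
   case class ExecutorDecommissionState(startTimeMs: Long)

   class MockTaskScheduler {
     // Conceptually mirrors TaskSchedulerImpl.executorsPendingDecommission.
     private val executorsPendingDecommission =
       scala.collection.mutable.Map.empty[String, ExecutorDecommissionState]

     def startDecommission(execId: String): Unit =
       executorsPendingDecommission(execId) =
         ExecutorDecommissionState(System.currentTimeMillis())

     // Simulates the executor-lost event clearing the tracked entry,
     // e.g. after a network connection interruption.
     def removeExecutor(execId: String): Unit =
       executorsPendingDecommission.remove(execId)

     def getExecutorDecommissionState(
         execId: String): Option[ExecutorDecommissionState] =
       executorsPendingDecommission.get(execId)
   }

   object Demo extends App {
     val ts = new MockTaskScheduler
     ts.startDecommission("exec-1")
     // While decommissioning, the state is present.
     println(ts.getExecutorDecommissionState("exec-1").isDefined) // true

     // Executor lost before decommission bookkeeping is consulted:
     // the entry is gone, so the proposed check would miss this case.
     ts.removeExecutor("exec-1")
     println(ts.getExecutorDecommissionState("exec-1").isDefined) // false
   }
   ```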



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

