Ngone51 commented on code in PR #39280:
URL: https://github.com/apache/spark/pull/39280#discussion_r1071230700
##########
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala:
##########
@@ -534,11 +549,14 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
     // Do not change this code without running the K8s integration suites
     val executorsToDecommission = executorsAndDecomInfo.flatMap { case (executorId, decomInfo) =>
       // Only bother decommissioning executors which are alive.
+      // Keep executor decommission info in case executor started, but not registered yet
       if (isExecutorActive(executorId)) {
         scheduler.executorDecommission(executorId, decomInfo)
         executorsPendingDecommission(executorId) = decomInfo
         Some(executorId)
       } else {
+        unKnownExecutorsPendingDecommission.put(executorId,
Review Comment:
My concern is actually the race condition between "executor killed by
driver" and "executor decommissioned". If "executor killed by driver" happens
right before "executor decommissioned", then we'd mistakenly put the executor
into `unKnownExecutorsPendingDecommission`. But yes, the cache is bounded, and
this is only a race condition, so I think it should be fine.
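
To make the interleaving concrete, here is a minimal sketch of it. Everything
in it is hypothetical: `DecomTracker`, `maxUnknown` and the string payload are
stand-ins for illustration, not Spark's classes, and the bounded map is a plain
`java.util.LinkedHashMap` rather than whatever cache the PR actually uses.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical stand-in for the backend's bookkeeping; not Spark's actual code.
final class DecomTracker(maxUnknown: Int) {
  private val active = scala.collection.mutable.Set.empty[String]

  // Insertion-ordered map that drops its eldest entry once it exceeds maxUnknown.
  private val unknownPending: JMap[String, String] =
    new JLinkedHashMap[String, String]() {
      override def removeEldestEntry(eldest: JMap.Entry[String, String]): Boolean =
        size() > maxUnknown
    }

  def register(id: String): Unit = active += id

  // "Executor killed by driver": the executor stops being active.
  def kill(id: String): Unit = active -= id

  // Returns true if the executor was active and is decommissioned right away.
  // Otherwise the info is parked, on the assumption that the executor has
  // started but not registered yet.
  def decommission(id: String, info: String): Boolean =
    if (active.contains(id)) {
      true
    } else {
      unknownPending.put(id, info)
      false
    }

  def isUnknownPending(id: String): Boolean = unknownPending.containsKey(id)
  def unknownSize: Int = unknownPending.size()
}

object RaceDemo {
  // Kill first, decommission second: the dead executor is parked by mistake.
  def run(): Boolean = {
    val tracker = new DecomTracker(maxUnknown = 2)
    tracker.register("exec-1")
    tracker.kill("exec-1")
    tracker.decommission("exec-1", "decom info")
    tracker.isUnknownPending("exec-1")
  }
}
```

In this sketch the stale entry for the killed executor stays in the map only
until enough newer entries push it out, which is why the bound keeps the
mistake harmless.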
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]