agrawaldevesh commented on a change in pull request #28818:
URL: https://github.com/apache/spark/pull/28818#discussion_r458381415
##########
File path: core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala
##########
@@ -81,6 +81,35 @@ private[spark] trait ExecutorAllocationClient {
countFailures: Boolean,
force: Boolean = false): Seq[String]
+  /**
+   * Request that the cluster manager decommission the specified executors.
+   * The default implementation delegates to kill; a scheduler must override
+   * this if it supports graceful decommissioning.
+   *
+   * @param executorIds identifiers of executors to decommission
+   * @param adjustTargetNumExecutors whether the target number of executors
+   *        will be adjusted down after these executors have been decommissioned.
+   * @return the ids of the executors acknowledged by the cluster manager
+   *         to be removed.
+   */
+  def decommissionExecutors(
+      executorIds: Seq[String],
+      adjustTargetNumExecutors: Boolean): Seq[String]
+
+
+  /**
+   * Request that the cluster manager decommission the specified executor.
+   * Delegates to decommissionExecutors with adjustTargetNumExecutors = true.
+   *
+   * @param executorId identifier of the executor to decommission
+   * @return whether the request is acknowledged by the cluster manager.
+   */
+  def decommissionExecutor(executorId: String): Boolean = {
+    val decommissionedExecutors = decommissionExecutors(Seq(executorId),
+      adjustTargetNumExecutors = true)
+    // Succeed only if the cluster manager acknowledged removal of this executor.
+    decommissionedExecutors.nonEmpty && decommissionedExecutors(0).equals(executorId)
+  }
Review comment:
Hmm, should we rename `decommissionExecutor` (singular) to
`decommissionAndKillExecutor` to better reflect its purpose? It would be too
easy to confuse it with `decommissionExecutors` (on line 95 of this file),
which allows the caller to not replenish the target number of executors.
Do you want to make the change to the callers of `decommissionExecutor` in
this PR and switch them to `decommissionExecutors(Seq(executorId), false)`
instead (see the sketch after this list)? The ones I am most concerned about are:
- the handling of the DecommissionExecutor message (both sync and async
variants) in CoarseGrainedSchedulerBackend
- StandaloneSchedulerBackend.executorDecommissioned
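
To make the proposed call shape concrete, here is a minimal, self-contained
sketch. The object, trait, and handler names are hypothetical, not the actual
scheduler backend code; the trait method mirrors the declaration added in this PR:

```scala
// Hedged sketch only: names here are illustrative, not real Spark code.
object DecommissionCallerSketch {
  // Mirrors the decommissionExecutors declaration added in this PR.
  trait AllocationClient {
    def decommissionExecutors(
        executorIds: Seq[String],
        adjustTargetNumExecutors: Boolean): Seq[String]
  }

  // Illustrative handler for a DecommissionExecutor-style message: the
  // replenishment decision is made explicitly at the call site rather
  // than being hardcoded inside decommissionExecutor.
  def handleDecommission(client: AllocationClient, executorId: String): Boolean = {
    // Per the javadoc above, false means the target number of executors
    // is not adjusted down after this executor is decommissioned.
    val acked = client.decommissionExecutors(
      Seq(executorId), adjustTargetNumExecutors = false)
    acked.contains(executorId)
  }
}
```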
In both of the above cases, I think we may not _always_ want replenishing. For
example, in the standalone case, when the Worker gets a SIGPWR, do we want to
replenish the executors on the remaining workers (i.e., oversubscribe the
remaining workers)? Similarly, when an executor gets a SIGPWR, do we want to
put that load on the remaining executors? I think the answer to both should be
no unless we are doing dynamic allocation.
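
If we wanted to encode that rule directly, one hedged sketch could look like
the following. The helper and object names are hypothetical;
`spark.dynamicAllocation.enabled` is the standard config key:

```scala
import org.apache.spark.SparkConf

object ReplenishPolicySketch {
  // Hypothetical helper: allow replenishment of decommissioned executors
  // only when dynamic allocation is enabled, per the reasoning above.
  def shouldReplenish(conf: SparkConf): Boolean =
    conf.getBoolean("spark.dynamicAllocation.enabled", defaultValue = false)
}
```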
Personally, I am fine with any choice of naming here as long as the semantics
are not silently changed under the covers, as is the case presently.