[GitHub] [spark] Ngone51 commented on a change in pull request #32766: [SPARK-35627][CORE] Decommission executors in batches to not overload network bandwidth

GitBox Sun, 06 Jun 2021 23:38:21 -0700


Ngone51 commented on a change in pull request #32766:
URL: https://github.com/apache/spark/pull/32766#discussion_r646307720




##########
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -519,10 +558,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
     scheduler.sc.env.blockManager.master.decommissionBlockManagers(executorsToDecommission)
 
     if (!triggeredByExecutor) {
-      executorsToDecommission.foreach { executorId =>
-        logInfo(s"Notify executor $executorId to decommissioning.")
-        executorDataMap(executorId).executorEndpoint.send(DecommissionExecutor)
-      }

Review comment:
Of course, I understand that the network can be overloaded when a large
number of executors are decommissioned at the same time.

My question is: have you seen this issue happen in a real cluster? AFAIK, for
example, [the average frequency of spot instance interruption in
AWS](https://aws.amazon.com/ec2/spot/instance-advisor/) is only 5%, which I
think makes it unlikely that many executors would need to be decommissioned
within a short time.




