Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16650#discussion_r99595765
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
    @@ -600,6 +603,16 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
        */
       protected def doKillExecutors(executorIds: Seq[String]): Future[Boolean] =
         Future.successful(false)
    +
    +  /**
    +   * Request that the cluster manager kill all executors on a given host.
    +   * @return whether the kill request is acknowledged.
    +   */
    +  final override def killExecutorsOnHost(host: String): Boolean = {
    +    logInfo(s"Requesting to kill any and all executors on host ${host}")
    +    driverEndpoint.send(KillExecutorsOnHost(host))
    --- End diff --
    
    I think it would be helpful to add a comment here explaining why we do this in 
the event loop, rather than killing immediately, e.g.:
    
    We might have an executor register on the bad host, at the same time that 
we try to kill all executors on that host.  By killing all executors within the 
event loop (rather than in the calling thread), we ensure that the registering 
and killing of executors is serialized.
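
To illustrate the serialization argument, here is a minimal sketch, assuming a single-threaded event loop that owns all executor bookkeeping. Every name in it (`EventLoopSketch`, `DriverLoop`, `RegisterExecutor`, `killedSoFar`, `stop`) is hypothetical and chosen for illustration; it is not Spark's actual `DriverEndpoint` code, only the general pattern of sending a message instead of acting in the calling thread.

```scala
import java.util.concurrent.{Callable, Executors}
import scala.collection.mutable

// Hypothetical sketch, NOT Spark's implementation: all state changes are
// funneled through one thread, so register and kill can never interleave.
object EventLoopSketch {
  sealed trait Message
  final case class RegisterExecutor(executorId: String, host: String) extends Message
  final case class KillExecutorsOnHost(host: String) extends Message

  final class DriverLoop {
    // Touched only from the loop thread, so no locks are needed.
    private val executorsByHost = mutable.Map.empty[String, mutable.Set[String]]
    private val killed = mutable.ArrayBuffer.empty[String]
    private val loop = Executors.newSingleThreadExecutor()

    // Callers on any thread enqueue a message and return immediately.
    def send(msg: Message): Unit =
      loop.execute(new Runnable { def run(): Unit = receive(msg) })

    // Runs on the loop thread only; messages are handled one at a time.
    private def receive(msg: Message): Unit = msg match {
      case RegisterExecutor(id, host) =>
        executorsByHost.getOrElseUpdate(host, mutable.Set.empty[String]) += id
      case KillExecutorsOnHost(host) =>
        // Sees every registration enqueued before this message, and none after.
        executorsByHost.remove(host).foreach(ids => killed ++= ids)
    }

    // Reads go through the loop as well, so they observe a consistent snapshot.
    def killedSoFar(): Set[String] =
      loop.submit(new Callable[Set[String]] {
        def call(): Set[String] = killed.toSet
      }).get()

    def stop(): Unit = loop.shutdown()
  }
}
```

In this sketch a registration that is enqueued before the kill message is always included in the kill, and one enqueued afterwards is never seen half-registered; which of the two happens depends only on queue order. Killing directly from the calling thread would instead require locking around the shared map to get the same guarantee.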

