Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7888#discussion_r44443194
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -509,6 +511,13 @@ private[spark] class ExecutorAllocationManager(
       private def onExecutorBusy(executorId: String): Unit = synchronized {
         logDebug(s"Clearing idle timer for $executorId because it is now running a task")
         removeTimes.remove(executorId)
    +
    +    // Executor is added to remove by misjudgment due to async listener making it as idle).
    +    // see SPARK-9552
    +    if (executorsPendingToRemove.contains(executorId)) {
    --- End diff --
    
    While I agree with what you say, the current return value is neither very useful nor in line with what the documentation says. It basically means "a message was sent to the cluster manager asking for the executors to be killed". It doesn't mean the cluster manager received the message, or that it successfully acted on it.
    
    So IMO it should be fine to change the meaning of the return value of `killExecutor` slightly; that would make it a bit more useful.
    
    Also, that makes me question whether your current code really works. If the executor ID is in the `executorsPendingToRemove` list, it means a request to kill that executor has already been sent to the cluster manager. So even if you remove the executor from this list, the cluster manager will still kill it, which makes my suggestion of not sending the kill request even more important.
    
    I see what the race is, but once the request is sent to the cluster manager, it's too late to try to fix things. So the only enhancement I see is to avoid sending the request in the first place.
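
To make the suggestion concrete, here is a rough sketch of the "don't send the request at all" idea. It is not Spark code: `KillClient`, `RecordingClient`, `IdleTracker` and `removeIdleExecutor` are hypothetical names made up for illustration, and it assumes the busy state can be checked under the same lock at the moment the request would be sent, which the async listener in the real code may not guarantee.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the cluster manager client; not Spark's API.
trait KillClient {
  // Returns true if the kill request was sent (not that it was acted on).
  def requestKill(executorId: String): Boolean
}

// Test double that records every kill request it is asked to send.
class RecordingClient extends KillClient {
  val sent = mutable.ArrayBuffer.empty[String]
  def requestKill(executorId: String): Boolean = {
    sent += executorId
    true
  }
}

// Hypothetical, simplified model of the idle-executor bookkeeping.
class IdleTracker(client: KillClient) {
  private val busy = mutable.HashSet.empty[String]
  private val pendingToRemove = mutable.HashSet.empty[String]

  def onTaskStart(executorId: String): Unit = synchronized {
    busy += executorId
  }

  def onTaskEnd(executorId: String): Unit = synchronized {
    busy -= executorId
  }

  // Checks the busy state before sending anything. Once requestKill has
  // been called, removing the ID from pendingToRemove would not undo it.
  def removeIdleExecutor(executorId: String): Boolean = synchronized {
    if (busy.contains(executorId) || pendingToRemove.contains(executorId)) {
      false // nothing is sent to the cluster manager
    } else if (client.requestKill(executorId)) {
      pendingToRemove += executorId
      true
    } else {
      false
    }
  }
}
```

With this shape the return value means "a kill request was actually sent for this executor", and a busy executor never reaches the cluster manager, so there is nothing to undo later.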

