Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7888#discussion_r44493537
  
    --- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -509,6 +511,13 @@ private[spark] class ExecutorAllocationManager(
       private def onExecutorBusy(executorId: String): Unit = synchronized {
     logDebug(s"Clearing idle timer for $executorId because it is now running a task")
     removeTimes.remove(executorId)
    +
    +    // Executor is added to remove by misjudgment due to async listener marking it as idle.
    +    // see SPARK-9552
    +    if (executorsPendingToRemove.contains(executorId)) {
    --- End diff --
    
    But I'm not talking about the general case. I'm talking about the very 
specific case of this code, where only a single executor is killed at a time, 
and a single kill request is sent (or not) to the cluster manager. Here you do 
not need to do any "rescuing", because you know whether the executor was 
actually killed, so you don't need to add it to that list in the first place.
    
    Just look in `removeExecutor`, it only ever calls `killExecutor`, not 
`killExecutor*s*`.
    
    It should be perfectly valid to change the return value of `killExecutor` 
to return whether that one executor was killed. `killExecutors` would return 
true if any executor in the list was killed, and false if none; if the list 
contains a single executor, then it all lines up.
    
    So I don't see the problem.
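    To make the suggestion concrete, here is a minimal, hypothetical sketch of 
the proposed contract (names and the `live` set are stand-ins, not the actual 
Spark internals): `killExecutor` returns whether that single executor was 
killed, and `killExecutors` returns whether any executor in the list was. With 
a one-element list the two line up, so the caller knows the outcome directly.

```scala
// Hypothetical sketch of the suggested return-value contract; not the
// actual Spark implementation.
object KillSketch {
  // Stand-in for cluster state: executors that can actually be killed.
  private var live = Set("exec-1", "exec-2")

  // Suggested contract: true iff this one executor was actually killed.
  def killExecutor(executorId: String): Boolean =
    killExecutors(Seq(executorId))

  // True if at least one executor in the list was killed, false if none.
  def killExecutors(executorIds: Seq[String]): Boolean = {
    val killed = executorIds.filter(live.contains)
    live --= killed
    killed.nonEmpty
  }
}
```

    With this in place, `removeExecutor` can consult the return value and only 
add the executor to `executorsPendingToRemove` when the kill actually happened, 
so no later "rescue" step is needed.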

