GitHub user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19048
  
    I think I see what you're saying. But I still think it's the fault of the EAM (`ExecutorAllocationManager`).
    
    > But please note that while killing 2 executors the EAM did not reduce its target to 3, it is still 5.
    
    And I think the problem here is that the EAM should not tell the CGSB (`CoarseGrainedSchedulerBackend`) that the target is 5 when 5 is actually the "minimum" the EAM wants; there may be more executors still running that haven't timed out yet. Basically, this code in the EAM:
    
    ```
          if (numExecutorsTarget < oldNumExecutorsTarget) {
            client.requestTotalExecutors(numExecutorsTarget, localityAwareTasks, hostToLocalTaskCount)
            logDebug(s"Lowering target number of executors to $numExecutorsTarget (previously " +
              s"$oldNumExecutorsTarget) because not all requested executors are actually needed")
          }
    ```
    
    should be changed to account for the current number of executors. That way the EAM doesn't tell the CGSB that it wants fewer executors than currently exist. Even if the EAM doesn't currently "need" the extra executors, it hasn't timed them out yet. So they still need to count towards the "number of executors that I expect to be active".
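    The following is a minimal Scala sketch of that rule. It assumes the total the EAM reports should be `max(executors needed, executors still active)`. `TargetClient`, `RecordingClient`, `TargetSketch` and their methods are hypothetical stand-ins for illustration, not Spark APIs. The real `client.requestTotalExecutors` quoted above also takes `localityAwareTasks` and `hostToLocalTaskCount`, which are omitted here.

    ```scala
    // Hypothetical stand-in for the cluster-manager client; Spark's real
    // requestTotalExecutors also takes locality hints, omitted here.
    trait TargetClient {
      def requestTotalExecutors(total: Int): Boolean
    }

    // Test double that just remembers the last total it was asked for.
    final class RecordingClient extends TargetClient {
      var lastRequested: Option[Int] = None
      def requestTotalExecutors(total: Int): Boolean = {
        lastRequested = Some(total)
        true
      }
    }

    object TargetSketch {
      /** Never report fewer executors than are alive and not yet timed out. */
      def expectedActive(numExecutorsNeeded: Int, numActiveExecutors: Int): Int =
        math.max(numExecutorsNeeded, numActiveExecutors)

      /** Same shape as the quoted `if` block, but compares the adjusted total
        * (not the raw "needed" count) against the previous target. */
      def maybeLowerTarget(
          client: TargetClient,
          numExecutorsNeeded: Int,
          oldNumExecutorsTarget: Int,
          numActiveExecutors: Int): Int = {
        val newTarget = expectedActive(numExecutorsNeeded, numActiveExecutors)
        if (newTarget < oldNumExecutorsTarget) {
          client.requestTotalExecutors(newTarget)
        }
        newTarget
      }

      def main(args: Array[String]): Unit = {
        val client = new RecordingClient
        // Need 3, previous target 5, but all 5 executors are still alive.
        val t1 = maybeLowerTarget(client, numExecutorsNeeded = 3, oldNumExecutorsTarget = 5, numActiveExecutors = 5)
        println(s"target=$t1 requested=${client.lastRequested}") // target=5 requested=None
        // Once 2 idle executors have timed out and been removed, the target can drop.
        val t2 = maybeLowerTarget(client, numExecutorsNeeded = 3, oldNumExecutorsTarget = 5, numActiveExecutors = 3)
        println(s"target=$t2 requested=${client.lastRequested}") // target=3 requested=Some(3)
      }
    }
    ```

    While the extra executors are still running, no lower total is sent. The request for 3 only happens after they have actually gone away, so the CGSB never sees a target below the real executor count.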
    
    Your solution (the new `updateTotalExecutor` parameter) looks too much like the existing `replace` parameter. It's also confusing to reason about how the two interact. What does it mean to ask for `updateTotalExecutor = false` and `replace = false`? The latter means you want the executor count to go down, while the former means you don't.
    
    Now, if the EAM tells the CGSB the correct number of executors it expects to be active (something like `max(executors I need, active executors)`), then the problem should go away, no?


