Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5536#issuecomment-94592575
  
    I believe this is the right fix, but it's actually fairly non-trivial to 
determine the correctness of this change. When we request a new total that is 
less than the current total, `YarnAllocator` will lower the number of pending 
executor requests but *not kill existing executors* to get to that total. Thus, 
there is a divergence in the total expected by `ExecutorAllocationManager` and 
the `YarnAllocator`, and the `ExecutorAllocationManager` uses this new 
incorrect total to compute the number of pending executors, leading to a 
negative value.
    
    A simple of example: We have 10 executors with nothing pending to be added 
or removed. Then suddenly we finish running all tasks, such that the number of 
executors needed is now 0. Then, when we try to cancel, we will request a total 
of 0 executors. If we follow the math in `updateNumPendingExecutors`, we will 
get -10:
    ```
    // 0 - 10 + 0 = -10
    numExecutorsPending =
      newTotalExecutors - executorIds.size + executorsPendingToRemove.size
    ```
    which doesn't make sense because we can't have a negative number of pending 
executors.
    
    Why is this a correct fix then? Well, the only place where we ever request 
a total less than the current one is when we cancel outstanding requests, and 
when we cancel requests the `YarnAllocator` will only at most cancel all 
pending requests, but not kill existing executors. Limiting the 
`numPendingExecutors` to be at least 0 essentially causes the same behavior on 
the `ExecutorAllocationManager` side.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to