Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/5536#issuecomment-94924265
Let's take a step back and summarize the problem and the proposed solution to make sure we're all on the same page. This issue is actually fairly complicated, so I think it's worth exploring in some detail. I'd be happy to hear your thoughts. @srowen @piaozhexiu @sryza
------------
**Symptom.** There are instances where `ExecutorAllocationManager` (EAM) requests a negative total number of executors from the `YarnAllocator`, and this results in an exception because the code correctly rejects such a request. I think we all agree on this part. Now let's trace through an example of how we can run into this exception.
**Problem illustrated with an example.**
(0) Let's say we start with 10 executors with nothing pending to be added or removed.
(1) Some tasks have finished such that the scheduler tells us we only need 5.
(2) Next time we call `schedule` we attempt to cancel 5 pending requests (but there are none).
(3) We set `numExecutorsPending = newTarget - executorIds.size + [...] = 5 - 10 + 0 = -5`.
(4) Next time we add executors our new target will be
```
// Let's say `numExecutorsToAdd == 1`
val currentTarget = numExecutorsPending + executorIds.size - [...] = -5 + 10 - 0 = 5
val newTarget = math.min(currentTarget + numExecutorsToAdd, [...]) = 5 + 1 = 6
```
(5) Thus, we only request 6 executors when we're trying to add, even though we already have 10.
Now imagine between steps (3) and (4), a number of executors fail (or are
removed, the distinction is not important). For instance, if all of the
existing executors fail, *then we will end up with a negative target* (-4 in
this case, computed using the same equation above), resulting in the exception.
This is consistent with the description in the
[JIRA](https://issues.apache.org/jira/browse/SPARK-6954) that this only happens
if we remove executors too quickly.
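To make the arithmetic above concrete, here is a minimal standalone sketch of the bookkeeping (not the actual EAM code; the object and variable names just mirror the fields above, and I'm ignoring the upper bound and pending-removal terms elided with `[...]`):
```
// Standalone sketch of the bookkeeping above, not the real EAM code.
object NegativeTargetExample {
  def main(args: Array[String]): Unit = {
    var numExecutorsPending = 0   // (0) nothing pending to be added or removed
    var numRegistered = 10        // (0) 10 registered executors
    val numExecutorsToAdd = 1

    // (1)-(3): the scheduler only needs 5, so we "cancel" down to that target
    val cancelTarget = 5
    numExecutorsPending = cancelTarget - numRegistered                // -5

    // (4)-(5): the next add computes a target of 6 even though we have 10
    println(numExecutorsPending + numRegistered + numExecutorsToAdd)  // 6

    // Now suppose all 10 executors fail between (3) and (4)
    numRegistered = 0
    println(numExecutorsPending + numRegistered + numExecutorsToAdd)  // -4
  }
}
```
The negative value in the last line is exactly the total that triggers the exception described above.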
**Underlying cause.** After a cancel, the EAM unconditionally adjusts the
new target downwards whether or not there are pending requests to be canceled,
and then computes `numExecutorsPending` using this new target. If there are no
pending executors, then the allocator actually *ignores* the cancel request
because there is nothing to cancel. Meanwhile, the EAM expects the new total
number of executors to go down as well, but this never happens.
In other words, when we cancel, we should only adjust the new target downwards by at most `numExecutorsPending`, since the allocator will only cancel pending requests, never kill existing executors.
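To spell out the mismatch between the two sides, here is a schematic sketch (the class and method names are illustrative, not the real APIs) of what happens on a cancel when nothing is pending:
```
// Illustrative only: class and method names do not correspond to the real APIs.
class AllocatorSide {
  var pendingRequests = 0  // nothing outstanding in our example

  def cancel(n: Int): Int = {
    // The allocator can only cancel what is actually pending; it never kills executors.
    val cancelled = math.min(n, pendingRequests)
    pendingRequests -= cancelled
    cancelled  // 0 here, so the cancel is effectively a no-op
  }
}

class EamSide(allocator: AllocatorSide) {
  var numExecutorsPending = 0
  val numRegistered = 10

  def cancelDownTo(newTarget: Int): Unit = {
    allocator.cancel(numRegistered - newTarget)      // cancels nothing
    // Buggy bookkeeping: assumes the total dropped to newTarget anyway.
    numExecutorsPending = newTarget - numRegistered  // -5
  }
}
```
The allocator's state is unchanged, but the EAM now believes it has -5 pending executors, which is the root of the negative targets above.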
**Proposed solution.** This patch ensures that `numExecutorsPending` never
goes below 0. This means the next time EAM tries to add executors, it will use
a target total that is at least the number of currently registered executors,
and this must be >= 0.
This behavior also matches the state on the allocator side in most cases.
When it receives a cancel request, the allocator will cancel at most the number
of pending requests that are outstanding (clearly this cannot be negative).
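For completeness, here is a rough sketch of the clamping idea (illustrative only; the actual change in the patch may be expressed differently inside `ExecutorAllocationManager`):
```
// Illustrative only: clamp so the pending count never goes below zero.
def pendingAfterCancel(newTarget: Int, numRegistered: Int): Int =
  math.max(0, newTarget - numRegistered)

// With the numbers from the example: max(0, 5 - 10) = 0. Even if all 10
// executors then fail, the next add targets 0 + 0 + 1 = 1, never a negative value.
```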
Note that even this solution is not perfect. There exists a race condition
in which a few of the pending requests we are trying to cancel may already be
granted by the time the cancel request reaches the allocator. In this case, the
EAM will get more executors than is needed, but that is fine because they will
eventually be removed anyway. It's worth noting that this whole cancel feature
is best-effort in the first place, so it's OK for the cancel to not always take
effect, but it's not OK for the cancel to cause the add to fail.