Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5536#issuecomment-94610887
I've thought about this more and can now articulate why I believe
the present fix is incorrect.
Aside from the weirdness of the idea that there can be a negative number of
pending executors, I think the behavior you (Andrew) describe in your
example is not actually a problem. numPendingExecutors ends up as -10.
targetNumExecutors is then calculated as numPendingExecutors + executorIds.size
- executorsPendingRemoval.size, which comes out to 0. This is the correct
number to pass on to YarnAllocator: it tells YarnAllocator that if
executors die naturally or are preempted, it shouldn't try to replace
them, because there's no need. If we instead force numPendingExecutors to 0
in that situation, targetNumExecutors would be calculated as 10, which is
incorrect.
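To make the arithmetic concrete, here is a minimal sketch of the targetNumExecutors calculation as described above. This is an illustration, not Spark code: the function and parameter names are hypothetical stand-ins mirroring the Scala fields, and the numbers come from the example discussed here.

```python
def target_num_executors(num_pending, num_registered, num_pending_removal):
    # Mirrors: numPendingExecutors + executorIds.size - executorsPendingRemoval.size
    return num_pending + num_registered - num_pending_removal

# The example: 10 registered executors, pending count driven down to -10.
# -10 + 10 - 0 = 0, the correct target to hand to YarnAllocator.
print(target_num_executors(-10, 10, 0))

# If numPendingExecutors were instead forced to 0, the target would come
# out as 0 + 10 - 0 = 10, which is incorrect.
print(target_num_executors(0, 10, 0))
```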
The real problem is that some double counting is going on: executors
that we don't want can be counted both in the form of a negative
numPendingExecutors and as part of executorsPendingRemoval.
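One way that double counting could manifest (a hypothetical scenario to illustrate the concern, not a trace from the PR): if a kill request both drives numPendingExecutors negative and records the same executors in executorsPendingRemoval, those executors get subtracted twice from the target.

```python
def target_num_executors(num_pending, num_registered, num_pending_removal):
    # Mirrors: numPendingExecutors + executorIds.size - executorsPendingRemoval.size
    return num_pending + num_registered - num_pending_removal

# Hypothetical: killing 10 of 10 registered executors decrements the pending
# count to -10 AND adds the same 10 ids to executorsPendingRemoval, so the
# target comes out as -10 + 10 - 10 = -10 instead of 0.
print(target_num_executors(-10, 10, 10))
```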