Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5018#issuecomment-81999174
I have two main concerns about this patch.
The first is that I think the logic in `CoarseGrainedSchedulerBackend` and
`ExecutorAllocationManager` is overcomplicated. I think we might be able to
remove `executorsPendingRemoval.size` from the `targetNumExecutors` in both of
these places. I also think it's confusing that `pendingNumExecutors` can get
out of sync (it can be negative in `ExecutorAllocationManager`, but not in
`CoarseGrainedSchedulerBackend`). However, this patch isn't really the source
of additional complexity.
My second is that we now make two RPCs to the AM when we want to kill a
container. Ideally we would have two paths: one that maintains the existing
behavior of `killExecutors` for users and includes the `requestTotalExecutors`
RPC. And another meant for consumption by `ExecutorAllocationManager` that
calls `killExecutor` directly.
That said, I've spent a while today trying to think of situations where
this patch would lead to incorrect behavior and have become convinced that it
should work. As the patch fixes a critical bug and doesn't introduce
additional complexity, I'd be OK with merging it and fixing the other issues
later.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]