Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/5018#issuecomment-81999174
  
    I have two main concerns about this patch.
    
    The first is that I think the logic in `CoarseGrainedSchedulerBackend` and 
`ExecutorAllocationManager` is overcomplicated.  I think we might be able to 
remove `executorsPendingRemoval.size` from the `targetNumExecutors` in both of 
these places.  I also think it's confusing that `pendingNumExecutors` can get 
out of sync (it can be negative in `ExecutorAllocationManager`, but not in 
`CoarseGrainedSchedulerBackend`).  However, this patch isn't really the source 
of additional complexity. 
    
    My second is that we now make two RPCs to the AM when we want to kill a 
container.  Ideally we would have two paths: one that maintains the existing 
behavior of `killExecutors` for users and includes the `requestTotalExecutors` 
RPC.  And another meant for consumption by `ExecutorAllocationManager` that 
calls `killExecutor` directly.
    
    That said, I've spent a while today trying to think of situations where 
this patch would lead to incorrect behavior and have become convinced that it 
should work.  As the patch fixes a critical bug and doesn't introduce 
additional complexity, I'd be OK with merging it and fixing the other issues 
later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to