Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5704#issuecomment-97237212
I sync'd up with @andrewor14 offline about this. To summarize my
understanding of his concern, the current patch behaves suboptimally in the
following (common) situation:
1) We run a job and ramp up to 100 executors.
2) Then there's a quiet period, but our executors stick around.
3) We then run another job, which could saturate 200 executors.
Instead of immediately increasing our target number of executors in
response to the load, we have to wait as long as it originally took to ramp
up to 100 before requesting any additional executors.
We settled on the following as a solution:
In `addExecutors`, before increasing `numExecutorsTarget` with
`numExecutorsToAdd`, if `numExecutorsTarget` is less than `executorIds.size`,
set `numExecutorsTarget` to `executorIds.size`.
This basically means that we never spend any time "ramping up" to where we
already are.
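The proposed change can be sketched roughly as follows. This is an illustrative model, not the actual Spark source: `numExecutorsTarget`, `numExecutorsToAdd`, and `executorIds` follow the names used above, while `maxNumExecutors` and the exponential ramp-up step are assumptions about the surrounding allocation logic.

```scala
// Sketch of the proposed fix to addExecutors (assumed surrounding state).
object RampUpSketch {
  var numExecutorsTarget = 1   // target ramped back down during the quiet period
  var numExecutorsToAdd = 1    // exponential ramp-up step
  val executorIds = (1 to 100).map(i => s"exec-$i").toSet  // 100 idle executors still alive
  val maxNumExecutors = 200    // assumed upper bound

  def addExecutors(): Int = {
    // Proposed fix: never spend time "ramping up" to where we already are.
    if (numExecutorsTarget < executorIds.size) {
      numExecutorsTarget = executorIds.size
    }
    numExecutorsTarget = math.min(numExecutorsTarget + numExecutorsToAdd, maxNumExecutors)
    numExecutorsToAdd *= 2  // double the step for the next round
    numExecutorsTarget
  }
}
```

With the fix, the first `addExecutors` call after the quiet period jumps the target straight from 1 to 101 (the 100 live executors plus one step), instead of crawling back up from 2.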