Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/2746#issuecomment-59592148
Hi all. I have discussed the design offline with @kayousterhout and
@pwendell and we have come to the following high level consensus:
- We should treat add as a best-effort thing. This means there is no need
to retry it, and we shouldn't wait for the new ones to register before asking
for more. The latter point here means an exponential increase policy can become
an add-to-max policy if we set the add interval to a small value.
- The approach of determining the number of executors to add based on the
number of pending tasks will be under consideration in the future, but will not
be a part of this release. This is mainly because this add policy is more
opaque to the user and the number added may be unpredictable depending on when
the timer is triggered.
- In a future release, we will make the scaling policies pluggable. Until
then, for this release, we will expose a `@developerApi` `sc.addExecutors` and
`sc.removeExecutors` in case the application wants to use this feature on their
own (they won't have to enable `spark.dynamicAllocation.enabled` to use these).
- We should assume that removes will always succeed for simplicity. This
means there is no need to retry them.
- To simplify the timer logic, we will make the variables hold the
expiration time of the timer instead of a counter that is reset to 0 every time
the timer triggers. This makes the semantics of the timer more easily
understandable.
- Use the listener API to identify when tasks are built up for testability.
I should emphasize that this design is only for the first-cut
implementation of this feature. We will make an effort to generalize this and
expose the ability for the user to implement his/her own heuristics for 1.3
(tentative). Lastly, I will implement all of these shortly, and the new code
will likely be quite different. Please kindly hold back your reviews until then.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]