GitHub user ksakellis commented on the pull request:
https://github.com/apache/spark/pull/4051#issuecomment-71558537
LGTM
I agree with @sryza that maxExecutors should default to Integer.MAX_VALUE.
It is not the app's responsibility to play fair; rather, it should ask the
resource manager, YARN, for its optimal amount of resources and let YARN take
care of enforcing the admin policies. @vanzin, I also don't see the point of
the app setting the max to #nm. If you default maxExecutors to #nm at startup,
what happens when the administrator adds more node managers while the Spark
job is running? It would be good for Spark to take advantage of the spare
capacity.
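
To illustrate the staleness problem (a rough sketch, not code from this PR;
the counts are made up):

```scala
// A max captured from #nm at startup never sees node managers added later,
// whereas Integer.MAX_VALUE lets the request track what YARN can grant.
val nmCountAtStartup = 10  // snapshot when the app launched
val requested        = 15  // executors the app could actually use after
                           // the admin adds 5 more node managers

val cappedAtStartupNm = math.min(requested, nmCountAtStartup) // 10: spare capacity wasted
val cappedAtMaxValue  = math.min(requested, Int.MaxValue)     // 15: YARN decides
```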
Although I don't like yet another configuration option, initialExecutors, I
don't think there is a way around it given the Hive on Spark use case. At
least none of the three is required, which is the most important part. A
minExecutors default of 0 seems right for the reasons described by @vanzin.
Defaulting initialExecutors to minExecutors seems fine as well.
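
Concretely, the defaulting chain we seem to be converging on would behave
like this (a sketch of the semantics only; `conf` stands in for a
SparkConf-style key/value map, not the actual implementation):

```scala
// minExecutors -> 0, initialExecutors -> minExecutors,
// maxExecutors -> Integer.MAX_VALUE; none of the three is required.
case class DynamicAllocationDefaults(conf: Map[String, String]) {
  private def intConf(key: String, default: => Int): Int =
    conf.get(key).map(_.toInt).getOrElse(default)

  val minExecutors: Int =
    intConf("spark.dynamicAllocation.minExecutors", 0)
  val initialExecutors: Int =
    intConf("spark.dynamicAllocation.initialExecutors", minExecutors)
  val maxExecutors: Int =
    intConf("spark.dynamicAllocation.maxExecutors", Int.MaxValue)
}
```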
@andrewor14, I don't like the idea of using --num-executors as
initialExecutors. Like @vanzin, we want to make dynamic allocation the
default at some point, so having consistent configuration makes sense.
Reusing --num-executors would just add confusion, both now and later.
@andrewor14, why do we increase the number of executors we request once a
minute? Is this to reduce load on the YARN resource manager? I think we might
want to revisit how we ramp up the cluster, but that is beyond the scope of
this PR.
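
For context on what I mean by ramp-up, here is one way a periodic
exponential increase could look (an illustration of the idea only, not the
actual ExecutorAllocationManager code):

```scala
// Each tick (e.g. once a minute), double the target instead of requesting
// everything at once, which smooths load on the YARN ResourceManager.
def rampUp(target: Int, needed: Int, max: Int): Int =
  if (target >= needed) target
  else math.min(max, math.min(needed, math.max(1, target * 2)))

// Successive targets when 20 executors are needed: 1, 2, 4, 8, 16, 20, 20
Iterator.iterate(1)(t => rampUp(t, needed = 20, max = Int.MaxValue))
  .take(7)
  .foreach(println)
```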