Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/17854
The reason `spark.yarn.containerLauncherMaxThreads` does not work here is
that it only controls how many threads simultaneously send a container
start command to YARN; sending that command is usually a much quicker
operation than the container itself starting. So if you're starting a
bunch of containers, that setting only affects how quickly (or slowly)
the requests are sent to YARN; it does not throttle how many containers
start at the same time.
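
As a rough illustration (this is not the actual `YarnAllocator` code), consider a
fixed-size pool whose tasks return as soon as the start request is handed off;
`sendStartRequest` below is a hypothetical stand-in for the NMClient call:

```scala
import java.util.concurrent.Executors

import scala.concurrent.{ExecutionContext, Future}

object LauncherPoolSketch {
  // Hypothetical stand-in for the call that asks YARN to start a container;
  // it returns as soon as the request has been handed off.
  def sendStartRequest(containerId: Int): Unit =
    println(s"start request sent for container $containerId")

  def main(args: Array[String]): Unit = {
    // The pool size plays the role of spark.yarn.containerLauncherMaxThreads.
    val pool = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

    // Each task finishes as soon as its request is sent, so the pool only caps how
    // many requests are in flight at once, not how many containers are starting.
    (1 to 100).foreach { id => Future(sendStartRequest(id))(pool) }

    pool.shutdown()
  }
}
```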
To achieve the latter, you could do something like this change, which adds
a new configuration (something I don't like) and also has the issues people
have asked about (ignoring containers that YARN has allocated for you).
Or you could change the way `YarnAllocator` starts containers by having the
threads controlled by `spark.yarn.containerLauncherMaxThreads` actually wait
until the containers are launched (not just until the request is sent to YARN).
That would properly implement throttling based on the existing config
parameter, but it's a slightly more complicated change (I think); see the
sketch below.
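
A minimal sketch of that blocking variant, assuming a hypothetical
`startAndAwaitLaunch` helper that does not return until YARN reports the
container as running; with that change the same pool size effectively caps how
many containers can be starting at once:

```scala
import java.util.concurrent.Executors

import scala.concurrent.{ExecutionContext, Future}

object ThrottledLauncherSketch {
  // Hypothetical helper: sends the start request and then blocks (polling YARN
  // or waiting on a callback) until the container is actually running.
  def startAndAwaitLaunch(containerId: Int): Unit = {
    println(s"container $containerId requested")
    Thread.sleep(500) // stands in for the time the container takes to come up
    println(s"container $containerId launched")
  }

  def main(args: Array[String]): Unit = {
    // Same pool size as before; because each task now holds its thread until the
    // container is up, at most 4 containers can be launching at any given time.
    val pool = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

    (1 to 100).foreach { id => Future(startAndAwaitLaunch(id))(pool) }

    pool.shutdown()
  }
}
```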