Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19046
This has been a long-standing request from our YARN team; they've had users run
into issues with YARN and traced them back to Spark making a large number of
container requests. You can argue that it's an issue in YARN, and you'd be
correct, but that doesn't mean Spark cannot try to help.
The MR AM does [something
similar](https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java).
> looking at adding a config to limit # of tasks in parallel.
That could help because it would limit the number of containers Spark would
request indirectly, but it hardly feels like a substitute for this, since I
expect very few people to use that.
> I definitely want a config for it and prefer off by default
Why? The only downside you mention is that if the cluster is completely
full, then another app may be making requests and may get a container sooner
than Spark does. That sounds like an edge case where you really should be
isolating those apps into separate queues if such resource contention is a
problem.
One thing that would probably be ok is to use
`spark.dynamicAllocation.maxExecutors` as the upper limit if it's explicitly
set, instead of this. But the default for that config is `Int.MaxValue`, so by
default I think it makes sense to have a way to automatically limit these
requests without users having to make guesses about what values they have to
set.
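
To make the fallback order concrete, here is a rough sketch of the idea, not the code in this PR. The function name `request_cap` and the `rm_headroom` parameter are hypothetical; `rm_headroom` stands in for whatever number of additional containers the automatic limit would allow. An explicitly set `spark.dynamicAllocation.maxExecutors` wins; otherwise the automatic limit applies.

```python
# Illustrative sketch only; the names below are assumptions, not Spark APIs.
INT_MAX = 2**31 - 1  # JVM Int.MaxValue, the default of maxExecutors


def request_cap(target, running, max_executors=INT_MAX, rm_headroom=None):
    """Return the total number of executors to ask YARN for.

    target        -- executors dynamic allocation currently wants
    running       -- executors already running
    max_executors -- value of spark.dynamicAllocation.maxExecutors
    rm_headroom   -- hypothetical count of extra containers the automatic
                     limit would allow (None if unknown)
    """
    if max_executors != INT_MAX:
        # The user set an explicit upper limit, so use it instead.
        cap = max_executors
    elif rm_headroom is not None:
        # No explicit limit: fall back to the automatic one.
        cap = running + rm_headroom
    else:
        # Nothing to limit against; request what was asked for.
        cap = target
    return min(target, cap)
```

With the default `Int.MaxValue`, a target of 50000 executors would be cut down to the running count plus the headroom; with an explicit limit of 200, it would be cut down to 200 regardless of the headroom.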
> This change isn't going to help anything if your cluster is actually
large enough to get say 50000 containers.
If all of those containers are actually allocated to apps, or your
application's queue actually limits the number of containers your app can get
to a much smaller number, then yes it can help.
> I'm also not sure why we just don't ask for all the containers up front.
Wouldn't that mean that YARN would actually allocate resources you don't
need? That sounds to me like a middle ground between dynamic allocation and
fixed allocation, where you allocate the containers but don't actually start
executors, and I'm not sure what that would help with.