Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19046

This has been a long-time request from our YARN team; they've had users run into issues with YARN and traced them back to Spark making a large number of container requests. You can argue that it's an issue in YARN, and you'd be correct, but that doesn't mean Spark can't try to help. The MR AM does [something similar](https://github.com/apache/hadoop/blob/f67237cbe7bc48a1b9088e990800b37529f1db2a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java).

> looking at adding a config to limit # of tasks in parallel.

That could help because it would indirectly limit the number of containers Spark requests, but it hardly feels like a substitute for this change, since I expect very few people to use it.

> I definitely want a config for it and prefer off by default

Why? The only downside you mention is that if the cluster is completely full, another app may also be making requests and may get a container sooner than yours. That sounds like an edge case where you really should be isolating those apps into separate queues if such resource contention is a problem.

One thing that would probably be OK is to use `spark.dynamicAllocation.maxExecutors` as the upper limit if it's explicitly set, instead of this. But the default for that config is `Int.MaxValue`, so by default I think it makes sense to have a way to automatically limit these requests without users having to guess what values to set.

> This change isn't going to help anything if your cluster is actually large enough to get say 50000 containers.

If all of those containers are actually allocated to other apps, or your application's queue limits the number of containers your app can get to a much smaller number, then yes, it can help.

> I'm also not sure why we just don't ask for all the containers up front.
Wouldn't that mean that YARN would actually allocate resources you don't need? That sounds to me like a middle ground between dynamic allocation and fixed allocation, where you allocate the containers but don't actually start executors, and I'm not sure what that would help with.
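To make the discussion concrete, the kind of capping being debated can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the function name, its parameters, and the specific heuristic of only requesting what is missing below a ceiling are all assumptions for the sake of the example.

```python
# Illustrative sketch of capping outstanding YARN container requests.
# All names here are hypothetical; the real logic lives in Spark's
# YarnAllocator and may differ.

def containers_to_request(target_executors: int,
                          running_executors: int,
                          pending_requests: int,
                          max_executors: int) -> int:
    """Return how many new container requests to send to the RM."""
    # Never exceed the user-configured ceiling
    # (spark.dynamicAllocation.maxExecutors, default Int.MaxValue),
    # so a huge target doesn't flood the RM with requests.
    capped_target = min(target_executors, max_executors)
    # Only ask for what is still missing beyond what is already
    # running or already requested; never return a negative count.
    missing = capped_target - running_executors - pending_requests
    return max(missing, 0)
```

For example, with a target of 50000 executors but `maxExecutors` set to 1000, 100 executors running, and 400 requests outstanding, only 500 new requests would be sent instead of tens of thousands.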