GitHub user srowen commented on the pull request:
https://github.com/apache/spark/pull/663#issuecomment-42346939
See also `Executors.newCachedThreadPool` if the intent is to create a new
thread whenever no idle one is available. That's closer in behavior to always
creating a new `Thread`. You may still get loads of simultaneous threads,
though, and may not save many allocations.
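For reference, a minimal sketch of what that looks like (the task body here is purely illustrative):

```scala
import java.util.concurrent.Executors

// Reuses an idle thread when one exists; otherwise spawns a new thread,
// which is close to creating a raw Thread per task. Idle threads are
// reclaimed after 60 seconds.
val pool = Executors.newCachedThreadPool()

(1 to 10).foreach { i =>
  pool.execute(new Runnable {
    override def run(): Unit =
      println(s"task $i on ${Thread.currentThread().getName}")
  })
}
pool.shutdown()
```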
Capping the number of threads helps on both counts (why wouldn't the max
pool size have an effect?), but it also means submission may block when the
executor is busy. Right now the no-arg `LinkedBlockingQueue` will go on
accepting work until it holds 2 billion entries if the executor can't keep up.
That's probably not ideal, and should be capped at something more reasonable.
Not sure whether the possibility of blocking the thread that invokes the task
is a problem or not.
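To make the unbounded-queue point concrete, here's a sketch of the current failure mode; the pool sizes are made up for illustration:

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

// The no-arg LinkedBlockingQueue defaults to a capacity of
// Integer.MAX_VALUE (~2 billion), so execute() never blocks or rejects;
// tasks just pile up if the pool can't keep up. Note that because the
// queue's offer() always succeeds, the pool never grows past its core
// size, so maximumPoolSize has no effect here.
val unbounded = new ThreadPoolExecutor(
  4, 16,                              // core and (ineffective) max threads
  60L, TimeUnit.SECONDS,
  new LinkedBlockingQueue[Runnable]() // capacity = Integer.MAX_VALUE
)
```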
How about setting the max pool size and queue capacity to values that are
deemed quite large? Like, you never want more than 1000 threads or 100K tasks
queued? At least you're putting up some sane defenses against these situations
rather than making 100K threads.
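A rough sketch of that, using the 1000 / 100K figures above as ballpark limits; the core size and rejection policy are my assumptions, not part of the proposal:

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

val capped = new ThreadPoolExecutor(
  10,                                         // core threads (assumed value)
  1000,                                       // hard cap on threads
  60L, TimeUnit.SECONDS,                      // reclaim idle extra threads
  new LinkedBlockingQueue[Runnable](100000),  // bounded queue: 100K tasks max
  new ThreadPoolExecutor.CallerRunsPolicy()   // when saturated, run the task
)                                             // in the submitting thread
```

One standard `ThreadPoolExecutor` caveat: threads beyond the core size are only spawned once the queue is full, so with a 100K-deep queue the pool will mostly sit at the core size. `CallerRunsPolicy` is one way to get the "block the thread that invokes the task" behavior mentioned above, by throttling the submitter rather than throwing `RejectedExecutionException`.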