Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/663#issuecomment-42362724
We don't necessarily need to start with 500 threads. At the very least it
should be configurable. In our tests of MR we found that 500 threads didn't
actually help; tens of threads (20-30) worked just fine, even when launching
thousands of tasks. If you actually get 500 threads it can cause memory
issues. I guess if we have them time out quicker it might not be as much of an
issue (I believe MR's timeout is high), but I would like to see it as a config
just in case. Also, 1 second might be a bit too short if you actually want
to reuse the threads: the AM heartbeats to the RM every 5 seconds, so if you
launch one round of containers and then heartbeat back in, the pool will
already have shut those threads down even though it could have reused them.
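A minimal sketch of the shape I have in mind: a pool whose size comes from config and whose idle threads time out on their own, with a keep-alive comfortably longer than the 5-second AM-to-RM heartbeat. The config name and defaults below are illustrative, not actual Spark settings.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class LauncherPool {
    // Hypothetical defaults: tens of threads were enough in the MR tests,
    // and the keep-alive outlives the 5s heartbeat so threads get reused.
    static final int DEFAULT_LAUNCHER_THREADS = 25;
    static final long KEEP_ALIVE_SECONDS = 60;

    public static ThreadPoolExecutor newLauncherPool(int nThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            nThreads, nThreads,
            KEEP_ALIVE_SECONDS, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>());
        // Let idle core threads die after the keep-alive, so an idle AM
        // doesn't hold memory for threads it isn't using.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) {
        // nThreads would come from a config key in practice, e.g.
        // something like "spark.yarn.containerLauncherMaxThreads".
        ThreadPoolExecutor pool = newLauncherPool(DEFAULT_LAUNCHER_THREADS);
        System.out.println(pool.getCorePoolSize());
        pool.shutdown();
    }
}
```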
@srowen See the javadoc on ThreadPoolExecutor about
LinkedBlockingQueue:
Unbounded queues. Using an unbounded queue (for example a
LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait
in the queue when all corePoolSize threads are busy. Thus, no more than
corePoolSize threads will ever be created. (And the value of the
maximumPoolSize therefore doesn't have any effect.) This may be appropriate
when each task is completely independent of others, so tasks cannot affect each
others execution; for example, in a web page server. While this style of
queuing can be useful in smoothing out transient bursts of requests, it admits
the possibility of unbounded work queue growth when commands continue to arrive
on average faster than they can be processed.
Also, perhaps adding a comment there to explain it would be good, as it's
easy to miss that max isn't used.
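The behavior the javadoc describes is easy to demonstrate: with an unbounded LinkedBlockingQueue, excess tasks wait in the queue rather than spawning threads, so the pool never grows past corePoolSize no matter how large maximumPoolSize is. A small self-contained check (the 2/500 sizes are just for illustration):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {
    public static void main(String[] args) throws Exception {
        // core = 2, max = 500, but the queue is unbounded, so
        // maximumPoolSize has no effect: only 2 threads are ever created.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 500, 1L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>());
        CountDownLatch block = new CountDownLatch(1);
        for (int i = 0; i < 50; i++) {
            // 50 tasks that all block; 48 of them just sit in the queue.
            pool.execute(() -> {
                try { block.await(); } catch (InterruptedException e) { }
            });
        }
        Thread.sleep(200); // give the pool time to spin up threads
        System.out.println(pool.getPoolSize()); // 2, not 500
        block.countDown();
        pool.shutdown();
    }
}
```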