[
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157914#comment-15157914
]
Robert Kanter commented on YARN-4697:
-------------------------------------
Looks good overall. A few minor things:
- Add a unit test for NaN and negative values for
{{NM_LOG_AGGREGATION_THREAD_POOL_SIZE}} to verify it uses the default value.
- I thought about the unit test some more. Because the Semaphore has a
single permit, even if the test finishes long before the timeout occurs, each
thread still has to wait its turn while the others acquire and release the
semaphore (or hit the timeout). It would be faster to use a ReadWriteLock
instead, with the test holding the write lock and the threads acquiring read
locks. This way, the threads won't block each other.
- The acquire/release in the test should be wrapped in a try-finally so the
release always happens.
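
The ReadWriteLock suggestion and the try-finally point can be sketched roughly as follows. This is only an illustration under assumptions, not the code in the patch: the class name {{ReadWriteLockGate}}, the method {{countStartedWhileGated}}, and the timeout values are all hypothetical. The test thread holds the write lock while the pooled tasks wait on read locks, so once the write lock is released every task proceeds at once instead of queueing behind a single permit.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper, not part of the YARN patch.
class ReadWriteLockGate {

  /**
   * Submits `tasks` blocking tasks to `pool` while holding the write lock and
   * returns how many of them started running before the lock was released.
   * `expectedThreads` is the number of starts we wait for before sampling.
   */
  static int countStartedWhileGated(ExecutorService pool, int tasks,
      int expectedThreads) throws InterruptedException {
    ReadWriteLock gate = new ReentrantReadWriteLock();
    AtomicInteger started = new AtomicInteger();
    Lock write = gate.writeLock();

    write.lock();
    try {
      for (int i = 0; i < tasks; i++) {
        pool.submit(() -> {
          started.incrementAndGet();
          Lock read = gate.readLock();
          boolean acquired = false;
          try {
            // Blocks until the test releases the write lock (or times out).
            acquired = read.tryLock(30, TimeUnit.SECONDS);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          } finally {
            if (acquired) {
              read.unlock();
            }
          }
        });
      }

      // Wait until the expected number of threads are parked on the read lock.
      long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
      while (started.get() < expectedThreads && System.nanoTime() < deadline) {
        Thread.sleep(10);
      }
      // Short grace period so any thread beyond the limit has a chance to show up.
      Thread.sleep(200);
      return started.get();
    } finally {
      // Always release, so the pooled tasks are unblocked even if the test fails.
      write.unlock();
    }
  }
}
```

With {{Executors.newFixedThreadPool(3)}} and 10 tasks, the method should return 3, since only three tasks can be running while the gate is closed. With an unbounded cached pool the count would be expected to approach 10.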
> NM aggregation thread pool is not bound by limits
> -------------------------------------------------
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: yarn4697.001.patch, yarn4697.002.patch,
> yarn4697.003.patch
>
>
> In LogAggregationService.java we create a thread pool to upload logs from
> the NodeManager to HDFS if log aggregation is turned on. This is a cached
> thread pool which, according to the javadoc, is an unlimited pool of threads.
> If there has been a problem with log aggregation, this could cause
> a problem on restart. The number of threads created at that point could be
> huge and would put a large load on the NameNode, and in the worst case could
> even bring it down due to file descriptor issues.