[
https://issues.apache.org/jira/browse/YARN-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
JayceAu updated YARN-8493:
--------------------------
Attachment: YARN-8493.001.patch
> LogAggregation in NodeManager is put off because great amount of long running
> app
> ---------------------------------------------------------------------------------
>
> Key: YARN-8493
> URL: https://issues.apache.org/jira/browse/YARN-8493
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.6.0
> Reporter: JayceAu
> Priority: Major
> Fix For: 2.6.0
>
> Attachments: YARN-8493.001.patch
>
>
> h2. Issue summary
> In our YARN cluster, it takes 30 minutes on average for an app's log to
> show up in the web UI after the app has finished. This delay is caused by
> the limited size of the log aggregation threadPool in NodeManager.
> NodeManager sets aside an appLogAggregator to do log aggregation for each
> app that has containers running on this NodeManager. The appLogAggregator
> occupies one thread in the threadPool until the app has finished in the
> whole cluster. NodeManager uses a FixedThreadPool (default size 100)
> instead of the CachedThreadPool used in older versions. At peak times in
> our production environment, more than 350 appLogAggregators are running or
> queued in the threadPool, and the apps whose aggregators are queued suffer
> from very high log aggregation latency.
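
To illustrate the queuing effect described above, here is a minimal,
self-contained Java sketch. It is not NodeManager code: the class
FixedPoolStarvation and the method queuedAggregators are hypothetical names,
and each task only simulates an aggregator that holds its thread until the app
finishes. The pool is built with the same settings that
Executors.newFixedThreadPool uses.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FixedPoolStarvation {

    /**
     * Submits `apps` simulated aggregators to a fixed pool of `poolSize`
     * threads and returns how many of them are stuck in the queue while
     * the first ones hold every thread.
     */
    public static int queuedAggregators(int poolSize, int apps)
            throws InterruptedException {
        // Same configuration as Executors.newFixedThreadPool(poolSize).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        CountDownLatch appsFinished = new CountDownLatch(1);
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, apps));
        try {
            for (int i = 0; i < apps; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        // Simulated aggregator: keeps its thread until the
                        // long-running app is reported as finished.
                        appsFinished.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await(); // every pool thread is now occupied
            return pool.getQueue().size();
        } finally {
            appsFinished.countDown(); // let all simulated apps finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Numbers from the report: pool size 100, 350 aggregators at peak.
        System.out.println("queued: " + queuedAggregators(100, 350));
    }
}
```

In this sketch, every aggregator beyond the pool size waits in the queue until
one of the earlier apps finishes, no matter how long ago its own app ended.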
> h2. Possible Solution
> We can increase yarn.nodemanager.logaggregation.threadpool-size-max to a
> higher value to work around it. But the problem will come back if the
> number of running apps increases, and a larger pool creates a lot of idle
> threads that just wait for log aggregation.
> Our solution is not to put the appLogAggregator into the threadPool until
> the app has finished:
> # give each appLogAggregator a callback that puts it into the threadPool;
> the callback is not called until the appLogAggregator is notified
> # if rollingMonitorInterval is greater than 0, NodeManager sets aside one
> thread in LogAggregationService to do log aggregation periodically for all
> the running apps
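
The two steps above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the attached patch: the class
DeferredLogAggregation, the nested Aggregator class, and the methods
appStarted, appFinished, poolLoad and runDemo are hypothetical names, and
uploadLogs only counts calls instead of uploading anything.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DeferredLogAggregation {

    /** Hypothetical stand-in for an appLogAggregator. */
    static final class Aggregator {
        private final AtomicInteger uploads = new AtomicInteger();
        // synchronized so a rolling upload and the final upload never overlap
        synchronized void uploadLogs() { uploads.incrementAndGet(); }
        int uploadCount() { return uploads.get(); }
    }

    private final ThreadPoolExecutor pool;
    private final ScheduledExecutorService rollingMonitor; // null if disabled
    private final Map<String, Aggregator> running = new ConcurrentHashMap<>();

    public DeferredLogAggregation(int poolSize, long rollingMonitorIntervalMs) {
        pool = new ThreadPoolExecutor(poolSize, poolSize, 0L,
                TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        if (rollingMonitorIntervalMs > 0) {
            // Step 2: one dedicated thread aggregates for all running apps.
            rollingMonitor = Executors.newSingleThreadScheduledExecutor();
            rollingMonitor.scheduleWithFixedDelay(
                    () -> running.values().forEach(Aggregator::uploadLogs),
                    rollingMonitorIntervalMs, rollingMonitorIntervalMs,
                    TimeUnit.MILLISECONDS);
        } else {
            rollingMonitor = null;
        }
    }

    /** App start: only register the aggregator; no pool thread is taken. */
    public Aggregator appStarted(String appId) {
        return running.computeIfAbsent(appId, id -> new Aggregator());
    }

    /** Step 1: the callback run on notification submits the aggregator. */
    public void appFinished(String appId) {
        Aggregator aggregator = running.remove(appId);
        if (aggregator != null) {
            pool.execute(aggregator::uploadLogs);
        }
    }

    /** Tasks currently running or queued in the pool. */
    public int poolLoad() {
        return pool.getActiveCount() + pool.getQueue().size();
    }

    public void shutdown() throws InterruptedException {
        if (rollingMonitor != null) {
            rollingMonitor.shutdownNow();
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    /** Returns {pool load while apps run, total uploads after they finish}. */
    public static int[] runDemo(int poolSize, int apps)
            throws InterruptedException {
        DeferredLogAggregation service =
                new DeferredLogAggregation(poolSize, 0);
        Aggregator[] aggregators = new Aggregator[apps];
        for (int i = 0; i < apps; i++) {
            aggregators[i] = service.appStarted("app_" + i);
        }
        int loadWhileRunning = service.poolLoad();
        for (int i = 0; i < apps; i++) {
            service.appFinished("app_" + i);
        }
        service.shutdown();
        int total = 0;
        for (Aggregator a : aggregators) {
            total += a.uploadCount();
        }
        return new int[] {loadWhileRunning, total};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = runDemo(2, 10);
        System.out.println("pool load while apps run: " + result[0]);
        System.out.println("uploads after apps finish: " + result[1]);
    }
}
```

With this shape, running apps cost no pool threads, so the pool size only has
to cover apps that have finished and are actually uploading.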
>
>