[ https://issues.apache.org/jira/browse/YARN-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu updated YARN-8493:
--------------------------------
    Target Version/s:   (was: 2.6.0)

> LogAggregation in NodeManager is put off because great amount of long running 
> app
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-8493
>                 URL: https://issues.apache.org/jira/browse/YARN-8493
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0
>            Reporter: JayceAu
>            Priority: Major
>         Attachments: YARN-8493.001.patch
>
>
> h2. Issue summary
> In our YARN cluster, it takes 30 minutes on average for an app's logs to 
> appear on the web UI after the app finishes. This is caused by the limited 
> thread pool size in NodeManager.
> NodeManager creates an appLogAggregator to do log aggregation for each 
> application that has containers running on this NodeManager. This 
> appLogAggregator occupies one thread in the thread pool until the app has 
> finished across the whole cluster. NodeManager uses a FixedThreadPool 
> (default size 100) instead of the CachedThreadPool used in older versions. 
> At peak times in our production environment, more than 350 
> appLogAggregators are running or queued in the thread pool, and the queued 
> apps suffer long log aggregation latency.
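The starvation described above can be reproduced in miniature with plain `java.util.concurrent`. This is a hypothetical sketch, not NodeManager code: `FixedPoolStarvation` and `aggregator(...)` are made-up names, a pool of 2 stands in for the default of 100, and each simulated aggregator simply blocks on a latch until its app finishes. It shows an already-finished app C waiting in the queue behind two long-running apps.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FixedPoolStarvation {

    /** Hypothetical stand-in for an appLogAggregator that holds its pool thread until the app finishes. */
    static Runnable aggregator(CountDownLatch appFinished, CountDownLatch started) {
        return () -> {
            started.countDown();
            try {
                appFinished.await(); // the pool thread is blocked for the app's whole lifetime
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // the final log upload would happen here
        };
    }

    static String run() {
        // Same construction as Executors.newFixedThreadPool(2); 2 stands in for the default of 100.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch appA = new CountDownLatch(1); // long-running app A
        CountDownLatch appB = new CountDownLatch(1); // long-running app B
        CountDownLatch appC = new CountDownLatch(0); // app C has already finished
        CountDownLatch cStarted = new CountDownLatch(1);
        try {
            pool.execute(aggregator(appA, new CountDownLatch(1)));
            pool.execute(aggregator(appB, new CountDownLatch(1)));
            pool.execute(aggregator(appC, cStarted));

            // A and B hold both threads, so C's aggregator waits in the queue
            // even though app C has already finished.
            int queued = pool.getQueue().size();
            boolean startedWhileBlocked = cStarted.await(200, TimeUnit.MILLISECONDS);

            appA.countDown(); // app A finishes and releases its thread
            boolean startedAfterRelease = cStarted.await(5, TimeUnit.SECONDS);
            return "queued=" + queued + " startedWhileBlocked=" + startedWhileBlocked
                    + " startedAfterRelease=" + startedAfterRelease;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            appA.countDown();
            appB.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints: queued=1 startedWhileBlocked=false startedAfterRelease=true
        System.out.println(run());
    }
}
```

App C only gets a thread once app A ends; with 350 aggregators against 100 threads, the same queueing produces the latency described above.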
> h2. Possible Solution
> We could increase yarn.nodemanager.logaggregation.threadpool-size-max to a 
> higher value to work around it. But the problem will recur as the number of 
> running apps grows, and a larger pool creates many idle threads that just 
> wait for log aggregation.
> Our solution is to stop keeping each appLogAggregator in the thread pool 
> until its app finishes:
>  # Give each appLogAggregator a callback that puts it into the thread pool; 
> the callback is not invoked until the aggregator is notified.
>  # If rollingMonitorInterval is greater than 0, NodeManager sets aside one 
> thread in LogAggregationService to periodically do log aggregation for all 
> running apps.
>  
>  
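A minimal sketch of the proposed approach, assuming hypothetical names (`CallbackAggregation`, `AppAggregator`, `notifyWork`, `appStarted`, `appFinished`) that do not come from YARN-8493.001.patch. Each aggregator submits itself to the shared fixed pool only when notified, and a flag coalesces repeated notifications. When `rollingMonitorIntervalMs` is greater than 0, a single scheduled thread notifies every running app's aggregator periodically; in this sketch that thread only triggers the aggregators rather than uploading logs itself, which may differ from the actual patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: aggregators hold a pool thread only while they have work. */
public class CallbackAggregation {

    /** Hypothetical stand-in for an appLogAggregator. */
    static final class AppAggregator implements Runnable {
        private final ExecutorService pool;
        private final AtomicBoolean queued = new AtomicBoolean(false);
        final AtomicInteger uploads = new AtomicInteger(); // aggregation rounds performed

        AppAggregator(ExecutorService pool) {
            this.pool = pool;
        }

        /** The callback: puts this aggregator into the pool; only invoked when notified. */
        void notifyWork() {
            // Coalesce notifications: submit at most once until the pending run starts.
            if (queued.compareAndSet(false, true)) {
                pool.execute(this);
            }
        }

        @Override
        public void run() {
            queued.set(false);
            uploads.incrementAndGet(); // uploading the logs collected so far would happen here
        }
    }

    final ExecutorService pool = Executors.newFixedThreadPool(2); // stands in for the default of 100
    final Map<String, AppAggregator> running = new ConcurrentHashMap<>();
    private final ScheduledExecutorService roller; // null when rolling aggregation is disabled

    CallbackAggregation(long rollingMonitorIntervalMs) {
        if (rollingMonitorIntervalMs > 0) {
            // One dedicated thread periodically notifies the aggregator of every running app.
            roller = Executors.newSingleThreadScheduledExecutor();
            roller.scheduleAtFixedRate(
                    () -> running.values().forEach(AppAggregator::notifyWork),
                    rollingMonitorIntervalMs, rollingMonitorIntervalMs, TimeUnit.MILLISECONDS);
        } else {
            roller = null;
        }
    }

    /** Registers an app; nothing is submitted to the pool yet. */
    AppAggregator appStarted(String appId) {
        AppAggregator agg = new AppAggregator(pool);
        running.put(appId, agg);
        return agg;
    }

    /** The app has finished: notify its aggregator once for the final aggregation. */
    void appFinished(String appId) {
        AppAggregator agg = running.remove(appId);
        if (agg != null) {
            agg.notifyWork();
        }
    }

    void shutdown() {
        try {
            if (roller != null) {
                roller.shutdownNow();
                roller.awaitTermination(5, TimeUnit.SECONDS);
            }
            pool.shutdown(); // already-queued aggregations still run
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With rolling aggregation disabled, starting 350 apps submits nothing to the pool, and finishing one app submits exactly one task, so apps that are still running never tie up pool threads between aggregation rounds.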



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
