JayceAu created YARN-8493:
-----------------------------
Summary: LogAggregation in NodeManager is put off because great
amount of long running app
Key: YARN-8493
URL: https://issues.apache.org/jira/browse/YARN-8493
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.6.0
Reporter: JayceAu
Fix For: 2.6.0
h2. Issue summary
In our YARN cluster, application logs take about 30 minutes on average to
become visible in the web UI after an application finishes. The delay is caused
by the size limit of the log aggregation thread pool in the NodeManager.
The NodeManager sets aside an AppLogAggregator per application to aggregate the
logs of that application's containers running on this NodeManager. Each
AppLogAggregator occupies one thread in the thread pool until the application
has finished across the whole cluster. The NodeManager uses a FixedThreadPool
(default size 100) instead of the CachedThreadPool used in older versions. At
peak times in our production environment, more than 350 AppLogAggregators are
running or queued in the thread pool, and the applications whose aggregators
are queued suffer long log aggregation latency.
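
The queuing effect can be reproduced outside Hadoop with a small sketch. It
assumes each aggregator simply blocks its thread until its application
finishes; the class and method names are hypothetical and are not NodeManager
code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the NodeManager log aggregation pool.
class FixedPoolStarvation {

    /** Returns how many aggregators sit in the queue while running apps hold every thread. */
    public static int queuedWhileAppsRun(int poolSize, int apps) {
        // Same shape as Executors.newFixedThreadPool(poolSize): fixed size, unbounded queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch appsFinished = new CountDownLatch(1);
        CountDownLatch threadsBusy = new CountDownLatch(Math.min(poolSize, apps));
        try {
            for (int i = 0; i < apps; i++) {
                pool.execute(() -> {
                    threadsBusy.countDown();
                    try {
                        // Assumption: the aggregator holds its thread for the app's lifetime.
                        appsFinished.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            threadsBusy.await();               // every pool thread is now occupied
            int queued = pool.getQueue().size();
            appsFinished.countDown();          // let all apps finish so the pool can drain
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
            return queued;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 350 aggregators on a pool of 100 threads leaves 250 waiting in the queue.
        System.out.println(queuedWhileAppsRun(100, 350)); // prints 250
    }
}
```

An aggregator stuck in the queue cannot upload anything, even when its own
application has already finished, until one of the running applications
releases a thread.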
h2. Possible Solution
We could raise yarn.nodemanager.logaggregation.threadpool-size-max to work
around this, but the problem returns as the number of running applications
grows, and a larger pool leaves many threads idle while they wait for
applications to finish.
Our solution is to keep each appLogAggregator out of the thread pool until its
application has finished:
# give each appLogAggregator a callback that submits it to the thread pool; the
callback is not invoked until the aggregator is notified
# if rollingMonitorInterval is greater than 0, the NodeManager sets aside one
thread in LogAggregationService that periodically aggregates logs for all
running applications
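
The two steps above could look roughly like the following sketch. It is not the
actual patch: the class, the method names, and the counters standing in for
real log uploads are all hypothetical, and it assumes the notification arrives
when the application finishes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of deferred submission; not LogAggregationService code.
class DeferredAggregationSketch {
    private final ThreadPoolExecutor pool;
    private final ScheduledExecutorService rollingMonitor; // null when rolling is disabled
    // Aggregators of running apps wait here instead of occupying a pool thread.
    private final Map<String, Runnable> runningApps = new ConcurrentHashMap<>();
    private final AtomicInteger finalUploads = new AtomicInteger();
    private final AtomicInteger rollingUploads = new AtomicInteger();

    public DeferredAggregationSketch(int poolSize, long rollingMonitorIntervalSecs) {
        pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        if (rollingMonitorIntervalSecs > 0) {
            // Step 2: one shared thread rolls logs for every running app.
            rollingMonitor = Executors.newSingleThreadScheduledExecutor();
            rollingMonitor.scheduleWithFixedDelay(this::rollAllRunningApps,
                    rollingMonitorIntervalSecs, rollingMonitorIntervalSecs, TimeUnit.SECONDS);
        } else {
            rollingMonitor = null;
        }
    }

    /** Registers the aggregator without submitting it to the pool. */
    public void appStarted(String appId) {
        runningApps.put(appId, finalUploads::incrementAndGet); // stand-in for the final upload
    }

    /** Step 1: the callback that puts the aggregator into the pool once it is notified. */
    public void appFinished(String appId) {
        Runnable finalUpload = runningApps.remove(appId);
        if (finalUpload != null) {
            pool.execute(finalUpload);
        }
    }

    private void rollAllRunningApps() {
        for (String ignored : runningApps.keySet()) {
            rollingUploads.incrementAndGet(); // stand-in for a rolling upload of one app
        }
    }

    /** Tasks currently running or queued in the pool. */
    public int pendingInPool() {
        return pool.getActiveCount() + pool.getQueue().size();
    }

    /** Drains the pool and returns the number of final uploads that ran. */
    public int shutdownAndCountFinalUploads() {
        if (rollingMonitor != null) {
            rollingMonitor.shutdownNow();
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finalUploads.get();
    }
}
```

With this shape, pool threads are held only for the duration of an actual
upload, so the pool size bounds concurrent uploads rather than the number of
running applications.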