Zizon created YARN-9934:
---------------------------
Summary: LogAggregationService should not submit aggregator when
app dir creation fail
Key: YARN-9934
URL: https://issues.apache.org/jira/browse/YARN-9934
Project: Hadoop YARN
Issue Type: Improvement
Components: log-aggregation
Reporter: Zizon
Attachments: YARN-8246.patch
Before submitting a log aggregation runnable, LogAggregationService tries to
create the aggregated log dir.
In some cases this may fail (e.g. when the number of dirs exceeds the max limit).
When creation fails and the runnable is still submitted to LogAggregationService,
the runnable may run forever if some app status transition misbehaves (e.g. the
application complete event is not handled properly, thus keeping appFinishing of
AppLogAggregatorImpl always true).
In our production (version 2.7.3), this caused a huge number of dangling
aggregators (~400+ LogAggregationService threads alive on some nodes, where the
nodemanager is configured with only 50+ vCPUs).
The patch throws the creation exception early, avoiding starting
unnecessary log polling.
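
For illustration only, the fail-fast idea can be sketched roughly as below. This
is not the attached patch and not the actual LogAggregationService code: the
class FailFastAggregationSketch, the DirCreator interface, and the methods
initApp and submittedCount are hypothetical names, and a plain ExecutorService
stands in for the service's thread pool. The point is only that the aggregator
is handed to the pool after the dir creation has succeeded, so a creation
failure propagates to the caller and no polling thread is started.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names do not come from the Hadoop code base.
class FailFastAggregationSketch {

    /** Assumed stand-in for the step that creates the aggregated log dir. */
    interface DirCreator {
        void createAppDir(String appId) throws IOException;
    }

    // Daemon threads so the sketch never keeps the JVM alive on its own.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
    private final AtomicInteger submitted = new AtomicInteger();
    private final DirCreator dirCreator;

    FailFastAggregationSketch(DirCreator dirCreator) {
        this.dirCreator = dirCreator;
    }

    /**
     * Creates the app dir first and submits the aggregator only on success.
     * If creation throws, the exception reaches the caller and nothing is
     * submitted to the pool.
     */
    void initApp(String appId, Runnable aggregator) throws IOException {
        dirCreator.createAppDir(appId); // may throw, e.g. dir limit exceeded
        submitted.incrementAndGet();
        pool.execute(aggregator);
    }

    /** Number of aggregators that were actually handed to the pool. */
    int submittedCount() {
        return submitted.get();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

With this ordering, a caller that sees the IOException can log it and skip the
application, rather than leaving a runnable that waits on a finish signal.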
--
This message was sent by Atlassian Jira
(v8.3.4#803005)