Zizon created YARN-9934:
---------------------------

             Summary: LogAggregationService should not submit aggregator when app dir creation fail
                 Key: YARN-9934
                 URL: https://issues.apache.org/jira/browse/YARN-9934
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: log-aggregation
            Reporter: Zizon
         Attachments: YARN-8246.patch

Before submitting a log aggregation runnable, LogAggregationService tries to create the aggregated log dir. In some cases this may fail (e.g. the number of dirs exceeds the max limit).

When the creation fails and the runnable is still submitted to LogAggregationService, the runnable may run forever if an app state transition misbehaves (e.g. the application complete event is not handled correctly, thus keeping appFinishing of AppLogAggregatorImpl always true).

In our production environment (version 2.7.3), this caused a huge number of dangling aggregators (400+ LogAggregationService threads alive on some nodes, where the NodeManager is configured with only 50+ vCPUs).

The patch throws the creation exception early, which avoids starting unnecessary log polling.
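
To illustrate the ordering the patch aims for, here is a minimal standalone sketch. It is not the actual LogAggregationService code or the attached patch; the class FailFastAggregation, the DirCreator interface, and the method names are hypothetical stand-ins. The only point it shows is that the directory is created first and any exception propagates before a runnable reaches the thread pool.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class FailFastAggregation {

    /** Hypothetical stand-in for the step that creates the app log dir. */
    public interface DirCreator {
        void createAppDir(String appId) throws IOException;
    }

    private final ExecutorService pool;
    private final DirCreator dirCreator;
    private final AtomicInteger submitted = new AtomicInteger();

    public FailFastAggregation(ExecutorService pool, DirCreator dirCreator) {
        this.pool = pool;
        this.dirCreator = dirCreator;
    }

    /**
     * Creates the app dir, then submits the aggregator.
     * If dir creation throws, the exception propagates immediately and
     * the aggregator is never handed to the pool.
     */
    public void initApp(String appId, Runnable aggregator) throws IOException {
        dirCreator.createAppDir(appId); // fail fast: nothing submitted on error
        pool.execute(aggregator);
        submitted.incrementAndGet();
    }

    public int submittedCount() {
        return submitted.get();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        // Assumption for the demo: ids starting with "bad" simulate a
        // creation failure such as a directory count limit being exceeded.
        FailFastAggregation svc = new FailFastAggregation(pool, appId -> {
            if (appId.startsWith("bad")) {
                throw new IOException("directory item limit exceeded");
            }
        });

        for (String appId : new String[] {"app_ok", "bad_app"}) {
            try {
                svc.initApp(appId, () -> { /* log polling would happen here */ });
            } catch (IOException e) {
                System.out.println("skipped " + appId + ": " + e.getMessage());
            }
        }
        pool.shutdown();
        System.out.println("submitted=" + svc.submittedCount());
    }
}
```

With this ordering, a failed directory creation costs one exception instead of one long-lived polling thread, which is the effect described for the patch above.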