[ https://issues.apache.org/jira/browse/YARN-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zizon updated YARN-9934:
------------------------
Attachment: YARN-9934.patch.1
> LogAggregationService should not submit aggregator when app dir creation fails
> ------------------------------------------------------------------------------
>
> Key: YARN-9934
> URL: https://issues.apache.org/jira/browse/YARN-9934
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: log-aggregation
> Reporter: Zizon
> Priority: Minor
> Attachments: YARN-9934.patch, YARN-9934.patch.1
>
>
> Before submitting a log aggregation runnable, LogAggregationService tries
> to create the aggregated log directory for the application.
> In some cases this may fail (e.g. the number of directories exceeds the maximum limit).
>
> When creation fails and the runnable is submitted to LogAggregationService
> anyway, it may run forever if the application's state transitions misbehave
> (e.g. the application-complete event is not handled correctly, thus keeping
> appFinishing of AppLogAggregatorImpl always true).
>
> In our production cluster (version 2.7.3), this caused a huge number of dangling
> aggregators (~400+ LogAggregationService threads alive on some nodes, where
> the NodeManager is configured with only 50+ vCPUs).
>
> The patch throws the directory-creation exception early, so that no aggregator
> is started and no unnecessary log polling takes place.
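A minimal Java sketch of the fail-fast ordering described above, under a simplified model. `FailFastAggregation`, `DirCreator`, `Aggregator`, `initApp` and the `completionSignalled`/`aborted` flags are all hypothetical names, not the actual LogAggregationService or AppLogAggregatorImpl API. The directory is created before anything is submitted, so a creation failure surfaces as an exception and no polling thread is started. The aggregator's loop only exits once a completion or abort flag is set, which is why a missed completion event leaves a submitted aggregator running.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model; not the real Hadoop classes.
final class FailFastAggregation {

    /** Hypothetical stand-in for the per-application directory creation step. */
    interface DirCreator {
        void createAppDir(String appId) throws IOException;
    }

    /** Hypothetical aggregator that polls until completion or abort is signalled. */
    static final class Aggregator implements Runnable {
        final AtomicBoolean completionSignalled = new AtomicBoolean(false);
        final AtomicBoolean aborted = new AtomicBoolean(false);
        final AtomicInteger polls = new AtomicInteger();

        @Override
        public void run() {
            // If neither flag is ever set, this loop never exits: the
            // dangling-thread hazard described in the issue.
            while (!completionSignalled.get() && !aborted.get()) {
                polls.incrementAndGet(); // stands in for a periodic log upload
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    private final DirCreator dirCreator;
    private final ExecutorService threadPool = Executors.newCachedThreadPool();
    final AtomicInteger submitted = new AtomicInteger();

    FailFastAggregation(DirCreator dirCreator) {
        this.dirCreator = dirCreator;
    }

    /** Creates the app directory first; submits an aggregator only on success. */
    Aggregator initApp(String appId) throws IOException {
        // Fail fast: an IOException propagates to the caller before any
        // aggregator is constructed or handed to the thread pool.
        dirCreator.createAppDir(appId);
        Aggregator aggregator = new Aggregator();
        threadPool.execute(aggregator);
        submitted.incrementAndGet();
        return aggregator;
    }

    /** Stops accepting work and waits (up to 5 s) for running aggregators to exit. */
    boolean shutdown() throws InterruptedException {
        threadPool.shutdown();
        return threadPool.awaitTermination(5, TimeUnit.SECONDS);
    }

    boolean isTerminated() {
        return threadPool.isTerminated();
    }
}
```

For example, constructing it with `id -> { throw new IOException("dir limit exceeded"); }` makes `initApp` throw while `submitted` stays at 0, whereas a no-op creator starts one aggregator that stops as soon as `completionSignalled` is set.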
--
This message was sent by Atlassian Jira
(v8.3.4#803005)