[ https://issues.apache.org/jira/browse/YARN-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengchenyu resolved YARN-10557.
--------------------------------
Release Note: YARN-9848
Resolution: Duplicate
I think this is a duplicate of YARN-9848.
> Application may be leaked in state store when resourcemanager failover.
> -----------------------------------------------------------------------
>
> Key: YARN-10557
> URL: https://issues.apache.org/jira/browse/YARN-10557
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Labels: resourcemanager
> Fix For: 3.3.1
>
>
> In the ResourceManager log, I found a large number of entries like the one below:
> {code}
> 2020-12-30 19:18:48,120 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of
> completed apps kept in state store met: maxCompletedAppsInStateStore = 2000,
> but not removing app application_1608912003714_0098 from state store as log
> aggregation have not finished yet.
> {code}
> When I looked into this, I found that the application's logs had already been
> aggregated. While debugging, I found that the app's
> logAggregationStatusForAppReport was NOT_START.
> (Note: in my test cluster, I occasionally simulate an RM restart.)
> Suppose an application has finished and its logs have been aggregated, but it
> has not yet been removed from the RM. When the RM fails over, the new RM
> recovers the application from the state store. The log aggregation status is
> not persisted in the state store, and logAggregationStatusForAppReport is not
> updated after recovery, so it stays NOT_START. As a result, the app is never
> removed from the state store.
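
To make the sequence above concrete, here is a minimal, self-contained sketch of
the behaviour as described in the report. It is not the ResourceManager code:
StateStoreLeakSketch, CompletedApp, StateStore, recover and evict are all
hypothetical names, and the state store is modelled as a plain list of
application IDs. The only assumptions taken from the report are that the log
aggregation status is not persisted and that an app is kept while its status is
not finished.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StateStoreLeakSketch {
    // Simplified stand-in for the log aggregation states named in the report.
    enum LogAggregationStatus { NOT_START, RUNNING, SUCCEEDED }

    // Hypothetical in-memory view of a completed application.
    static final class CompletedApp {
        final String appId;
        LogAggregationStatus status;

        CompletedApp(String appId, LogAggregationStatus status) {
            this.appId = appId;
            this.status = status;
        }
    }

    // Hypothetical state store: it persists application IDs only,
    // not the log aggregation status (assumption taken from the report).
    static final class StateStore {
        final List<String> appIds = new ArrayList<>();
    }

    // Recovery after failover: the status is not in the store, so every
    // recovered app starts from NOT_START and nothing updates it later.
    static Map<String, CompletedApp> recover(StateStore store) {
        Map<String, CompletedApp> apps = new LinkedHashMap<>();
        for (String id : store.appIds) {
            apps.put(id, new CompletedApp(id, LogAggregationStatus.NOT_START));
        }
        return apps;
    }

    // Removes the oldest completed apps until at most maxCompleted remain,
    // skipping any app whose log aggregation is not finished.
    // Returns the number of apps removed from the store.
    static int evict(Map<String, CompletedApp> apps, StateStore store, int maxCompleted) {
        int removed = 0;
        for (String id : new ArrayList<>(store.appIds)) {
            if (store.appIds.size() <= maxCompleted) {
                break;
            }
            CompletedApp app = apps.get(id);
            if (app == null || app.status != LogAggregationStatus.SUCCEEDED) {
                continue; // "not removing app ... from state store"
            }
            store.appIds.remove(id);
            apps.remove(id);
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        for (int i = 0; i < 3; i++) {
            store.appIds.add("app_" + i); // finished and aggregated before failover
        }
        // Failover: the new RM rebuilds its view from the store alone.
        Map<String, CompletedApp> recovered = recover(store);
        int removed = evict(recovered, store, 1);
        System.out.println("removed=" + removed + " leftInStore=" + store.appIds.size());
    }
}
```

In this model the limit is 1 but all three apps stay in the store after the
simulated failover, which mirrors the repeated "not removing app" log entries.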
--
This message was sent by Atlassian Jira
(v8.3.4#803005)