zhengchenyu created YARN-10557:
----------------------------------
Summary: Application may be leaked in state store when
ResourceManager fails over.
Key: YARN-10557
URL: https://issues.apache.org/jira/browse/YARN-10557
Project: Hadoop YARN
Issue Type: Bug
Reporter: zhengchenyu
In the ResourceManager log, I found a large number of entries like the one below:
{code}
2020-12-30 19:18:48,120 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of
completed apps kept in state store met: maxCompletedAppsInStateStore = 2000,
but not removing app application_1608912003714_0098 from state store as log
aggregation have not finished yet.
{code}
When I looked into this, I found that log aggregation for the application had
already completed. While debugging, I found that the app's
logAggregationStatusForAppReport is NOT_START.
(Note: in my test cluster, I occasionally simulate an RM restart.)
Suppose an application has finished and its logs have been aggregated, but it
has not yet been removed from the RM. When the RM fails over, the new RM
recovers the application from the state store, but
logAggregationStatusForAppReport is not updated. So
logAggregationStatusForAppReport stays NOT_START, and the app is never
removed from the state store.
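The sequence above can be illustrated with a small standalone simulation. This is only a sketch and not the actual RMAppManager code: the class StateStoreLeakSketch, its methods, and the assumption that recovery resets the status to NOT_START (because it is not restored from the state store) are all hypothetical and exist only to show the effect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the behaviour described in this report.
public class StateStoreLeakSketch {
    enum LogAggregationStatus { NOT_START, SUCCEEDED }

    // Stand-in for the in-memory application object held by the RM.
    static final class App {
        final String id;
        final LogAggregationStatus logAggregationStatusForAppReport;

        App(String id, LogAggregationStatus status) {
            this.id = id;
            this.logAggregationStatusForAppReport = status;
        }
    }

    // Rebuilds the in-memory apps from the stored ids. Assumption: after a
    // failover the aggregation status is not restored, so it is NOT_START.
    static Map<String, App> loadApps(List<String> storedIds, boolean afterFailover) {
        LogAggregationStatus status = afterFailover
                ? LogAggregationStatus.NOT_START
                : LogAggregationStatus.SUCCEEDED;
        Map<String, App> apps = new LinkedHashMap<>();
        for (String id : storedIds) {
            apps.put(id, new App(id, status));
        }
        return apps;
    }

    // Removes completed apps while the store is over the limit, but skips any
    // app whose log aggregation appears unfinished (as in the log message).
    static void removeCompletedApps(Map<String, App> apps, List<String> stateStore,
                                    int maxCompletedAppsInStateStore) {
        for (App app : new ArrayList<>(apps.values())) {
            if (stateStore.size() <= maxCompletedAppsInStateStore) {
                break;
            }
            if (app.logAggregationStatusForAppReport == LogAggregationStatus.NOT_START) {
                continue; // "not removing app ... as log aggregation have not finished yet"
            }
            stateStore.remove(app.id);
            apps.remove(app.id);
        }
    }

    // Returns how many apps remain in the store above the configured limit.
    public static int leakedApps(int storedApps, int maxCompleted, boolean afterFailover) {
        List<String> stateStore = new ArrayList<>();
        for (int i = 1; i <= storedApps; i++) {
            stateStore.add("application_" + i);
        }
        Map<String, App> apps = loadApps(stateStore, afterFailover);
        removeCompletedApps(apps, stateStore, maxCompleted);
        return Math.max(0, stateStore.size() - maxCompleted);
    }

    public static void main(String[] args) {
        System.out.println("leaked without failover: " + leakedApps(5, 2, false));
        System.out.println("leaked after failover:   " + leakedApps(5, 2, true));
    }
}
```

In this model, the apps loaded after a failover are all skipped by the removal check, so the store stays above the limit. A fix would presumably need to restore or recompute the aggregation status during recovery, but that is a guess and not something confirmed here.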