Jun Gong created YARN-3094:
------------------------------
Summary: reset timer for liveness monitors after RM recovery
Key: YARN-3094
URL: https://issues.apache.org/jira/browse/YARN-3094
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jun Gong
Assignee: Jun Gong
When the RM restarts, it recovers RMAppAttempts and registers them with the
AMLivenessMonitor if they are not in a final state. If the recovery process
takes a long time for some reason (e.g. too many apps), the AMs will time out
in the RM. In our system, the recovery process took about 3 minutes, and all
AMs timed out.
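
To illustrate the idea in the summary, here is a minimal sketch of a liveness monitor whose timers can be reset once recovery finishes. It is not the Hadoop implementation: the class LivenessMonitorSketch, its methods (register, resetTimer, isExpired), and the explicit nowMs clock parameter are all hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified liveness monitor (NOT Hadoop's AMLivenessMonitor).
 * Time is passed in explicitly so the behavior is deterministic.
 */
public class LivenessMonitorSketch {
    // Maps a monitored id (e.g. an app attempt) to its last-seen timestamp.
    private final Map<String, Long> lastSeenMs = new ConcurrentHashMap<>();
    private final long expireIntervalMs;

    public LivenessMonitorSketch(long expireIntervalMs) {
        this.expireIntervalMs = expireIntervalMs;
    }

    /** Start tracking an id; its timer starts at nowMs. */
    public void register(String id, long nowMs) {
        lastSeenMs.put(id, nowMs);
    }

    /** Restart every tracked timer at nowMs, e.g. once recovery has finished. */
    public void resetTimer(long nowMs) {
        lastSeenMs.replaceAll((id, oldTimestamp) -> nowMs);
    }

    /** True if the id is tracked and has been silent longer than the interval. */
    public boolean isExpired(String id, long nowMs) {
        Long seen = lastSeenMs.get(id);
        return seen != null && nowMs - seen > expireIntervalMs;
    }
}
```

With an assumed 60-second expiry interval, an attempt registered at the start of a 3-minute recovery would already count as expired when recovery ends. Calling resetTimer at that point gives every recovered attempt a full interval to contact the RM again.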