Jun Gong created YARN-3094:
------------------------------

             Summary: reset timer for liveness monitors after RM recovery
                 Key: YARN-3094
                 URL: https://issues.apache.org/jira/browse/YARN-3094
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.6.0
            Reporter: Jun Gong
            Assignee: Jun Gong


When the RM restarts, it recovers RMAppAttempts and registers them with 
AMLivenessMonitor if they are not in a final state. An AM will time out in the 
RM if the recovery process takes a long time for some reason (e.g. too many 
apps), because its liveness timer keeps running while recovery is in progress. 

In our system, the recovery process took about 3 minutes, and all AMs timed 
out.
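
The effect can be illustrated with a minimal sketch. This is not the actual 
AMLivenessMonitor code: the class SimpleLivenessMonitor, its methods, and the 
60-second expiry interval are hypothetical and chosen only for illustration. 
It assumes the timer for each attempt starts at registration, and shows how 
resetting all timers once recovery finishes would avoid the timeout.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified liveness monitor (NOT the real YARN class).
 * Timestamps are passed in explicitly so the behaviour is deterministic.
 */
class SimpleLivenessMonitor {
    private final long expireIntervalMs;
    // Last time (in ms) each registered attempt was seen alive.
    private final Map<String, Long> lastSeen = new HashMap<>();

    SimpleLivenessMonitor(long expireIntervalMs) {
        this.expireIntervalMs = expireIntervalMs;
    }

    /** Start tracking an attempt; its timer starts at nowMs. */
    synchronized void register(String attemptId, long nowMs) {
        lastSeen.put(attemptId, nowMs);
    }

    /** Restart the timer of every tracked attempt, e.g. once recovery is done. */
    synchronized void resetTimers(long nowMs) {
        lastSeen.replaceAll((attemptId, oldTime) -> nowMs);
    }

    /** True if the attempt is tracked and has been silent longer than the interval. */
    synchronized boolean isExpired(String attemptId, long nowMs) {
        Long last = lastSeen.get(attemptId);
        return last != null && nowMs - last > expireIntervalMs;
    }
}
```

With an assumed 60-second interval, an attempt registered at the start of a 
3-minute recovery is already expired when recovery ends, unless resetTimers 
is called at that point.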



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
