[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291478#comment-14291478 ]
Jun Gong commented on YARN-3094:
--------------------------------
[~chenchun] Thanks for the suggestion. I think the service start time is
very short, so we can just ignore it. What is more, we need to init
AMLivelinessMonitor before ApplicationMasterService because the RM recovery
process uses it.
> reset timer for liveness monitors after RM recovery
> ---------------------------------------------------
>
> Key: YARN-3094
> URL: https://issues.apache.org/jira/browse/YARN-3094
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Jun Gong
> Assignee: Jun Gong
> Attachments: YARN-3094.patch
>
>
> When RM restarts, it recovers RMAppAttempts and registers them with
> AMLivelinessMonitor if they are not in a final state. AMs will time out in RM if
> the recovery process takes a long time for some reason (e.g. too many apps).
> In our system, we found the recovery process took about 3 minutes, and all AMs
> timed out.
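
To illustrate the problem and the proposed fix, here is a minimal sketch. It is
not the code in YARN-3094.patch and not the Hadoop AMLivelinessMonitor class:
SimpleLivenessMonitor, register, resetTimers, and expired are hypothetical names,
and the injected clock is an assumption made so the behavior can be checked
without waiting. Attempts registered early in a long recovery look expired once
recovery finishes, unless their timers are reset at that point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical, simplified stand-in for a liveness monitor (not the Hadoop class). */
class SimpleLivenessMonitor {
    // Last time (in ms) each registered id was seen alive.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long expiryMillis;
    private final LongSupplier clock; // injected so tests can control time

    SimpleLivenessMonitor(long expiryMillis, LongSupplier clock) {
        this.expiryMillis = expiryMillis;
        this.clock = clock;
    }

    /** Start tracking an id; its timer starts at the current clock value. */
    void register(String id) {
        lastSeen.put(id, clock.getAsLong());
    }

    /** Restart the timer of every tracked id, e.g. once recovery has finished. */
    void resetTimers() {
        long now = clock.getAsLong();
        lastSeen.replaceAll((id, previous) -> now);
    }

    /** Return the ids whose timers have exceeded the expiry interval. */
    List<String> expired() {
        long now = clock.getAsLong();
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (now - entry.getValue() > expiryMillis) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

With a 1 minute expiry and a recovery that takes 3 minutes, an attempt
registered at the start of recovery would be reported by expired() unless
resetTimers() is called when recovery ends.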