[ https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xianyin Xin updated YARN-3639:
------------------------------
Assignee: (was: Xianyin Xin)
> It takes too long for RM to recover all apps if the original active RM
> and namenode are deployed on the same node.
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-3639
> URL: https://issues.apache.org/jira/browse/YARN-3639
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Xianyin Xin
>
> If the node on which the active RM runs dies and the active namenode is
> running on the same node, the new RM takes a long time to recover all apps.
> After analysis, we found that the root cause is the renewal of HDFS tokens
> during recovery. The HDFS client created by the renewer first tries to
> connect to the original namenode, which times out after 10~20s, and only
> then tries to connect to the new namenode. In our test, the entire recovery
> took about 15 * #apps seconds.
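
To make the scaling concrete, here is a rough back-of-the-envelope sketch of the cost described above. It assumes token renewals happen one app at a time and that each one waits out a single connect timeout against the dead namenode before reaching the new one. The function name, the 15 s default (the midpoint of the reported 10~20s), and the sample app counts are illustrative assumptions, not values taken from the YARN code.

```python
def estimated_recovery_seconds(num_apps, connect_timeout_s=15.0, renew_s=0.0):
    """Estimate total RM recovery time when renewals run sequentially.

    Assumption: every app's HDFS token renewal first times out against the
    dead namenode (connect_timeout_s) and then spends renew_s on the new one.
    """
    return num_apps * (connect_timeout_s + renew_s)


if __name__ == "__main__":
    # Hypothetical app counts, chosen only to show the linear growth.
    for num_apps in (10, 100, 1000):
        seconds = estimated_recovery_seconds(num_apps)
        print(f"{num_apps} apps: ~{seconds / 60:.1f} minutes")
```

Under these assumptions the delay grows linearly with the number of apps, so 1000 apps would take roughly four hours to recover.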