[ https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xianyin Xin updated YARN-3639:
------------------------------
    Assignee:     (was: Xianyin Xin)

> It takes too long time for RM to recover all apps if the original active RM
> and namenode is deployed on the same node.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3639
>                 URL: https://issues.apache.org/jira/browse/YARN-3639
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Xianyin Xin
>
> If the node running the active RM dies and the active namenode is running on
> the same node, the new RM takes a long time to recover all apps. After
> analysis, we found that the root cause is the renewal of HDFS tokens during
> recovery. The HDFS client created by the renewer first tries to connect to
> the original namenode, which times out after 10~20s, and only then tries to
> connect to the new namenode. According to our test, the entire recovery
> costs roughly 15*#apps seconds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
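
To put the reported 15*#apps figure in concrete terms, here is a rough back-of-the-envelope sketch in Python. It is not taken from the RM code. It assumes that apps are recovered one after another and that each token renewal waits out one connection timeout against the dead namenode before failing over. The function names and the app count are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope model of the recovery delay described in YARN-3639.
# Assumption (not confirmed by the report): apps are recovered serially, and
# each token renewal pays one connection timeout against the dead namenode.


def estimate_recovery_seconds(num_apps, timeout_s=15.0):
    """Return the estimated total recovery time in seconds."""
    if num_apps < 0:
        raise ValueError("num_apps must be non-negative")
    return num_apps * timeout_s


def recovery_range_seconds(num_apps, min_timeout_s=10.0, max_timeout_s=20.0):
    """Return (low, high) estimates for the reported 10~20s timeout range."""
    return (
        estimate_recovery_seconds(num_apps, min_timeout_s),
        estimate_recovery_seconds(num_apps, max_timeout_s),
    )


if __name__ == "__main__":
    apps = 1000  # hypothetical app count, not a figure from the report
    low, high = recovery_range_seconds(apps)
    typical = estimate_recovery_seconds(apps)
    print(f"{apps} apps: typical {typical:.0f}s, range {low:.0f}s to {high:.0f}s")
```

Under these assumptions, a hypothetical 1,000 apps would take about 15,000 seconds to recover, a little over four hours, which shows why the per-app timeout dominates the recovery time.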