Xianyin Xin commented on YARN-3639:

Yes, you're right [~aw]. 

> It takes too long time for RM to recover all apps if the original active RM 
> and namenode is deployed on the same node.
> ----------------------------------------------------------------------------------------------------------------------
>                 Key: YARN-3639
>                 URL: https://issues.apache.org/jira/browse/YARN-3639
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Xianyin Xin
>         Attachments: YARN-3639-recovery_log_1_app.txt
> If the node running the active RM dies and the active namenode is running on 
> the same node, the new RM takes a long time to recover all apps. After 
> analysis, we found that the root cause is the renewal of HDFS tokens during 
> recovery. The HDFS client created by the renewer first tries to connect to 
> the original namenode, which times out after 10~20s, and only then tries to 
> connect to the new namenode. In our tests, the entire recovery took roughly 
> 15 * #apps seconds.
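
The arithmetic in the description can be illustrated with a small simulation. This is only a sketch under stated assumptions, not RM or HDFS client code: the names `NameNode`, `renew_token`, and `recover_apps` are hypothetical, the 15 s timeout is simply the midpoint of the reported 10~20s range, and tokens are assumed to be renewed one app at a time, as the 15 * #apps figure suggests. No real connections are made; time is accumulated as a number.

```python
from dataclasses import dataclass


@dataclass
class NameNode:
    """Hypothetical stand-in for a namenode endpoint."""
    name: str
    alive: bool


def renew_token(namenodes, connect_timeout_s):
    """Return the simulated seconds spent before one renewal succeeds.

    Namenodes are tried in the configured order; every unreachable one
    costs a full connect timeout before the next one is attempted.
    """
    elapsed = 0.0
    for nn in namenodes:
        if nn.alive:
            return elapsed
        elapsed += connect_timeout_s
    raise ConnectionError("no reachable namenode")


def recover_apps(num_apps, namenodes, connect_timeout_s=15.0):
    """Total simulated delay when tokens are renewed serially, one per app."""
    return sum(renew_token(namenodes, connect_timeout_s) for _ in range(num_apps))


# The original namenode (on the dead host) is listed first, so every
# renewal pays the timeout before failing over to the new namenode.
nns = [NameNode("nn1 (dead host)", alive=False), NameNode("nn2", alive=True)]
print(recover_apps(1000, nns))  # 15000.0 simulated seconds, about 4 hours
```

Under these assumptions the delay grows linearly with the number of apps, which matches the reported behaviour: the cost comes from repeating the same failed connection attempt for every app, not from the renewal itself.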

This message was sent by Atlassian JIRA
