[jira] [Resolved] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and NN go down at the same time.

2016-04-12 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved YARN-3639.

Resolution: Duplicate

> It takes too long time for RM to recover all apps if the original active RM 
> and NN go down at the same time.
> 
>
> Key: YARN-3639
> URL: https://issues.apache.org/jira/browse/YARN-3639
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Xianyin Xin
> Attachments: YARN-3639-recovery_log_1_app.txt
>
>
> If the active RM and NN go down at the same time, the new RM takes a long 
> time to recover all apps. Our analysis shows that the root cause is the 
> renewal of HDFS tokens during recovery. The HDFS client created by the 
> renewer first tries to connect to the original NN, which times out after 
> 10~20s, and only then connects to the new NN. 
> In our tests, the entire recovery took about 15 * #apps seconds.
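
The arithmetic in the description can be sketched as a small model. This is only an illustration, not Hadoop code: the function names, the `(name, is_up)` tuples, and the 15-second default (the midpoint of the reported 10~20s) are assumptions, and the model assumes tokens are renewed one app at a time, with every renewal trying the failed NN first.

```python
from typing import Sequence, Tuple


def renew_token_cost(namenodes: Sequence[Tuple[str, bool]],
                     connect_timeout_s: float) -> float:
    """Seconds wasted on timeouts before the first reachable NameNode answers.

    namenodes is a list of (name, is_up) pairs in the order the client
    tries them (hypothetical representation, for illustration only).
    """
    waited = 0.0
    for _name, is_up in namenodes:
        if is_up:
            return waited
        # Each unreachable NameNode costs one full connect timeout.
        waited += connect_timeout_s
    raise ConnectionError("no reachable NameNode")


def estimate_recovery_seconds(num_apps: int,
                              namenodes: Sequence[Tuple[str, bool]],
                              connect_timeout_s: float = 15.0) -> float:
    """Serial recovery: every app pays the same timeout cost once."""
    return num_apps * renew_token_cost(namenodes, connect_timeout_s)


if __name__ == "__main__":
    # The original NN is down and is tried first; the new NN is up.
    nns = [("original-nn", False), ("new-nn", True)]
    for apps in (10, 100, 1000):
        print(apps, "apps:", estimate_recovery_seconds(apps, nns), "seconds")
```

Under these assumptions the cost grows linearly with the number of apps, which matches the 15 * #apps figure reported above; if the reachable NN were tried first, the modeled timeout cost would drop to zero.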



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and NN go down at the same time.

2016-04-12 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved YARN-3639.

Resolution: Fixed






[jira] [Resolved] (YARN-3639) It takes too long time for RM to recover all apps if the original active RM and NN go down at the same time.

2015-10-26 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin resolved YARN-3639.
---
Resolution: Fixed

This has been resolved by YARN-4041, so close it.



