[ 
https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319802#comment-15319802
 ] 

Vinod Kumar Vavilapalli commented on YARN-4464:
-----------------------------------------------

Tx for the ping [~templedf].

I haven't paid attention to this before. Apologies for pitching in very late.

bq. recovery process was very slow. I have waited about 20min. 
Did we ever find out why this takes 20mins? As part of original recovery 
feature, I remember that [~jianhe] did some benchmarking to demonstrate that 
recovery of 10K apps takes only 10 seconds. We need to understand the 
root-cause here.

Irrespective of that, even in trunk, if we prove that recovery takes much 
longer than 10 seconds in some unavoidable cases, the right solution is to make 
the recovery of completed applications alone to be in the background.

> default value of yarn.resourcemanager.state-store.max-completed-applications 
> should lower.
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4464
>                 URL: https://issues.apache.org/jira/browse/YARN-4464
>             Project: Hadoop YARN
>          Issue Type: Wish
>          Components: resourcemanager
>            Reporter: KWON BYUNGCHANG
>            Assignee: Daniel Templeton
>            Priority: Blocker
>         Attachments: YARN-4464.001.patch, YARN-4464.002.patch, 
> YARN-4464.003.patch, YARN-4464.004.patch
>
>
> my cluster has 120 nodes.
> I configured RM Restart feature.
> {code}
> yarn.resourcemanager.recovery.enabled=true
> yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
> yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore
> {code}
> unfortunately I did not configure 
> {{yarn.resourcemanager.state-store.max-completed-applications}}.
> so that property configured default value 10,000.
> I have restarted RM due to changing another configuartion.
> I expected that RM restart immediately.
> recovery process was very slow.  I have waited about 20min.  
> realize missing 
> {{yarn.resourcemanager.state-store.max-completed-applications}}.
> its default value is very huge.  
> need to change lower value or document notice on [RM Restart 
> page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to