[
https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15320001#comment-15320001
]
Naganarasimha G R commented on YARN-4464:
-----------------------------------------
Thanks for the comments [~vinodkv],
bq. I remember that Jian He did some benchmarking to demonstrate that recovery
of 10K apps takes only 10 seconds. We need to understand the root-cause here.
You are right. Although there was some initial discussion of the probable
cause of the delay, the thread later moved on to just changing the default
value. Initially I thought it might be because of YARN-3104 (as mentioned by
[~kasha]) or YARN-4041, but I am not sure about it.
Having said that, I was thinking more along the lines of whether it is
required to store so many finished apps when we already support ATS. Apart
from adding to the startup time (nominal, but unnecessary when there are many
running apps in a large cluster), it was also adding a lot of unnecessary logs
and publishing of ATS events. Hence I was more inclined to reduce the default
value.
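For illustration only, an operator who does not want to wait for a change to
the default could override the property explicitly. The sketch below uses the
same key=value shorthand as the issue description; the value 1000 is an
arbitrary example, not a number proposed in this JIRA.
{code}
# Illustrative value only; pick a limit that suits the cluster.
yarn.resourcemanager.state-store.max-completed-applications=1000
{code}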
> default value of yarn.resourcemanager.state-store.max-completed-applications
> should lower.
> ------------------------------------------------------------------------------------------
>
> Key: YARN-4464
> URL: https://issues.apache.org/jira/browse/YARN-4464
> Project: Hadoop YARN
> Issue Type: Wish
> Components: resourcemanager
> Reporter: KWON BYUNGCHANG
> Assignee: Daniel Templeton
> Priority: Blocker
> Attachments: YARN-4464.001.patch, YARN-4464.002.patch,
> YARN-4464.003.patch, YARN-4464.004.patch
>
>
> My cluster has 120 nodes.
> I configured the RM Restart feature.
> {code}
> yarn.resourcemanager.recovery.enabled=true
> yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
> yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore
> {code}
> Unfortunately, I did not configure
> {{yarn.resourcemanager.state-store.max-completed-applications}},
> so the property took its default value of 10,000.
> I restarted the RM after changing another configuration setting.
> I expected the RM to restart immediately,
> but the recovery process was very slow; I waited about 20 minutes.
> Then I realized I had not set
> {{yarn.resourcemanager.state-store.max-completed-applications}}.
> Its default value is very large.
> The default should be lowered, or a notice should be added to the [RM Restart
> page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html].