[ https://issues.apache.org/jira/browse/YARN-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fang Xie updated YARN-4892:
---------------------------
Description:
Enable ResourceManager recovery by setting the following properties:
<property>
  <description>Enable RM to recover state after starting. If true, then
  yarn.resourcemanager.store.class must be specified.</description>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <description> </description>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<property>
  <description> </description>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://apple02:9000/rmstore</value>
</property>
Run a distributedshell job. While the job is running, kill the ResourceManager
and then restart it. The job never finishes and hangs.
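
For anyone reproducing this, it may help to first confirm that the settings above are mutually consistent. The following is only a minimal sketch of the one rule stated in the first property description (recovery enabled implies yarn.resourcemanager.store.class is set). The helper names read_properties and check_recovery_settings are hypothetical, and the <configuration> wrapper around the properties is assumed to match the usual yarn-site.xml layout. It does not touch a running cluster and says nothing about the cause of the hang.

```python
import xml.etree.ElementTree as ET

# Property names are copied from the report above.
RECOVERY_ENABLED = "yarn.resourcemanager.recovery.enabled"
STORE_CLASS = "yarn.resourcemanager.store.class"

# The three properties from the report, wrapped in an assumed
# <configuration> root element so the fragment parses as one document.
SAMPLE = """
<configuration>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.fs.state-store.uri</name>
    <value>hdfs://apple02:9000/rmstore</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> in the document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def check_recovery_settings(props):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if props.get(RECOVERY_ENABLED, "false").lower() != "true":
        return problems  # recovery is off, so there is nothing to verify
    if not props.get(STORE_CLASS):
        problems.append(STORE_CLASS + " must be set when recovery is enabled")
    return problems


if __name__ == "__main__":
    for problem in check_recovery_settings(read_properties(SAMPLE)):
        print(problem)
```

With the settings from this report the check finds no problems, which suggests the hang is not a simple misconfiguration of these three properties.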
> Job will be hung and can not be finished after resource manager restart and
> enable recovery
> -------------------------------------------------------------------------------------------
>
> Key: YARN-4892
> URL: https://issues.apache.org/jira/browse/YARN-4892
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.0
> Reporter: Fang Xie
> Priority: Critical