Akshesh Doshi created YARN-11512:
------------------------------------
Summary: Graceful decommission doesn't work when NM restart
recovery is enabled
Key: YARN-11512
URL: https://issues.apache.org/jira/browse/YARN-11512
Project: Hadoop YARN
Issue Type: Bug
Components: graceful, nodemanager
Affects Versions: 3.3.1
Reporter: Akshesh Doshi
We have added the following properties to the yarn-site.xml file of our Hadoop YARN cluster:
{code:xml}
<property>
<name>yarn.nodemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.recovery.supervised</name>
<value>true</value>
</property>
{code}
The NM restart recovery feature has been working well: applications do not fail
even when we restart the NodeManager processes. However, when we try to
decommission a node by adding its hostname to the yarn_exclude_hosts file and
refreshing nodes on the ResourceManager, the applications that had containers
running on that node get stuck for a long time and then fail.
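For anyone trying to reproduce this, the ResourceManager-side settings involved in the decommission step would look roughly like the sketch below. This is an assumption about a typical setup, not a copy of our configuration: the directory in the exclude path is hypothetical (only the file name yarn_exclude_hosts comes from this report), and the timeout value is illustrative.
{code:xml}
<!-- Sketch only: the /etc/hadoop/conf directory is a hypothetical location. -->
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/etc/hadoop/conf/yarn_exclude_hosts</value>
</property>
<!-- Illustrative value: seconds the RM waits for a gracefully decommissioning node. -->
<property>
<name>yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs</name>
<value>3600</value>
</property>
{code}
After the hostname is added to the exclude file, the refresh is normally triggered with yarn rmadmin -refreshNodes; its -g option (with an optional timeout in seconds, followed by -client or -server) requests a graceful decommission.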
--
This message was sent by Atlassian Jira
(v8.20.10#820010)