Akshesh Doshi created YARN-11512:
------------------------------------

             Summary: Graceful decommission doesn't work when NM restart recovery is enabled
                 Key: YARN-11512
                 URL: https://issues.apache.org/jira/browse/YARN-11512
             Project: Hadoop YARN
          Issue Type: Bug
          Components: graceful, nodemanager
    Affects Versions: 3.3.1
            Reporter: Akshesh Doshi


We have added the following properties to the yarn-site.xml file of our Hadoop YARN cluster:

{code:xml}
<property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.nodemanager.recovery.supervised</name>
    <value>true</value>
</property>
{code}

The NM restart recovery feature has been working well: applications do not fail 
even when we restart NodeManager processes. However, when we try to gracefully 
decommission a node by adding its hostname to the yarn_exclude_hosts file and 
refreshing nodes on the ResourceManager, the applications that had containers 
running on that node are stuck for a long time and then fail.
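For context, here is a minimal sketch of how an exclude file and a graceful decommission timeout are typically configured in yarn-site.xml. It assumes the cluster uses the standard `yarn.resourcemanager.nodes.exclude-path` and `yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs` properties. The file path and timeout value are hypothetical placeholders, not values from this cluster. After a hostname is added to the exclude file, the ResourceManager re-reads the file when nodes are refreshed. In Hadoop 3.x, `yarn rmadmin -refreshNodes -g [timeout] -client|server` requests a graceful rather than an immediate decommission.

{code:xml}
<!-- Hypothetical path: points the ResourceManager at the exclude file -->
<property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/path/to/yarn_exclude_hosts</value>
</property>
<!-- Example value: seconds to wait for running containers and applications
     before a DECOMMISSIONING node is forcefully decommissioned -->
<property>
    <name>yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs</name>
    <value>3600</value>
</property>
{code}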

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
