[
https://issues.apache.org/jira/browse/YARN-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18030777#comment-18030777
]
ASF GitHub Bot commented on YARN-11421:
---------------------------------------
github-actions[bot] closed pull request #5905: [YARN-11421] Graceful
Decommission ignores launched containers and gets deactivated before timeout
URL: https://github.com/apache/hadoop/pull/5905
> Graceful Decommission ignores launched containers and gets deactivated before
> timeout
> -------------------------------------------------------------------------------------
>
> Key: YARN-11421
> URL: https://issues.apache.org/jira/browse/YARN-11421
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1, 3.3.1, 3.3.4
> Reporter: Abhishek Dixit
> Priority: Major
> Labels: pull-request-available
>
> During Graceful Decommission, a Node gets deactivated before timeout even
> though there are launched containers on that node.
> We have observed cases where the graceful decommission signal is sent to a
> node at the same time as containers are launched on its NodeManager. In such
> cases the ResourceManager moves the node from the Decommissioning to the
> Decommissioned state, because launched containers are not checked in
> DecommissioningNodesWatcher.
> We suggest waiting for
> yarn.resourcemanager.decommissioning-nodes-watcher.delay-ms to elapse
> before marking the node ready to be decommissioned. No delay is applied if
> it is set to 0. The expire interval should not be configured to exceed
> RM_AM_EXPIRY_INTERVAL_MS.
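
The proposed wait could be sketched roughly as below. This is only an illustration of the idea in the description, not the code from the pull request: the class name, the method name, and all parameters are hypothetical, and the real DecommissioningNodesWatcher tracks considerably more state.

```java
// Hypothetical sketch of the proposed check; names are illustrative only
// and do not come from the Hadoop code base.
final class DecommissionDelaySketch {
    private DecommissionDelaySketch() {}

    /**
     * Decides whether a decommissioning node may be marked ready.
     *
     * @param launchedContainers     containers known to be launched on the node
     * @param decommissioningStartMs time the node entered Decommissioning
     * @param nowMs                  current time
     * @param delayMs                assumed to hold the value of
     *        yarn.resourcemanager.decommissioning-nodes-watcher.delay-ms
     */
    static boolean isReadyToBeDecommissioned(
            int launchedContainers,
            long decommissioningStartMs,
            long nowMs,
            long delayMs) {
        // A node that still has launched containers is never ready.
        if (launchedContainers > 0) {
            return false;
        }
        // A delay of 0 means no waiting, as stated in the description.
        if (delayMs <= 0) {
            return true;
        }
        // Otherwise wait until the configured delay has elapsed.
        return nowMs - decommissioningStartMs >= delayMs;
    }
}
```

With a delay in place, a container launch that races with the decommission signal has time to be reported to the ResourceManager before the node is considered idle.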