Abhishek Dixit created YARN-11421:
-------------------------------------
Summary: Graceful Decommission ignores launched containers and
gets deactivated before timeout
Key: YARN-11421
URL: https://issues.apache.org/jira/browse/YARN-11421
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 3.3.4
Reporter: Abhishek Dixit
During Graceful Decommission, a node gets deactivated before the timeout expires
even though there are launched containers on that node.
We have observed cases where the graceful decommission signal is sent to a node
at the same time as containers are launched on its NodeManager. In such cases the
ResourceManager moves the node from the Decommissioning to the Decommissioned state
because launched containers are not checked in DeactivateNodeTransition.
We suggest replacing DeactivateNodeTransition with a MultiArc transition that
checks the scheduler for AM containers and then decides whether to keep the node
in the Decommissioning state or move it to the Decommissioned state.
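A minimal, self-contained sketch of the idea follows. It is not the ResourceManager code and not a proposed patch: the types Node, NodeState, MultiArcTransition and DecommissionCheckTransition are hypothetical stand-ins defined only for this illustration, and the container count stands in for whatever the scheduler would report for the node. It only shows how a single transition hook can pick between two target states instead of always deactivating the node.

```java
public class DecommissionSketch {

    // Hypothetical subset of node states relevant to this issue.
    enum NodeState { DECOMMISSIONING, DECOMMISSIONED }

    // Hypothetical stand-in for a multi-arc transition: one hook that may
    // return any of several target states, decided at transition time.
    interface MultiArcTransition<OPERAND, EVENT, STATE> {
        STATE transition(OPERAND operand, EVENT event);
    }

    // Hypothetical node model. launchedContainers represents the number of
    // containers the scheduler still reports for this node (assumption).
    static final class Node {
        final int launchedContainers;
        final boolean timeoutExpired;

        Node(int launchedContainers, boolean timeoutExpired) {
            this.launchedContainers = launchedContainers;
            this.timeoutExpired = timeoutExpired;
        }
    }

    // Keeps the node in DECOMMISSIONING while containers remain and the
    // graceful timeout has not expired; otherwise moves it to DECOMMISSIONED.
    static final class DecommissionCheckTransition
            implements MultiArcTransition<Node, String, NodeState> {
        @Override
        public NodeState transition(Node node, String event) {
            if (node.launchedContainers > 0 && !node.timeoutExpired) {
                return NodeState.DECOMMISSIONING;
            }
            return NodeState.DECOMMISSIONED;
        }
    }

    public static void main(String[] args) {
        DecommissionCheckTransition t = new DecommissionCheckTransition();
        // Containers still present, timeout not reached: node must stay.
        System.out.println(t.transition(new Node(2, false), "STATUS_UPDATE"));
        // No containers left: node can be deactivated.
        System.out.println(t.transition(new Node(0, false), "STATUS_UPDATE"));
        // Timeout reached: node is deactivated regardless of containers.
        System.out.println(t.transition(new Node(2, true), "STATUS_UPDATE"));
    }
}
```

The first case is the one described in this issue: with an unconditional deactivation the node would reach Decommissioned even though containers were just launched on it.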
--
This message was sent by Atlassian Jira
(v8.20.10#820010)