Tarun Parimi created YARN-10890:
-----------------------------------
Summary: Node Attributes in Distributed mapping misses update to
scheduler when node gets decommissioned/recommissioned
Key: YARN-10890
URL: https://issues.apache.org/jira/browse/YARN-10890
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.2.1, 3.3.0
Reporter: Tarun Parimi
The NodeAttributesManagerImpl maintains the node-to-attribute mapping, but it
doesn't remove the mapping when a node goes down. This makes sense for
centralized mapping: the attribute mapping is managed centrally in the RM, so a
node going down doesn't affect it.
In distributed mapping, the node attribute mapping is updated via NM heartbeat
to the RM, so these node attributes are only valid as long as the node is
heartbeating. But when a node is decommissioned or lost, the node attribute
entry still remains in NodeAttributesManagerImpl.
After the performance improvement in YARN-8925, we update distributed node
attributes only when necessary. However, when a previously decommissioned node
is recommissioned, NodeAttributesManagerImpl still has the old mapping entry
belonging to the old SchedulerNode instance that was decommissioned.
This results in ResourceTrackerService#updateNodeAttributesIfNecessary skipping
the update, since it compares the reported attributes with those belonging to
the old decommissioned node instance. The scheduler therefore misses the
attributes for the recommissioned node.
{code:java}
if (!NodeLabelUtil
    .isNodeAttributesEquals(nodeAttributes, currentNodeAttributes)) {
  this.rmContext.getNodeAttributesManager()
      .replaceNodeAttributes(NodeAttribute.PREFIX_DISTRIBUTED,
          ImmutableMap.of(nodeId.getHost(), nodeAttributes));
} else if (LOG.isDebugEnabled()) {
  LOG.debug("Skip updating node attributes since there is no change for "
      + nodeId + " : " + nodeAttributes);
}
{code}
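The effect of this check can be reproduced in a minimal, self-contained sketch.
It is a toy model and not the actual ResourceManager code: the class
StaleAttributeSketch, its two maps, and its methods are hypothetical stand-ins
for NodeAttributesManagerImpl, the scheduler's per-node state, and the
heartbeat path.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StaleAttributeSketch {
    // Stand-in for the manager's host -> attributes mapping (survives node loss).
    final Map<String, Set<String>> managerMapping = new HashMap<>();
    // Stand-in for the scheduler's per-node attributes (dropped with the node).
    final Map<String, Set<String>> schedulerView = new HashMap<>();

    // Same shape as the equality check above: update only when the reported
    // attributes differ from what the manager currently holds.
    boolean updateIfNecessary(String host, Set<String> reported) {
        Set<String> current = managerMapping.getOrDefault(host, new HashSet<>());
        if (reported.equals(current)) {
            return false; // skipped: the manager believes nothing changed
        }
        managerMapping.put(host, new HashSet<>(reported));
        schedulerView.put(host, new HashSet<>(reported));
        return true;
    }

    // The node goes away: the scheduler forgets it, the manager mapping stays.
    void decommission(String host) {
        schedulerView.remove(host);
    }

    // The node comes back as a fresh scheduler node with no attributes yet.
    void recommission(String host) {
        schedulerView.put(host, new HashSet<>());
    }

    public static void main(String[] args) {
        StaleAttributeSketch rm = new StaleAttributeSketch();
        Set<String> attrs = Collections.singleton("os=linux");
        rm.updateIfNecessary("host1", attrs); // first heartbeat: applied
        rm.decommission("host1");
        rm.recommission("host1");
        boolean applied = rm.updateIfNecessary("host1", attrs);
        System.out.println("update applied after recommission: " + applied);
        System.out.println("scheduler attributes: " + rm.schedulerView.get("host1"));
    }
}
```

In this toy model the second update is skipped, and the scheduler's entry for
host1 stays empty even though the node keeps reporting os=linux.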
We should remove the distributed node attributes whenever a node gets
deactivated to avoid this issue. The attributes will then be added properly in
the scheduler when the node becomes active again and registers or heartbeats.
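One way to picture the proposed fix is sketched below. It is an illustration
under stated assumptions, not the actual patch: DeactivationCleanupSketch,
onNodeDeactivated, and the "prefix/attribute" string encoding are all
hypothetical, and the two prefix constants are placeholders rather than the
real NodeAttribute prefix values.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DeactivationCleanupSketch {
    // Placeholder prefixes for illustration only (not the real YARN values).
    static final String DISTRIBUTED = "distributed";
    static final String CENTRALIZED = "centralized";

    // host -> attributes, each encoded as "prefix/name=value" (assumed encoding).
    final Map<String, Set<String>> mapping = new HashMap<>();

    void add(String host, String prefix, String attr) {
        mapping.computeIfAbsent(host, h -> new HashSet<>()).add(prefix + "/" + attr);
    }

    Set<String> distributedAttributes(String host) {
        Set<String> result = new HashSet<>();
        for (String a : mapping.getOrDefault(host, Collections.emptySet())) {
            if (a.startsWith(DISTRIBUTED + "/")) {
                result.add(a);
            }
        }
        return result;
    }

    // Hypothetical hook run when a node is decommissioned or lost:
    // drop only the distributed attributes, keep centralized ones.
    void onNodeDeactivated(String host) {
        Set<String> attrs = mapping.get(host);
        if (attrs != null) {
            attrs.removeIf(a -> a.startsWith(DISTRIBUTED + "/"));
        }
    }

    // Heartbeat path: replace distributed attributes only if they differ.
    boolean updateIfNecessary(String host, Set<String> reportedNames) {
        Set<String> reported = new HashSet<>();
        for (String n : reportedNames) {
            reported.add(DISTRIBUTED + "/" + n);
        }
        if (reported.equals(distributedAttributes(host))) {
            return false;
        }
        Set<String> attrs = mapping.computeIfAbsent(host, h -> new HashSet<>());
        attrs.removeIf(a -> a.startsWith(DISTRIBUTED + "/"));
        attrs.addAll(reported);
        return true;
    }

    public static void main(String[] args) {
        DeactivationCleanupSketch mgr = new DeactivationCleanupSketch();
        mgr.add("host1", CENTRALIZED, "rack=r1");
        Set<String> reported = Collections.singleton("os=linux");
        mgr.updateIfNecessary("host1", reported); // first heartbeat
        mgr.onNodeDeactivated("host1");           // node decommissioned or lost
        boolean applied = mgr.updateIfNecessary("host1", reported);
        System.out.println("update applied after recommission: " + applied);
        System.out.println("attributes now: " + mgr.mapping.get("host1").size());
    }
}
```

Because the distributed entries are cleared on deactivation, the equality check
no longer matches stale data, so the first heartbeat after recommissioning is
applied. Centralized entries are left in place, consistent with the reasoning
that a node going down should not affect the centralized mapping.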