[ 
https://issues.apache.org/jira/browse/YARN-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarun Parimi reassigned YARN-10890:
-----------------------------------

    Assignee: Tarun Parimi

> Node Attributes in Distributed mapping misses update to scheduler when node 
> gets decommissioned/recommissioned
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10890
>                 URL: https://issues.apache.org/jira/browse/YARN-10890
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0, 3.2.1
>            Reporter: Tarun Parimi
>            Assignee: Tarun Parimi
>            Priority: Major
>
> NodeAttributesManagerImpl maintains the node-to-attribute mapping, but it
> doesn't remove the mapping when a node goes down. This makes sense for
> centralized mapping: the attribute mapping is held centrally in the RM, so a
> node going down doesn't affect it.
> In distributed mapping, the node attribute mapping is updated via the NM
> heartbeat to the RM, so these node attributes are only valid as long as the
> node is heartbeating. But when a node is decommissioned or lost, the node
> attribute entry still remains in NodeAttributesManagerImpl.
> After the performance improvement made in YARN-8925, we only update
> distributed node attributes when necessary. However, when a previously
> decommissioned node is recommissioned, NodeAttributesManagerImpl still
> has the old mapping entry belonging to the old SchedulerNode instance that
> was decommissioned.
> This results in ResourceTrackerService#updateNodeAttributesIfNecessary
> skipping the update, since it compares against the attributes belonging to
> the old decommissioned node instance.
> {code:java}
>           if (!NodeLabelUtil
>               .isNodeAttributesEquals(nodeAttributes, currentNodeAttributes)) {
>             this.rmContext.getNodeAttributesManager()
>                 .replaceNodeAttributes(NodeAttribute.PREFIX_DISTRIBUTED,
>                     ImmutableMap.of(nodeId.getHost(), nodeAttributes));
>           } else if (LOG.isDebugEnabled()) {
>             LOG.debug("Skip updating node attributes since there is no change for "
>                 + nodeId + " : " + nodeAttributes);
>           }
> {code}
> We should remove the distributed node attributes whenever a node gets
> deactivated to avoid this issue. The attributes will then be added to the
> scheduler properly when the node becomes active again and registers/heartbeats.
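
The stale-mapping behaviour and the proposed fix described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual YARN code or the eventual patch: the class DistributedAttributeSketch, its two maps, and the method names are hypothetical stand-ins for NodeAttributesManagerImpl, the scheduler's per-node state, and the skip-if-unchanged check in ResourceTrackerService#updateNodeAttributesIfNecessary.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the stale distributed-attribute mapping; not real YARN classes. */
public class DistributedAttributeSketch {
  // Hypothetical stand-in for the host -> attributes mapping kept by the manager.
  private final Map<String, Set<String>> managerMapping = new HashMap<>();
  // Hypothetical stand-in for the attributes the scheduler holds per node instance.
  private final Map<String, Set<String>> schedulerView = new HashMap<>();
  // When true, models the proposed fix: drop the mapping on deactivation.
  private final boolean removeOnDeactivate;

  public DistributedAttributeSketch(boolean removeOnDeactivate) {
    this.removeOnDeactivate = removeOnDeactivate;
  }

  /** Node registers: the scheduler gets a fresh node instance with no attributes. */
  public void activate(String host) {
    schedulerView.put(host, new HashSet<>());
  }

  /** Node is decommissioned or lost: the scheduler's node instance goes away. */
  public void deactivate(String host) {
    schedulerView.remove(host);
    if (removeOnDeactivate) {
      managerMapping.remove(host);
    }
  }

  /** Skip-if-unchanged update; returns true only if the scheduler was updated. */
  public boolean heartbeat(String host, Set<String> reported) {
    Set<String> current =
        managerMapping.getOrDefault(host, Collections.<String>emptySet());
    if (reported.equals(current)) {
      return false; // no change seen by the manager, so the scheduler is not told
    }
    managerMapping.put(host, new HashSet<>(reported));
    schedulerView.put(host, new HashSet<>(reported));
    return true;
  }

  public Set<String> schedulerAttributes(String host) {
    return schedulerView.getOrDefault(host, Collections.<String>emptySet());
  }

  /** register -> heartbeat -> decommission -> recommission -> heartbeat. */
  public static Set<String> runScenario(boolean removeOnDeactivate) {
    DistributedAttributeSketch rm = new DistributedAttributeSketch(removeOnDeactivate);
    Set<String> attrs = Collections.singleton("gpu=true");
    rm.activate("host1");
    rm.heartbeat("host1", attrs);
    rm.deactivate("host1");
    rm.activate("host1");
    rm.heartbeat("host1", attrs);
    return rm.schedulerAttributes("host1");
  }

  public static void main(String[] args) {
    System.out.println("without removal: " + runScenario(false));
    System.out.println("with removal:    " + runScenario(true));
  }
}
```

In this model, the run without removal leaves the recommissioned node with no attributes in the scheduler view, because the second heartbeat matches the stale manager entry and is skipped. With removal on deactivation, the same heartbeat is treated as a change and the attributes reach the new node instance.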



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
