[ https://issues.apache.org/jira/browse/YARN-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680718#comment-17680718 ]

Prabhu Joseph edited comment on YARN-11417 at 1/25/23 5:36 PM:
---------------------------------------------------------------

When a NodeManager's node label is changed to a new label and the NodeManager is 
restarted, it resyncs with the ResourceManager using the new label. The 
CapacityScheduler first receives a NODE_LABELS_UPDATE event, and 
{{ClusterNodeTracker#updateNodesPerPartition}} removes the node from the old 
partition's node list in the {{nodesPerLabel}} map (<{{partition}}, 
{{nodesList}}>). The CapacityScheduler then receives NODE_REMOVED, which removes 
the node from the {{ClusterNodeTracker}} and also tries to remove it from the 
new partition's node list in {{nodesPerLabel}}. This fails with an NPE because 
the new partition is not yet present in the {{nodesPerLabel}} map; it is added 
only after the NODE_ADDED event. 
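
To make the ordering concrete, here is a minimal, self-contained sketch of the 
{{nodesPerLabel}} bookkeeping (hypothetical class and variable names, not the 
actual {{ClusterNodeTracker}} code) that reproduces the NPE from this event 
sequence:
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the nodesPerLabel bookkeeping; names are
// illustrative and do not come from the actual ClusterNodeTracker source.
public class NodesPerLabelNpeDemo {
  public static void main(String[] args) {
    Map<String, Set<String>> nodesPerLabel = new HashMap<>();

    // The node starts out in the CORE partition.
    nodesPerLabel.computeIfAbsent("CORE", k -> new HashSet<>()).add("node1");

    // NODE_LABELS_UPDATE: the node now reports the default partition ("").
    // The node is removed from the old partition's list, but the new
    // partition's entry is only created later, on NODE_ADDED.
    nodesPerLabel.get("CORE").remove("node1");

    // NODE_REMOVED: the removal looks up the node's *current* partition (""),
    // which has no entry yet -> NullPointerException, as in the stack trace.
    nodesPerLabel.get("").remove("node1");
  }
}
{code}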

When the node's partition has no entry in the {{nodesPerLabel}} map, 
{{ClusterNodeTracker#removeNode}} can skip removing the node from 
{{nodesPerLabel}}, since the node was already removed from its old partition 
during NODE_LABELS_UPDATE.
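
A rough sketch of that guard, continuing the simplified model above (an 
illustrative check only, not the actual patch):
{code:java}
// Guard the nodesPerLabel cleanup so the removal tolerates a partition that
// has no entry yet (illustrative sketch of the proposed check, not the
// actual YARN patch).
static void removeNodeFromPartition(Map<String, Set<String>> nodesPerLabel,
    String partition, String nodeId) {
  Set<String> nodesInPartition = nodesPerLabel.get(partition);
  if (nodesInPartition == null) {
    // Already removed during NODE_LABELS_UPDATE; nothing left to clean up.
    return;
  }
  nodesInPartition.remove(nodeId);
}
{code}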





> RM Crashes when changing Node Label of a Node in Distributed Configuration
> --------------------------------------------------------------------------
>
>                 Key: YARN-11417
>                 URL: https://issues.apache.org/jira/browse/YARN-11417
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.3.3
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Minor
>
> RM Crashes when changing Node Label of a Node in Distributed Configuration.
> {code:java}
> 2023-01-11 16:25:50,986 ERROR org.apache.hadoop.yarn.event.EventDispatcher (SchedulerEventDispatcher:Event Processor): Error in handling event type NODE_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.removeNode(ClusterNodeTracker.java:194)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.removeNode(CapacityScheduler.java:2145)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1833)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>         at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:83)
>         at java.lang.Thread.run(Thread.java:750)
> {code}
> *Repro*
> 1. Two NodeManagers with CORE Node Label
> {code:java}
> yarn.nodemanager.node-labels.provider.configured-node-partition=CORE
> yarn.node-labels.enabled=true
> yarn.node-labels.configuration-type=distributed
> yarn.nodemanager.node-labels.provider=config
> {code}
> 2. Remove the Node Label from one of the nodes so that it falls back to the Default Partition, then restart that NodeManager (an example configuration change is sketched below).
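> 
> One hedged way to perform step 2 (assuming the partition property is simply removed, or left empty, so the node reports no label after restart):
> {code:java}
> # The CORE partition mapping is dropped; the node joins the Default Partition
> # yarn.nodemanager.node-labels.provider.configured-node-partition=CORE
> yarn.node-labels.enabled=true
> yarn.node-labels.configuration-type=distributed
> yarn.nodemanager.node-labels.provider=config
> {code}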
>  


