[ https://issues.apache.org/jira/browse/YARN-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prabhu Joseph updated YARN-11417:
---------------------------------
Description:
The RM crashes when the node label of a node is changed in distributed node-label configuration.
{code:java}
2023-01-11 16:25:50,986 ERROR org.apache.hadoop.yarn.event.EventDispatcher (SchedulerEventDispatcher:Event Processor): Error in handling event type NODE_REMOVED to the Event Dispatcher
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.removeNode(ClusterNodeTracker.java:194)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.removeNode(CapacityScheduler.java:2145)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1833)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
    at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:83)
    at java.lang.Thread.run(Thread.java:750)
{code}
*Repro*
1. Start two NodeManagers with the CORE node label:
{code:java}
yarn.nodemanager.node-labels.provider.configured-node-partition=CORE
yarn.node-labels.enabled=true
yarn.node-labels.configuration-type=distributed
yarn.nodemanager.node-labels.provider=config
{code}
2. Remove the node label from one of the nodes so that it falls into the default
partition, then restart that NodeManager.
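For reference, the settings from step 1 could be written as a yarn-site.xml fragment along the following lines. This is only a sketch: the property names and values are the ones listed in step 1, but putting all four in a single file is an assumption (the report does not say which settings live on the RM and which on the NodeManager). It is likewise assumed that step 2 means deleting the configured-node-partition property on one node.
{code:xml}
<!-- Sketch only: property names and values are copied from step 1;
     the single-file layout is an assumption. -->
<configuration>
  <property>
    <name>yarn.node-labels.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.node-labels.configuration-type</name>
    <value>distributed</value>
  </property>
  <property>
    <name>yarn.nodemanager.node-labels.provider</name>
    <value>config</value>
  </property>
  <!-- Assumption: for step 2, this property is removed on one node,
       which is then restarted. -->
  <property>
    <name>yarn.nodemanager.node-labels.provider.configured-node-partition</name>
    <value>CORE</value>
  </property>
</configuration>
{code}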
was:
The RM crashes when the node label of a node is changed in distributed node-label configuration.
{code}
2023-01-11 16:25:50,986 ERROR org.apache.hadoop.yarn.event.EventDispatcher (SchedulerEventDispatcher:Event Processor): Error in handling event type NODE_REMOVED to the Event Dispatcher
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.removeNode(ClusterNodeTracker.java:194)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.removeNode(CapacityScheduler.java:2145)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1833)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
    at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:83)
    at java.lang.Thread.run(Thread.java:750)
{code}
*Repro*
1. Start two NodeManagers with the CORE node label:
{code}
yarn.nodemanager.node-labels.provider.configured-node-partition=CORE
yarn.node-labels.enabled=true
yarn.node-labels.configuration-type=distributed
yarn.nodemanager.node-labels.provider=config
{code}
2. Change the node label of one of the nodes to TASK and restart that NodeManager.
> RM Crashes when changing Node Label of a Node in Distributed Configuration
> --------------------------------------------------------------------------
>
> Key: YARN-11417
> URL: https://issues.apache.org/jira/browse/YARN-11417
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.3.3
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Minor
>