[
https://issues.apache.org/jira/browse/YARN-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291263#comment-17291263
]
Haibo Chen commented on YARN-10651:
-----------------------------------
I updated the patch to add some logging.

As for a unit test, the key condition that triggers this is that the scheduler
thread must process a healthy node update event after the corresponding node
has transitioned to the DECOMMISSIONING state (see the diagram for the event
ordering), which only happens in a very busy cluster.

There isn't anything we can currently use in a unit test to artificially slow
down the scheduler thread, wait for the node to become DECOMMISSIONING, and
then allow the thread to process the node update.
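
For illustration only, the ordering described above (hold the scheduler thread,
flip the node to DECOMMISSIONING, then release the thread) could in principle be
forced with a latch. The following is a minimal, standalone sketch of that
gating idea. It does not use any YARN classes: GatedOrderingSketch, State and
runScenario are hypothetical names, and a single-thread executor stands in for
the scheduler's event thread. Wiring such a gate into the real scheduler would
need a test hook that, as noted, does not exist today.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class GatedOrderingSketch {
    // Hypothetical stand-in for a node's lifecycle state; not the YARN NodeState enum.
    public enum State { RUNNING, DECOMMISSIONING }

    /**
     * Holds the stand-in "scheduler" thread at a gate until the node has been
     * moved to DECOMMISSIONING, then lets it handle the queued node update.
     * Returns the node state that the update handler observed.
     */
    public static State runScenario() throws Exception {
        AtomicReference<State> nodeState = new AtomicReference<>(State.RUNNING);
        CountDownLatch gate = new CountDownLatch(1);
        // Single worker thread, standing in for the scheduler event thread.
        ExecutorService scheduler = Executors.newSingleThreadExecutor();
        try {
            Future<State> observed = scheduler.submit(() -> {
                // The "node update" was queued while the node was healthy,
                // but the handler is held here until the gate opens.
                if (!gate.await(5, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("gate was never opened");
                }
                return nodeState.get();
            });
            // The state change happens while the scheduler thread is blocked.
            nodeState.set(State.DECOMMISSIONING);
            // Only now is the scheduler thread allowed to process the update.
            gate.countDown();
            return observed.get(5, TimeUnit.SECONDS);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("update handler observed: " + runScenario());
    }
}
```

Because the gate is opened only after the state change, the handler always
sees DECOMMISSIONING, which is the ordering that is otherwise hit only under
heavy load.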
> CapacityScheduler crashed with NPE in
> AbstractYarnScheduler.updateNodeResource()
> ---------------------------------------------------------------------------------
>
> Key: YARN-10651
> URL: https://issues.apache.org/jira/browse/YARN-10651
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.10.0, 2.10.1
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Major
> Attachments: YARN-10651.00.patch, YARN-10651.01.patch, event_seq.jpg
>
>
> {code:java}
> 2021-02-24 17:07:39,798 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_RESOURCE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:809)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeAndQueueResource(CapacityScheduler.java:1116)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1505)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>     at java.lang.Thread.run(Thread.java:748)
> 2021-02-24 17:07:39,798 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
> {code}