[ https://issues.apache.org/jira/browse/YARN-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290641#comment-17290641 ]
Haibo Chen commented on YARN-10651:
-----------------------------------
When a node update scheduler event is processed by the scheduler thread, the
node may have turned unhealthy and been moved to the DECOMMISSIONING state, in
which case the scheduler generates a NodeResourceUpdateSchedulerEvent. If a
NodeRemovedSchedulerEvent for the same node (queued because the node became
unhealthy) is already on the scheduler's event queue, the scheduler thread
first processes the NodeRemovedSchedulerEvent, which removes the SchedulerNode,
and then processes the NodeResourceUpdateSchedulerEvent, whose handler currently
assumes the SchedulerNode still exists. That missing node is what surfaces as
the NullPointerException in AbstractYarnScheduler.updateNodeResource() below.
The attached diagram (event_seq.jpg) shows the sequence of events that triggers this.
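To make the ordering concrete, here is a minimal, self-contained Java sketch. It does not use the real YARN classes: NodeEventRaceSketch, SchedulerNodeStub, NodeRemoved and NodeResourceUpdate are hypothetical stand-ins for the scheduler, SchedulerNode, NodeRemovedSchedulerEvent and NodeResourceUpdateSchedulerEvent. It assumes a single dispatcher thread draining a FIFO queue, and it shows one possible mitigation (skip the update when the node lookup returns null); it is not the actual patch for this issue.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified model of the event ordering; not the real YARN types.
public class NodeEventRaceSketch {

    /** Hypothetical stand-in for the scheduler's per-node state (SchedulerNode). */
    static final class SchedulerNodeStub {
        int memoryMb;
        SchedulerNodeStub(int memoryMb) { this.memoryMb = memoryMb; }
    }

    /** Hypothetical stand-ins for the two scheduler event types. */
    static abstract class Event {
        final String nodeId;
        Event(String nodeId) { this.nodeId = nodeId; }
    }
    static final class NodeRemoved extends Event {
        NodeRemoved(String nodeId) { super(nodeId); }
    }
    static final class NodeResourceUpdate extends Event {
        final int newMemoryMb;
        NodeResourceUpdate(String nodeId, int newMemoryMb) {
            super(nodeId);
            this.newMemoryMb = newMemoryMb;
        }
    }

    private final Map<String, SchedulerNodeStub> nodes = new HashMap<>();
    private final Queue<Event> eventQueue = new ArrayDeque<>();
    private int skippedUpdates = 0;

    void addNode(String nodeId, int memoryMb) { nodes.put(nodeId, new SchedulerNodeStub(memoryMb)); }
    void enqueue(Event e) { eventQueue.add(e); }

    /** Single dispatcher thread: events are handled strictly in FIFO order. */
    void drain() {
        Event e;
        while ((e = eventQueue.poll()) != null) {
            if (e instanceof NodeRemoved) {
                nodes.remove(e.nodeId);
            } else if (e instanceof NodeResourceUpdate) {
                updateNodeResource((NodeResourceUpdate) e);
            }
        }
    }

    /** Guarded update: tolerate a node that an earlier event already removed. */
    void updateNodeResource(NodeResourceUpdate e) {
        SchedulerNodeStub node = nodes.get(e.nodeId);
        if (node == null) {
            // Without this check, node.memoryMb below would throw a
            // NullPointerException, mirroring the crash in the stack trace.
            skippedUpdates++;
            return;
        }
        node.memoryMb = e.newMemoryMb;
    }

    int skippedUpdates() { return skippedUpdates; }
    boolean hasNode(String nodeId) { return nodes.containsKey(nodeId); }
    Integer memoryOf(String nodeId) {
        SchedulerNodeStub n = nodes.get(nodeId);
        return n == null ? null : n.memoryMb;
    }

    public static void main(String[] args) {
        NodeEventRaceSketch scheduler = new NodeEventRaceSketch();
        scheduler.addNode("host1", 8192);
        // The removal (node became unhealthy) is already queued before the
        // resource update generated on the move to DECOMMISSIONING.
        scheduler.enqueue(new NodeRemoved("host1"));
        scheduler.enqueue(new NodeResourceUpdate("host1", 0));
        scheduler.drain();
        System.out.println("node present: " + scheduler.hasNode("host1")
                + ", skipped updates: " + scheduler.skippedUpdates());
    }
}
```

Running main prints `node present: false, skipped updates: 1`: the update is dropped instead of dereferencing a removed node. Whether silently skipping is the right behavior, or whether the update should be logged or handled elsewhere, is a design choice this sketch does not settle.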
> CapacityScheduler crashed with NPE in
> AbstractYarnScheduler.updateNodeResource()
> ---------------------------------------------------------------------------------
>
> Key: YARN-10651
> URL: https://issues.apache.org/jira/browse/YARN-10651
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Major
> Attachments: event_seq.jpg
>
>
> {code:java}
> 2021-02-24 17:07:39,798 FATAL org.apache.hadoop.yarn.event.EventDispatcher:
> Error in handling event type NODE_RESOURCE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:809)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeAndQueueResource(CapacityScheduler.java:1116)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1505)
> at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
> at
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> 2021-02-24 17:07:39,798 INFO org.apache.hadoop.yarn.event.EventDispatcher:
> Exiting, bbye..{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)