[ 
https://issues.apache.org/jira/browse/YARN-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290641#comment-17290641
 ] 

Haibo Chen commented on YARN-10651:
-----------------------------------

When the scheduler thread processes a node update scheduler event, the node 
might already have turned unhealthy and been moved to the Decommissioning 
state, in which case the scheduler generates a 
NodeResourceUpdateSchedulerEvent. If a NodeRemovedSchedulerEvent is already 
on the scheduler event loop (because the node was unhealthy), the scheduler 
thread first processes the NodeRemovedSchedulerEvent, removing the 
schedulerNode, and then processes the NodeResourceUpdateSchedulerEvent, which 
currently assumes the schedulerNode is still there. The lookup returns null 
and the handler throws the NPE shown in the stack trace.



The attached diagram shows the sequence of events triggering this.
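
To make the ordering concrete, here is a minimal, self-contained sketch of 
the race and of one possible guard. It is not the YARN code or the actual 
patch: SchedulerRaceSketch, SketchNode, and every method name below are 
hypothetical stand-ins, and the real scheduler tracks nodes and events quite 
differently. The only point is that an update handler running after a removal 
has to tolerate a missing node.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for a scheduler node; not the real YARN SchedulerNode.
class SketchNode {
    int memoryMb;

    SketchNode(int memoryMb) {
        this.memoryMb = memoryMb;
    }
}

// Hypothetical single-threaded event loop that mimics the ordering in this issue.
class SchedulerRaceSketch {
    private final Map<String, SketchNode> nodes = new HashMap<>();
    private final Queue<Runnable> eventQueue = new ArrayDeque<>();
    private int skippedUpdates = 0;

    void addNode(String nodeId, int memoryMb) {
        nodes.put(nodeId, new SketchNode(memoryMb));
    }

    // Models a NodeRemovedSchedulerEvent already sitting on the queue.
    void enqueueNodeRemoved(String nodeId) {
        eventQueue.add(() -> nodes.remove(nodeId));
    }

    // Models a NodeResourceUpdateSchedulerEvent queued behind the removal.
    void enqueueNodeResourceUpdate(String nodeId, int newMemoryMb) {
        eventQueue.add(() -> updateNodeResource(nodeId, newMemoryMb));
    }

    // Guarded handler: without the null check, node.memoryMb would throw an NPE
    // whenever the removal event was processed first.
    void updateNodeResource(String nodeId, int newMemoryMb) {
        SketchNode node = nodes.get(nodeId);
        if (node == null) {
            skippedUpdates++; // node already removed; nothing left to update
            return;
        }
        node.memoryMb = newMemoryMb;
    }

    // Processes queued events strictly in arrival order, like one scheduler thread.
    void drainEvents() {
        Runnable event;
        while ((event = eventQueue.poll()) != null) {
            event.run();
        }
    }

    boolean hasNode(String nodeId) {
        return nodes.containsKey(nodeId);
    }

    int getSkippedUpdates() {
        return skippedUpdates;
    }

    public static void main(String[] args) {
        SchedulerRaceSketch scheduler = new SchedulerRaceSketch();
        scheduler.addNode("node-1", 8192);
        scheduler.enqueueNodeRemoved("node-1");           // queued first
        scheduler.enqueueNodeResourceUpdate("node-1", 0); // queued second
        scheduler.drainEvents();
        System.out.println("skipped updates: " + scheduler.getSkippedUpdates());
    }
}
```

Skipping the update is only one option. Whether the real handler should 
ignore the event, log it, or avoid generating it in the first place is a 
design decision this sketch does not settle.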

> CapacityScheduler crashed with NPE in 
> AbstractYarnScheduler.updateNodeResource() 
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-10651
>                 URL: https://issues.apache.org/jira/browse/YARN-10651
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Major
>         Attachments: event_seq.jpg
>
>
> {code:java}
> 2021-02-24 17:07:39,798 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_RESOURCE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:809)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeAndQueueResource(CapacityScheduler.java:1116)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1505)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>     at java.lang.Thread.run(Thread.java:748)
> 2021-02-24 17:07:39,798 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
