[ 
https://issues.apache.org/jira/browse/YARN-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291263#comment-17291263
 ] 

Haibo Chen commented on YARN-10651:
-----------------------------------

I updated the patch to add some logging. 

As for unit testing, the key condition for triggering this is that the
scheduler thread processes a healthy node update event after the
corresponding node has already transitioned to the DECOMMISSIONING state
(see the attached diagram for the event ordering), which only happens in a
very busy cluster.

Right now there is nothing we can use in a unit test to artificially slow
down the scheduler thread, wait for the node to become DECOMMISSIONING, and
then let the scheduler thread process the node update.
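
To make the required ordering concrete, here is a minimal, self-contained
sketch of how a handler thread could be gated with a latch so that it only
processes a queued event after the node state has changed. Everything in it
(GatedDispatchSketch, NodeState, the "NODE_UPDATE" string, the thread name) is
hypothetical and stands in for the real ResourceManager classes; it is not an
existing hook in the YARN test code, only an illustration of the event
ordering described above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the scheduler event loop; not Hadoop code.
public class GatedDispatchSketch {
    public enum NodeState { RUNNING, DECOMMISSIONING }

    /** Returns the node state the handler thread saw when it handled the event. */
    public static NodeState runScenario() {
        AtomicReference<NodeState> nodeState = new AtomicReference<>(NodeState.RUNNING);
        AtomicReference<NodeState> observed = new AtomicReference<>();
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        CountDownLatch gate = new CountDownLatch(1);

        Thread scheduler = new Thread(() -> {
            try {
                events.take();   // dequeue the update that was sent while RUNNING
                gate.await();    // artificial delay: hold until the test opens the gate
                observed.set(nodeState.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "hypothetical-scheduler");
        scheduler.start();

        try {
            events.put("NODE_UPDATE");                 // enqueued while node is healthy
            nodeState.set(NodeState.DECOMMISSIONING);  // state changes before handling
            gate.countDown();                          // now let the handler proceed
            scheduler.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed.get();
    }

    public static void main(String[] args) {
        System.out.println("observed: " + runScenario());
    }
}
```

In this sketch the handler always observes DECOMMISSIONING, because the latch
is only released after the state change. A real test would need an equivalent
gate inside the scheduler's event handling, which is the piece that is
currently missing.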

 

> CapacityScheduler crashed with NPE in 
> AbstractYarnScheduler.updateNodeResource() 
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-10651
>                 URL: https://issues.apache.org/jira/browse/YARN-10651
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.10.0, 2.10.1
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Major
>         Attachments: YARN-10651.00.patch, YARN-10651.01.patch, event_seq.jpg
>
>
> {code:java}
> 2021-02-24 17:07:39,798 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_RESOURCE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:809)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeAndQueueResource(CapacityScheduler.java:1116)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1505)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
> at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> 2021-02-24 17:07:39,798 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

