[ https://issues.apache.org/jira/browse/YARN-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291263#comment-17291263 ]
Haibo Chen commented on YARN-10651:
-----------------------------------

I updated the patch to add some logging.

As for a unit test, the key condition for triggering this bug is that the scheduler thread must process a healthy node update event after the corresponding node has transitioned to the DECOMMISSIONING state (see the attached diagram for the event ordering), which only happens in a very busy cluster. There is currently nothing we can use in a unit test to artificially slow down the scheduler thread, wait for the node to become DECOMMISSIONING, and then let the scheduler thread process the node update.

> CapacityScheduler crashed with NPE in AbstractYarnScheduler.updateNodeResource()
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-10651
>                 URL: https://issues.apache.org/jira/browse/YARN-10651
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.10.0, 2.10.1
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Major
>         Attachments: YARN-10651.00.patch, YARN-10651.01.patch, event_seq.jpg
>
>
> {code:java}
> 2021-02-24 17:07:39,798 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_RESOURCE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:809)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeAndQueueResource(CapacityScheduler.java:1116)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1505)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>     at java.lang.Thread.run(Thread.java:748)
> 2021-02-24 17:07:39,798 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
> {code}
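
The event ordering described in the comment above can be illustrated with a toy, single-threaded model. This is only a sketch and is not the attached patch: every class, field, and method name below is hypothetical and none of them are Hadoop APIs. It also assumes, purely for illustration, that the NullPointerException comes from a node lookup returning null by the time the queued update is processed; the comment does not say which reference is null.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of a scheduler event queue; none of these names are Hadoop APIs. */
class StaleNodeEventSketch {

    /** Hypothetical stand-in for the scheduler's view of a node. */
    static final class Node {
        int memoryMb;
        Node(int memoryMb) { this.memoryMb = memoryMb; }
    }

    /** Hypothetical stand-in for a queued resource update event. */
    static final class ResourceUpdate {
        final String nodeId;
        final int memoryMb;
        ResourceUpdate(String nodeId, int memoryMb) {
            this.nodeId = nodeId;
            this.memoryMb = memoryMb;
        }
    }

    final Map<String, Node> nodes = new HashMap<>();
    final Queue<ResourceUpdate> pending = new ArrayDeque<>();

    /** Unguarded handler: dereferences the lookup result directly. */
    void applyUnguarded(ResourceUpdate e) {
        Node n = nodes.get(e.nodeId);
        n.memoryMb = e.memoryMb; // NullPointerException if the node is gone
    }

    /** Guarded handler: logs and skips events whose node is no longer tracked. */
    boolean applyGuarded(ResourceUpdate e) {
        Node n = nodes.get(e.nodeId);
        if (n == null) {
            System.out.println("Skipping stale update for " + e.nodeId);
            return false;
        }
        n.memoryMb = e.memoryMb;
        return true;
    }

    public static void main(String[] args) {
        StaleNodeEventSketch s = new StaleNodeEventSketch();
        s.nodes.put("node-1", new Node(8192));

        // 1. An update for node-1 is queued while the node is still tracked.
        s.pending.add(new ResourceUpdate("node-1", 4096));

        // 2. Before the queue is drained, the node goes away (assumed here).
        s.nodes.remove("node-1");

        // 3. The (slow) scheduler thread finally reaches the queued event.
        ResourceUpdate stale = s.pending.remove();
        System.out.println("applied: " + s.applyGuarded(stale));
    }
}
```

Because the model drains the queue by hand, the problematic ordering is reproduced deterministically. That is exactly the control the comment says is not currently available for the real scheduler thread in a unit test.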