[ https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430369#comment-16430369 ]
can_he commented on YARN-6629:
------------------------------

It looks like we have hit the same problem in a production environment on branch-2.9.0. The fatal logs are as follows:
{code:java}
2018-04-09 13:04:48,291 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e15_1521515682198_2722_04_000430 Container Transitioned from ALLOCATED to ACQUIRED
2018-04-09 13:04:48,292 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:528)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2489)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2371)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1344)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1333)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1422)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1197)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1059)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1464)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:150)
        at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
        at java.lang.Thread.run(Thread.java:745)
2018-04-09 13:04:48,292 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
2018-04-09 13:04:48,295 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
{code}
[~Tao Yang] [~leftnoteasy] I would be particularly *grateful* if you could submit a patch for branch-2.9.0 and commit it. Thanks a lot!

> NPE occurred when container allocation proposal is applied but its resource requests are removed before
> -------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6629
>                 URL: https://issues.apache.org/jira/browse/YARN-6629
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.0, 3.0.0-alpha2
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>             Fix For: 3.1.0
>
>         Attachments: YARN-6629.001.patch, YARN-6629.002.patch, YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch, YARN-6629.006.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found a new NPE. Log:
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
>     at org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
>     at org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
>     at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
>     at org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply(<generated>)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> Steps to reproduce, in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1:
> ApplicationMasterService#allocate --> CapacityScheduler#allocate --> SchedulerApplicationAttempt#updateResourceRequests --> AppSchedulingInfo#updateResourceRequests
> Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocated 1 container for this request and accepted the proposal
> 3. AM removed this request:
> ApplicationMasterService#allocate --> CapacityScheduler#allocate --> SchedulerApplicationAttempt#updateResourceRequests --> AppSchedulingInfo#updateResourceRequests --> AppSchedulingInfo#addToPlacementSets --> AppSchedulingInfo#updatePendingResources
> Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets
> 4. Scheduler applied this proposal:
> CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> AppSchedulingInfo#allocate
> Throws an NPE when calling schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, type, node);
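
The four reproduction steps in the issue description can be modelled in isolation. The following is a minimal sketch, not YARN code and not the committed YARN-6629 patch: the class StaleProposalSketch, the PendingAsk type, and the methods allocateUnchecked and allocateChecked are all hypothetical names, and modelling "AM removed this request" as an update to 0 containers is an assumption of the sketch. It only illustrates why an unguarded lookup fails once the key is gone, and one possible way a caller might detect the stale proposal instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified stand-in for AppSchedulingInfo; not YARN code. */
public class StaleProposalSketch {

    /** Stand-in for a per-key placement set; it only counts pending containers. */
    static final class PendingAsk {
        private int pending;
        PendingAsk(int pending) { this.pending = pending; }
        void allocate() { pending--; }
        int pending() { return pending; }
    }

    // Plays the role that schedulerKeyToPlacementSets plays in the description.
    private final Map<Integer, PendingAsk> keyToAsk = new ConcurrentHashMap<>();

    /** Steps 1 and 3: add a request, or remove it (assumed here: 0 containers). */
    void updateResourceRequest(int schedulerKey, int numContainers) {
        if (numContainers <= 0) {
            keyToAsk.remove(schedulerKey);
        } else {
            keyToAsk.put(schedulerKey, new PendingAsk(numContainers));
        }
    }

    /** Step 4 as described: dereferences the lookup result without a check. */
    void allocateUnchecked(int schedulerKey) {
        keyToAsk.get(schedulerKey).allocate(); // NPE if the key was removed
    }

    /** Guarded variant: reports a stale proposal instead of throwing. */
    boolean allocateChecked(int schedulerKey) {
        PendingAsk ask = keyToAsk.get(schedulerKey);
        if (ask == null) {
            return false; // request was removed after the proposal was accepted
        }
        ask.allocate();
        return true;
    }

    public static void main(String[] args) {
        StaleProposalSketch app = new StaleProposalSketch();
        app.updateResourceRequest(1, 1); // step 1: AM requests 1 container
        // step 2: scheduler accepts a proposal for key 1 (the map is unchanged)
        app.updateResourceRequest(1, 0); // step 3: AM removes the request
        try {
            app.allocateUnchecked(1);    // step 4: apply the stale proposal
        } catch (NullPointerException e) {
            System.out.println("unchecked apply: NullPointerException");
        }
        System.out.println("checked apply accepted: " + app.allocateChecked(1));
    }
}
```

In this sketch the guarded variant returns false so that the caller can drop the proposal. Whether the actual fix rejects the proposal at this point or earlier in the commit path is not stated in this message.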