Jian He commented on YARN-2340:

Today, the semantics of stopping a queue are to let the existing applications 
run to completion. We should retain the same semantics across RM restart as 
well. In this case, I think we need to ignore this exception and continue, 
because the application was accepted before the queue was changed to STOPPED. 
A similar problem could happen if we change the application ACL and restart 
the RM while an application is running. 
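
To illustrate the proposed behaviour, here is a minimal, self-contained sketch. 
It is not the CapacityScheduler code and not the actual patch: the class 
SketchQueue, the method submit, and the isRecovering flag are hypothetical names 
introduced only for this example. It assumes the scheduler can tell a recovered 
application apart from a fresh submission, and shows a STOPPED queue rejecting 
new applications while still admitting one that was accepted before the restart.

```java
import java.util.ArrayList;
import java.util.List;

public class StoppedQueueRecoverySketch {

    enum QueueState { RUNNING, STOPPED }

    /** Hypothetical stand-in for a scheduler queue; not a real YARN class. */
    static class SketchQueue {
        private QueueState state = QueueState.RUNNING;
        private final List<String> apps = new ArrayList<>();

        void setState(QueueState newState) {
            state = newState;
        }

        /**
         * Admits an application to the queue. A STOPPED queue rejects new
         * submissions, but an application that is being recovered was
         * accepted before the queue was stopped, so the state check is
         * skipped for it (assumption: the caller knows it is recovering).
         */
        void submit(String appId, boolean isRecovering) {
            if (state == QueueState.STOPPED && !isRecovering) {
                throw new IllegalStateException(
                    "Queue is STOPPED, cannot accept " + appId);
            }
            apps.add(appId);
        }

        List<String> getApps() {
            return apps;
        }
    }

    public static void main(String[] args) {
        // After a restart the queue is rebuilt from configuration as STOPPED.
        SketchQueue queue = new SketchQueue();
        queue.setState(QueueState.STOPPED);

        // A recovered application is admitted despite the STOPPED state.
        queue.submit("recovered_app", true);
        System.out.println("Apps after recovery: " + queue.getApps());

        // A brand-new submission is still rejected.
        try {
            queue.submit("new_app", false);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The same shape would apply to the ACL case: the access check would be skipped 
for recovered applications and kept for new submissions.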

> NPE thrown when RM restart after queue is STOPPED. There after RM can not 
> recovery application's and remain in standby
> ----------------------------------------------------------------------------------------------------------------------
>                 Key: YARN-2340
>                 URL: https://issues.apache.org/jira/browse/YARN-2340
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>    Affects Versions: 2.4.1
>         Environment: Capacityscheduler with Queue a, b
>            Reporter: Nishan Shetty
>            Assignee: Rohith
>            Priority: Critical
> While a job is in progress, set the queue state to STOPPED and then restart the RM. 
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_000002 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
