[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246442#comment-14246442
 ] 

Rohith commented on YARN-2340:
------------------------------

Scenario executed:
# Start the YARN cluster and submit a long-running application to queue default. Initially, RM1 is active.
# *Stop the queue default* in both RM1 and RM2 using -refreshQueues. A queue can be stopped even while an application is running in it, but it will not accept new application submissions.
# Fail over the RM so that RM2 transitions to active. Application recovery fails here because the queue is already stopped. The logs below show the failure, but *the RMAppImpl state is updated to FAILED while the RMAppAttempt state remains null*. The RM remains in standby.
{noformat}
2014-12-15 11:01:17,813 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1418620667348_0001 with 1 attempts and final state = null
2014-12-15 11:01:17,814 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1418620667348_0001_000001 with final state: null
/////.....
/////....
2014-12-15 11:01:17,824 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
Queue root.default is STOPPED. Cannot accept submission of application: 
application_1418620667348_0001
2014-12-15 11:01:17,825 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to submit application application_1418620667348_0001 to queue default 
from user rohith
org.apache.hadoop.security.AccessControlException: Queue root.default is 
STOPPED. Cannot accept submission of application: application_1418620667348_0001
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.submitApplication(LeafQueue.java:575)

2014-12-15 11:01:17,939 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Registering app attempt : appattempt_1418620667348_0001_000001
2014-12-15 11:01:17,941 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating 
application application_1418620667348_0001 with final state: FAILED
{noformat}
# After a restart, the recovered final state is FAILED for RMAppImpl but null for RMAppAttemptImpl, as shown below. The RM cannot recover the applications and fails continuously.
{noformat}
2014-12-15 11:01:41,493 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Recovering app: 
application_1418620667348_0001 with 1 attempts and final state = FAILED
2014-12-15 11:01:41,494 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Recovering attempt: appattempt_1418620667348_0001_000001 with final state: null
{noformat}
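The state mismatch in steps 3 and 4 can be modeled with a small, hypothetical Java sketch. It assumes, as the logs suggest, that recovery resubmits the application to its queue and that a rejected submission marks only the application FAILED. All class and method names are invented for illustration, and IllegalStateException stands in for the AccessControlException seen in the log.

```java
// Simplified, hypothetical model of the failover scenario above.
// None of these classes are YARN APIs; they only mirror the states in the logs.
class StoppedQueueScenario {

    enum QueueState { RUNNING, STOPPED }

    // Final states as they would be persisted; null means "not final yet".
    static final class RecoveredApp {
        String appFinalState;      // mirrors "final state = ..." logged by RMAppImpl
        String attemptFinalState;  // mirrors "final state: ..." logged by RMAppAttemptImpl
    }

    // Stand-in for LeafQueue.submitApplication: rejects submissions when STOPPED.
    // (The real code throws AccessControlException; IllegalStateException is a substitute.)
    static void submitToQueue(QueueState state, String appId) {
        if (state == QueueState.STOPPED) {
            throw new IllegalStateException(
                "Queue root.default is STOPPED. Cannot accept submission of application: " + appId);
        }
    }

    // Step 3: the newly active RM resubmits the recovered app to its queue.
    // A rejected submission updates the app to FAILED, but nothing touches the attempt.
    static void recover(RecoveredApp app, QueueState queueState, String appId) {
        try {
            submitToQueue(queueState, appId);
        } catch (IllegalStateException rejected) {
            app.appFinalState = "FAILED";
            // app.attemptFinalState is intentionally left null, as in the logs
        }
    }

    // Step 4: the app is in a final state while its attempt is not.
    static boolean isInconsistent(RecoveredApp app) {
        return app.appFinalState != null && app.attemptFinalState == null;
    }
}
```

Running `recover` against a STOPPED queue leaves `appFinalState` as FAILED and `attemptFinalState` as null, so `isInconsistent` returns true; that is the pair of states the next restart then tries to recover.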

> NPE thrown when RM restart after queue is STOPPED
> -------------------------------------------------
>
>                 Key: YARN-2340
>                 URL: https://issues.apache.org/jira/browse/YARN-2340
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>    Affects Versions: 2.4.1
>         Environment: CapacityScheduler with queues a, b
>            Reporter: Nishan Shetty
>            Assignee: Rohith
>            Priority: Critical
>
> While a job is in progress, set the queue state to STOPPED and then restart the RM.
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_000002 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
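The trace points at CapacityScheduler.addApplicationAttempt. One plausible reading, stated here only as an assumption, is that the method looks up the application by ID and dereferences the result, and that the lookup returns null because the STOPPED queue rejected the application before its attempt was added. The sketch below models that with invented names; it is neither the real CapacityScheduler code nor the fix committed for this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified scheduler; not the real CapacityScheduler.
class SketchScheduler {

    static final class SchedulerApp {
        final String queueName;
        SchedulerApp(String queueName) { this.queueName = queueName; }
    }

    // Applications accepted by the scheduler, keyed by application ID.
    private final Map<String, SchedulerApp> applications = new HashMap<>();

    // Loosely mirrors adding an application: a STOPPED queue rejects it,
    // so it is never registered in the map.
    boolean addApplication(String appId, String queueName, boolean queueStopped) {
        if (queueStopped) {
            return false;
        }
        applications.put(appId, new SchedulerApp(queueName));
        return true;
    }

    // Unguarded attempt handling: assumes the app was registered earlier.
    // If it was rejected, get() returns null and the field access throws an NPE.
    String addApplicationAttemptUnguarded(String appId) {
        SchedulerApp app = applications.get(appId);
        return app.queueName;
    }

    // Guarded variant: skip attempts whose application is unknown
    // instead of letting the exception escape the event-handling thread.
    boolean addApplicationAttemptGuarded(String appId) {
        SchedulerApp app = applications.get(appId);
        if (app == null) {
            System.err.println("Application " + appId + " not found in scheduler; ignoring attempt");
            return false;
        }
        return true;
    }
}
```

The guarded variant illustrates one possible design choice: dropping the orphaned attempt keeps the scheduler event thread alive, whereas the unguarded dereference corresponds to the NPE in the trace above, after which the RM exits.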



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
