[jira] [Commented] (YARN-4000) RM crashes with NPE if leaf queue becomes parent queue during restart

Varun Saxena (JIRA) Tue, 22 Sep 2015 13:37:42 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903387#comment-14903387
 ]


Varun Saxena commented on YARN-4000:
------------------------------------

[~jianhe]

bq. Actually, I think this will be a problem in the regular case. Application is 
being killed by user right on RM restart. This is an existing problem though. 
Do you think so?

You mean the user kills the application at the same time as we kill it? The RM 
completes recovery first and only then opens its ports while transitioning to 
active, so neither ClientRMService nor ResourceTrackerService will even start 
until recovery is done. Most probably, by the time a kill from the user 
arrives, all the recovery-related events will already have been processed. Even 
if they have not been, they will be ahead of it in the dispatcher queue, and 
RMAppImpl ignores a KILL event if the app is already in the KILLING state.
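
To make the ordering argument concrete, here is a minimal sketch. It is not the 
actual RMAppImpl or dispatcher code; the class, state, and event names below 
are made up for illustration. It only assumes what is said above: events are 
handled in FIFO order, and a KILL that arrives while the app is already KILLING 
is ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative model only: names are hypothetical, not the YARN classes.
public class KillOrderingSketch {

    // Simplified app states; the real state machine has many more.
    enum State { ACCEPTED, KILLING }

    // A KILL request tagged with who issued it ("recovery" or "user").
    static final class KillEvent {
        final String source;
        KillEvent(String source) { this.source = source; }
    }

    // Toy application that applies the first KILL and ignores later ones.
    static final class App {
        State state = State.ACCEPTED;
        final List<String> log = new ArrayList<>();

        void handle(KillEvent event) {
            if (state == State.KILLING) {
                // Already being killed: a second KILL is a no-op.
                log.add("ignored:" + event.source);
                return;
            }
            state = State.KILLING;
            log.add("applied:" + event.source);
        }
    }

    // Enqueue the recovery-time kill before the user kill, then drain in FIFO order.
    static List<String> simulate() {
        Queue<KillEvent> dispatcherQueue = new ArrayDeque<>();
        dispatcherQueue.add(new KillEvent("recovery")); // queued during recovery
        dispatcherQueue.add(new KillEvent("user"));     // can only arrive after ports open

        App app = new App();
        while (!dispatcherQueue.isEmpty()) {
            app.handle(dispatcherQueue.poll());
        }
        return app.log;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [applied:recovery, ignored:user]
    }
}
```

Since the recovery-time kill is always queued first, the later user kill finds 
the app in KILLING and is dropped, which is the behaviour I am relying on.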



> RM crashes with NPE if leaf queue becomes parent queue during restart
> ---------------------------------------------------------------------
>
>                 Key: YARN-4000
>                 URL: https://issues.apache.org/jira/browse/YARN-4000
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>         Attachments: YARN-4000.01.patch, YARN-4000.02.patch, 
> YARN-4000.03.patch, YARN-4000.04.patch, YARN-4000.05.patch
>
>
> This is a similar situation to YARN-2308.  If an application is active in 
> queue A and the RM then restarts with a changed capacity scheduler 
> configuration in which queue A becomes a parent queue of other subqueues, 
> the RM will crash with a NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
