[
https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903387#comment-14903387
]
Varun Saxena commented on YARN-4000:
------------------------------------
[~jianhe]
bq. actually, I think this will be a problem in regular case. Application is
being killed by user right on RM restart. This is an existing problem though.
Do you think so ?
Do you mean the user killing the application at the same time as we kill it?
The RM first completes recovery and only then opens its ports while
transitioning to active, so neither ClientRMService nor ResourceTrackerService
will even start until recovery is done. Most probably, by the time a kill from
the user arrives, all the recovery-related events will already have been
processed. Even if they have not, they will be ahead of it in the dispatcher
queue. A KILL event for an app that is already in the KILLING state would be
ignored by RMAppImpl.
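
To make the ordering argument concrete, here is a rough toy model of it. It
does not use the real YARN classes: KillOrderingSketch, its State enum, and the
enqueue/drain methods are all made-up names for illustration. It only assumes
a single FIFO dispatcher queue and an app that ignores KILL once it is in
KILLING.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model only: hypothetical names, NOT the real RMAppImpl or dispatcher.
class KillOrderingSketch {
    enum State { RUNNING, KILLING }

    // Current state of the single modelled application.
    State state = State.RUNNING;

    // Sources of KILL events that were dropped because the app was
    // already in KILLING.
    final List<String> ignored = new ArrayList<>();

    // FIFO queue standing in for the dispatcher queue; each entry is the
    // source of a KILL event ("recovery" or "user").
    private final Queue<String> dispatcherQueue = new ArrayDeque<>();

    // Queue a KILL event coming from the given source.
    void enqueue(String source) {
        dispatcherQueue.add(source);
    }

    // Process all queued KILL events strictly in arrival order.
    void drain() {
        String source;
        while ((source = dispatcherQueue.poll()) != null) {
            if (state == State.RUNNING) {
                state = State.KILLING;   // first KILL starts the kill
            } else {
                ignored.add(source);     // later KILLs are ignored
            }
        }
    }

    public static void main(String[] args) {
        KillOrderingSketch app = new KillOrderingSketch();
        app.enqueue("recovery");  // queued during recovery
        app.enqueue("user");      // can only arrive after ports open
        app.drain();
        // prints "KILLING ignored=[user]"
        System.out.println(app.state + " ignored=" + app.ignored);
    }
}
```

In this model the recovery-time kill is always queued first, so the user's
kill finds the app already in KILLING and is dropped without any further
effect.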
> RM crashes with NPE if leaf queue becomes parent queue during restart
> ---------------------------------------------------------------------
>
> Key: YARN-4000
> URL: https://issues.apache.org/jira/browse/YARN-4000
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler, resourcemanager
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Varun Saxena
> Attachments: YARN-4000.01.patch, YARN-4000.02.patch,
> YARN-4000.03.patch, YARN-4000.04.patch, YARN-4000.05.patch
>
>
> This is a similar situation to YARN-2308. If an application is active in
> queue A and then the RM restarts with a changed capacity scheduler
> configuration where queue A becomes a parent queue to other subqueues then
> the RM will crash with a NullPointerException.