[ https://issues.apache.org/jira/browse/YARN-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680408#comment-14680408 ]
Wangda Tan commented on YARN-4000:
----------------------------------

+1 to [~jlowe]'s suggestion: we should kill the app (if rm.fail-fast is false) or fail the RM (if rm.fail-fast is true).

This is similar to https://issues.apache.org/jira/browse/YARN-3764; we need to consider LeafQueue movement as well. Currently the RM restart will succeed (I haven't verified this, it's just my guess) if we move a leaf queue from one parent to another during restart. We should fail the app/RM in that case until we support removing queues.

> RM crashes with NPE if leaf queue becomes parent queue during restart
> ---------------------------------------------------------------------
>
>                 Key: YARN-4000
>                 URL: https://issues.apache.org/jira/browse/YARN-4000
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Varun Saxena
>
> This is a similar situation to YARN-2308. If an application is active in
> queue A and then the RM restarts with a changed capacity scheduler
> configuration where queue A becomes a parent queue to other subqueues then
> the RM will crash with a NullPointerException.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)