[ https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403423#comment-15403423 ]
Rohith Sharma K S commented on YARN-5333:
-----------------------------------------

Thanks for the patch. Some comments:
# Should {{private boolean isTransitingToActive = false;}} be volatile?
# Since none of the refreshXXX methods are synchronized, the patch introduces a concurrency issue. If there is an explicit admin call for refreshing at the time of transitionToActive, then checkRMStatus will be executed for the other admin calls. Until the RM has completely transitioned to active, explicit admin commands should not be allowed to refresh. I think we should handle this the same way as the refreshAdminAcl method.
# I think the flag {{checkRMHAState}} can be passed to the method {{checkRMStatus}}.

Test:
# I think if you can make the test general instead of specific to the fair scheduler, it can be moved to the class {{TestRMHA}}. There is already a test, {{TestRMHA#testTransitionedToActiveRefreshFail}}; perhaps that test can be changed?

> Some recovered apps are put into default queue when RM HA
> ---------------------------------------------------------
>
>                 Key: YARN-5333
>                 URL: https://issues.apache.org/jira/browse/YARN-5333
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-5333.01.patch, YARN-5333.02.patch,
> YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch
>
>
> Enable RM HA and use FairScheduler, with
> {{yarn.scheduler.fair.allow-undeclared-pools}} set to false and
> {{yarn.scheduler.fair.user-as-default-queue}} set to false.
> Steps to reproduce:
> 1. Start two RMs.
> 2. After the RMs are running, change {{etc/hadoop/fair-scheduler.xml}}
> on both RMs to add some queues.
> 3. Submit some apps to the newly added queues.
> 4. Stop the active RM; the standby RM will then transition to active and
> recover the apps.
> However, the new active RM will put the recovered apps into the default queue
> because it might not have loaded the new {{fair-scheduler.xml}}.
> We need to call {{initScheduler}} before starting the active services, or
> move {{refreshAll()}} in front of {{rm.transitionToActive()}}.
> *It seems this is also important for the other schedulers*.
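
The review points above (a volatile state field, synchronized refresh methods, and a {{checkRMHAState}} flag that decides whether the status check runs), together with the proposed ordering of the refresh before the transition completes, can be sketched roughly as follows. This is only an illustration under stated assumptions: {{AdminServiceSketch}}, {{refreshQueues}}, and {{queueConfigVersion}} are hypothetical names invented for this example, and the class is not the real YARN {{AdminService}} or any attached patch.

```java
import java.io.IOException;

/** Hypothetical stand-in for an admin service; NOT the real YARN AdminService. */
class AdminServiceSketch {
    enum HAState { STANDBY, ACTIVE }

    // volatile: readers that do not hold the lock still see the latest state.
    private volatile HAState haState = HAState.STANDBY;

    // Stands in for the loaded scheduler configuration (hypothetical).
    private int queueConfigVersion = 0;

    /** Explicit admin call: always checks that the RM is active. */
    public synchronized void refreshQueues() throws IOException {
        refreshQueues(true);
    }

    /** Internal variant: the caller decides whether the HA state is checked. */
    private synchronized void refreshQueues(boolean checkRMHAState) throws IOException {
        if (checkRMHAState) {
            checkRMStatus("refreshQueues");
        }
        queueConfigVersion++; // pretend to reload the queue configuration
    }

    private void checkRMStatus(String operation) throws IOException {
        if (haState != HAState.ACTIVE) {
            throw new IOException("Cannot " + operation + ": RM is " + haState);
        }
    }

    /**
     * Refreshes first, then becomes active. Because this method and the
     * refresh methods share one lock, an explicit admin refresh cannot
     * interleave with the transition; it waits until the transition is done.
     */
    public synchronized void transitionToActive() throws IOException {
        if (haState == HAState.ACTIVE) {
            return;
        }
        refreshQueues(false); // skip the HA check: the RM is not active yet
        haState = HAState.ACTIVE;
    }

    public HAState getHAState() {
        return haState;
    }

    public synchronized int getQueueConfigVersion() {
        return queueConfigVersion;
    }
}
```

In this sketch an explicit {{refreshQueues()}} on a standby instance is rejected with an {{IOException}}, while the refresh issued from {{transitionToActive()}} bypasses the check. Java monitors are reentrant, so the nested call does not deadlock. Whether a shared lock or a dedicated transition flag fits the actual patch better is a design question for the patch author.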