[ 
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403423#comment-15403423
 ] 

Rohith Sharma K S commented on YARN-5333:
-----------------------------------------

Thanks for the patch. Some comments:
# Should {{private boolean isTransitingToActive = false;}} be volatile?
# Since none of the refreshXXX methods are synchronized, the patch introduces a 
concurrency issue. If an explicit admin refresh call arrives while 
transitionToActive is in progress, then checkRMStatus will be executed for 
other admin calls. Until the RM has completely transitioned to active, explicit 
admin commands should not be allowed to refresh. I think we should handle this 
the same way the refreshAdminAcl method does.
# I think the flag {{checkRMHAState}} can be passed to the method {{checkRMStatus}}.
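
To make the three points above concrete, here is a minimal sketch of one possible reading. It is not the real AdminService code: the class {{AdminGuardSketch}}, its {{HAState}} enum, and the refresh log are hypothetical names used only for illustration. It shows a volatile state field, synchronized refresh methods so that an explicit admin call cannot interleave with the transition, and a {{checkRMHAState}} flag passed into {{checkRMStatus}}.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an admin service; NOT the real YARN AdminService. */
class AdminGuardSketch {
    enum HAState { STANDBY, ACTIVE }

    // volatile so a thread reading the state without the lock sees the latest write.
    private volatile HAState state = HAState.STANDBY;

    // Records which refreshes actually ran (illustration only).
    private final List<String> refreshLog = new ArrayList<>();

    /** The HA check is driven by a parameter rather than a shared "transitioning" field. */
    private void checkRMStatus(String operation, boolean checkRMHAState) {
        if (checkRMHAState && state != HAState.ACTIVE) {
            throw new IllegalStateException("Cannot " + operation + ": RM is not active");
        }
    }

    /** Explicit admin call: always checks the HA state. */
    public synchronized void refreshQueues() {
        refreshQueues(true);
    }

    /** Shared implementation; only the internal transition path passes false. */
    private synchronized void refreshQueues(boolean checkRMHAState) {
        checkRMStatus("refreshQueues", checkRMHAState);
        refreshLog.add("refreshQueues");
    }

    /**
     * Holds the lock for the whole transition, so explicit refresh calls
     * block until it finishes and then observe ACTIVE.
     */
    public synchronized void transitionToActive() {
        refreshQueues(false); // internal refresh skips the HA check (lock is reentrant)
        state = HAState.ACTIVE;
    }

    public synchronized List<String> getRefreshLog() {
        return new ArrayList<>(refreshLog);
    }

    public HAState getState() {
        return state;
    }
}
```

In this sketch an explicit {{refreshQueues()}} on a standby instance is rejected, while the refresh issued from inside {{transitionToActive()}} goes through because it passes {{false}}. Whether the real patch should serialize on the service object or on a dedicated lock is a separate design decision.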

Test:
# I think if you can make the test generic instead of specific to the fair 
scheduler, it can be moved to class {{TestRMHA}}. There is already a test, 
{{TestRMHA#testTransitionedToActiveRefreshFail}}; could that test probably be 
changed instead?

> Some recovered apps are put into default queue when RM HA
> ---------------------------------------------------------
>
>                 Key: YARN-5333
>                 URL: https://issues.apache.org/jira/browse/YARN-5333
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-5333.01.patch, YARN-5333.02.patch, 
> YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch
>
>
> With RM HA enabled and FairScheduler in use, 
> {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false and 
> {{yarn.scheduler.fair.user-as-default-queue}} is set to false.
> Steps to reproduce:
> 1. Start two RMs.
> 2. After the RMs are running, change {{etc/hadoop/fair-scheduler.xml}} on 
> both RMs to add some queues.
> 3. Submit some apps to the newly added queues.
> 4. Stop the active RM; the standby RM will then transition to active and 
> recover apps.
> However, the new active RM will put recovered apps into the default queue 
> because it might not have loaded the new {{fair-scheduler.xml}}. We need to 
> call {{initScheduler}} before starting active services, or move 
> {{refreshAll()}} in front of {{rm.transitionToActive()}}. *It seems this is 
> also important for other schedulers*.
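
The ordering problem described in the issue can be shown with a toy model. This is a sketch under stated assumptions, not FairScheduler code: the class {{RecoveryOrderSketch}} and its methods are hypothetical, and the fallback to "default" for an unknown queue simply mirrors the behavior reported above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of queue placement during recovery; NOT the real FairScheduler. */
class RecoveryOrderSketch {
    private final Set<String> knownQueues = new HashSet<>();
    private final Map<String, String> placement = new HashMap<>();

    RecoveryOrderSketch() {
        // Only the default queue is known until the allocation file is reloaded.
        knownQueues.add("default");
    }

    /** Stands in for reloading the allocation file (hypothetical signature). */
    void refreshQueues(Set<String> queuesOnDisk) {
        knownQueues.addAll(queuesOnDisk);
    }

    /** A recovered app falls back to "default" when its queue is not known yet. */
    void recoverApp(String appId, String requestedQueue) {
        String queue = knownQueues.contains(requestedQueue) ? requestedQueue : "default";
        placement.put(appId, queue);
    }

    String queueOf(String appId) {
        return placement.get(appId);
    }
}
```

If {{recoverApp}} runs before {{refreshQueues}}, the app lands in "default" and a later refresh does not move it; with the refresh first, the app keeps its requested queue. That is the reason for loading the scheduler configuration before the active services start recovering apps.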



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

