[
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387915#comment-15387915
]
Sunil G commented on YARN-5333:
-------------------------------
[~hex108], thanks for the clarification. With YARN-3893, we were trying to make
the RM fail fast if a wrong capacity-scheduler configuration is present. With
the current patch,
{code}
try {
+ reinitializeActiveServices();
startActiveServices();
return null;
} catch (Exception e) {
{code}
any exception during queue reinitialization will not make the RM fail fast. So I
think you can put {{reinitializeActiveServices}} in a separate try block and
trigger RM fail-fast from its exception-handling block.
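To illustrate the separate try block, here is a minimal sketch. It is not the actual ResourceManager code: {{FailFastSketch}}, {{Step}}, and the returned status strings are hypothetical stand-ins, and a real RM would presumably raise a fatal event rather than return a string.

```java
/** Hypothetical stand-in for the RM transition logic; not the actual ResourceManager code. */
public class FailFastSketch {

    /** A unit of work that may fail, e.g. reinitializing or starting active services. */
    interface Step {
        void run() throws Exception;
    }

    /**
     * Runs reinitialization in its own try block so that its failure can be
     * escalated (fail-fast) instead of being handled like a start-up failure.
     */
    static String transitionToActive(Step reinitialize, Step start) {
        try {
            reinitialize.run();
        } catch (Exception e) {
            // Assumption: this is where the RM would be made to fail fast.
            return "FAIL_FAST: " + e.getMessage();
        }
        try {
            start.run();
            return "ACTIVE";
        } catch (Exception e) {
            // Start-up failures keep their own, separate handling.
            return "START_FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(transitionToActive(() -> {}, () -> {}));
        System.out.println(transitionToActive(
            () -> { throw new IllegalStateException("bad queue config"); },
            () -> {}));
    }
}
```

The point of the split is only that a reinitialization failure and a start-up failure reach different catch blocks, so each can be handled differently.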
However, one more thing worries me. With this patch, queue reinitialization is
done before the active services are started. So many services, such as the node
label manager, are not yet started (nor are the dispatcher threads). So if
{{reinitialize}} has some event call flow, that case may be a problem. But as
far as I checked, no such event handling is present in the {{reinitialize}} call
flow. Still, I suggest confirming this once; I will also verify and will update
if I find anything.
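The ordering concern can be shown with a generic queue-backed dispatcher. This is only a sketch under assumptions: {{SimpleDispatcher}} is a hypothetical model and not YARN's dispatcher implementation. It shows that an event dispatched before the worker thread is started stays queued and unhandled until the start.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a queue-backed dispatcher; not YARN's implementation. */
public class DispatcherOrderSketch {

    static class SimpleDispatcher {
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        /** Enqueues an event; nothing runs it until start() has been called. */
        void dispatch(Runnable event) {
            queue.add(event);
        }

        /** Starts the worker thread that drains the queue. */
        void start() {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        queue.take().run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    /** Returns {events handled before start, events handled after start}. */
    static int[] demo() {
        SimpleDispatcher dispatcher = new SimpleDispatcher();
        AtomicInteger handled = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);

        // Event raised "during reinitialize", before the dispatcher is started.
        dispatcher.dispatch(() -> {
            handled.incrementAndGet();
            done.countDown();
        });
        int before = handled.get();

        dispatcher.start();
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { before, handled.get() };
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("handled before start: " + counts[0]);
        System.out.println("handled after start: " + counts[1]);
    }
}
```

If any code in the {{reinitialize}} path depended on such an event having been handled, it would see stale state or block, which is why the call flow is worth confirming.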
> Some recovered apps are put into default queue when RM HA
> ---------------------------------------------------------
>
> Key: YARN-5333
> URL: https://issues.apache.org/jira/browse/YARN-5333
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jun Gong
> Assignee: Jun Gong
> Attachments: YARN-5333.01.patch, YARN-5333.02.patch,
> YARN-5333.03.patch
>
>
> Enable RM HA and use FairScheduler,
> {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false,
> {{yarn.scheduler.fair.user-as-default-queue}} is set to false.
> Reproduce steps:
> 1. Start two RMs.
> 2. After the RMs are running, change {{etc/hadoop/fair-scheduler.xml}} on
> both RMs to add some queues.
> 3. Submit some apps to the newly added queues.
> 4. Stop the active RM; the standby RM will then transition to active and
> recover apps.
> However, the new active RM will put recovered apps into the default queue
> because it might not have loaded the new {{fair-scheduler.xml}}. We need to
> call {{initScheduler}} before starting active services, or move
> {{refreshAll()}} in front of {{rm.transitionToActive()}}. *It seems this is
> also important for other schedulers*.