[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626694#comment-14626694
 ] 

Varun Saxena commented on YARN-3893:
------------------------------------

We do need to stop active services, because many threads would be spawned on 
an attempt to transition to active.
Frankly, we can have an additional flag in RM indicating that reinitialization 
of services is required, and attempt it while trying to transition to active. 
We can stop the services beforehand, because there is no point in having some 
threads running in standby. Thoughts?
We can do something like below:
{code}
      // Exception was thrown in call to refreshAll.
      if (rmContext.getHAServiceState() ==
          HAServiceProtocol.HAServiceState.ACTIVE) {
        ((RMContextImpl) rmContext).setHAServiceState(
            HAServiceProtocol.HAServiceState.STANDBY);
        try {
          rm.stopActiveServices();
          // Set a flag in RM (maybe in the RM context) indicating that reinit
          // of services is required on trying for transition to active,
          // despite the state being standby.
        } catch (Exception ex) {
          // Handling of a failed stop is left open here.
        }
      }
{code}
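
To make the flag idea concrete, here is a toy model, as a sketch only. The class `ToyResourceManager`, the `reinitRequired` flag, the `failRefresh` hook and the helper methods are hypothetical names for illustration and are not part of the YARN code base; the real change would have to fit into the existing RM and admin service code.

```java
// Toy model only: none of these types are the real YARN classes.
class ToyResourceManager {
    enum HaState { STANDBY, ACTIVE }

    private HaState state = HaState.STANDBY;
    private boolean activeServicesRunning = false;
    // Hypothetical flag: set when active services were stopped after a
    // failed refresh, so the next transition knows it must rebuild them.
    private boolean reinitRequired = false;
    private int reinitCount = 0;
    // Stand-in for a bad scheduler / ACL / user-group configuration.
    boolean failRefresh = false;

    HaState getState() { return state; }
    boolean isReinitRequired() { return reinitRequired; }
    boolean areActiveServicesRunning() { return activeServicesRunning; }
    int getReinitCount() { return reinitCount; }

    void transitionToActive() {
        if (state == HaState.ACTIVE) {
            return;
        }
        if (reinitRequired) {
            // Services were torn down earlier; rebuild them before starting.
            reinitActiveServices();
            reinitRequired = false;
        }
        startActiveServices();
        state = HaState.ACTIVE;
        try {
            refreshAll();
        } catch (IllegalStateException e) {
            // Refresh failed: fall back to standby, stop the threads that
            // were just started, and remember that reinit is needed.
            state = HaState.STANDBY;
            stopActiveServices();
            reinitRequired = true;
            throw e;
        }
    }

    private void refreshAll() {
        if (failRefresh) {
            throw new IllegalStateException("refreshAll failed");
        }
    }

    private void startActiveServices() { activeServicesRunning = true; }
    private void stopActiveServices() { activeServicesRunning = false; }
    private void reinitActiveServices() { reinitCount++; }
}
```

In this model a failed refresh leaves the RM in standby with no active services running, and the next call to `transitionToActive()` reinitializes the services once before starting them.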


> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3893
>                 URL: https://issues.apache.org/jira/browse/YARN-3893
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: yarn-site.xml
>
>
> Cases that can cause this:
> # Capacity scheduler XML is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh user group failure due to configuration
> Both RMs will then continuously try to become active.
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
