[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708471#comment-14708471
 ] 

Rohith Sharma K S commented on YARN-3893:
-----------------------------------------

I think that for any configuration issue during transitionToActive, AdminService 
should not allow the JVM to continue. If AdminService throws an exception 
back to the elector, the elector tries again to make the RM active, which loops 
forever and fills the logs. 
Two calls can be points of failure: first 
{{rm.transitionedToActive}}, second {{refreshAll()}}. 
# If {{rm.transitionedToActive}} fails, the RM services will be 
stopped and the RM will be in STANDBY state.
# If {{refreshAll()}} fails, BOTH RMs will be in ACTIVE state, as per this 
defect. Continuing RM services with an invalid configuration is not a good idea. 
Moreover, invalid configurations should be reported to the user immediately. So it 
would be better to make use of the fail-fast configuration to exit the RM JVM. If 
this configuration is set to false, then call {{rm.handleTransitionToStandBy}}.
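
The handling proposed above could look roughly like the sketch below. It is a 
self-contained illustration, not the actual AdminService code: {{FakeRM}}, 
{{Outcome}}, and the {{failFast}} flag are hypothetical stand-ins, and the JVM 
exit is only reported rather than performed so the example can run to completion.

```java
/** Hypothetical sketch of the proposed failure handling; not real Hadoop code. */
public class TransitionSketch {
    enum HAState { STANDBY, ACTIVE }

    /** What the caller should observe after the attempt. */
    enum Outcome { ACTIVE, STANDBY, EXIT_JVM }

    /** Stand-in for the ResourceManager, reduced to its HA state. */
    static class FakeRM {
        HAState state = HAState.STANDBY;
        void transitionedToActive() { state = HAState.ACTIVE; }
        void handleTransitionToStandBy() { state = HAState.STANDBY; }
    }

    /**
     * Makes the RM active, then refreshes its configuration.
     * If the refresh fails, either signals a JVM exit (fail-fast)
     * or moves the RM back to standby instead of leaving it active.
     */
    static Outcome transitionToActive(FakeRM rm, Runnable refreshAll, boolean failFast) {
        rm.transitionedToActive();
        try {
            refreshAll.run();
            return Outcome.ACTIVE;
        } catch (RuntimeException e) {
            if (failFast) {
                // Assumption: real code would terminate the JVM at this point.
                return Outcome.EXIT_JVM;
            }
            // Fail-fast disabled: do not stay active with an invalid configuration.
            rm.handleTransitionToStandBy();
            return Outcome.STANDBY;
        }
    }

    public static void main(String[] args) {
        Runnable badRefresh = () -> {
            throw new IllegalStateException("invalid configuration");
        };
        System.out.println(transitionToActive(new FakeRM(), () -> { }, true));
        System.out.println(transitionToActive(new FakeRM(), badRefresh, true));
        System.out.println(transitionToActive(new FakeRM(), badRefresh, false));
    }
}
```

In both failure branches the elector never receives an exception to retry on, 
so the retry loop described above would not start.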

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3893
>                 URL: https://issues.apache.org/jira/browse/YARN-3893
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this:
> # Capacity scheduler XML is wrongly configured during the switch
> # Refresh ACL failure due to configuration
> # Refresh user group failure due to configuration
> Both RMs will then continuously try to become active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UIs are active
> # Status is shown as active for both RMs



