[
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711201#comment-14711201
]
Rohith Sharma K S commented on YARN-3893:
-----------------------------------------
Two types of refresh can happen: 1. a yarn-site.xml refresh, and 2. a
scheduler configuration refresh. Scheduler configurations are reloaded on
every service initialization, which is by design. If there is any issue in the
scheduler configuration, the fail-fast setting behaves the same whether it is
true or false. The fail-fast setting is useful when the admin makes a mistake
in yarn-site.xml. With a wrong configuration in yarn-site.xml, the RM service
can still come up, whereas with a wrong scheduler configuration the service can
NOT come up at all. *On a best-effort basis, to keep the service up*, exceptions
for yarn-site.xml and for the scheduler configuration are handled differently.
BTW, moving the RM to standby state would fill up the logs very quickly,
because the elector continuously retries the transition to active. For any
configuration issue, it is better to exit the JVM and notify the admin that the
RM is down, so that the admin can check the logs and identify the problem.
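
The distinction above can be sketched as a small decision table. This is only
an illustrative reading of the comment, not the actual ResourceManager code:
the class, enum, and method names are hypothetical, and the mapping of each
case to an outcome is an assumption drawn from the description above.

```java
// Illustrative sketch only; none of these names exist in Hadoop.
public class RefreshFailureSketch {

    /** Which configuration failed to refresh (hypothetical classification). */
    public enum ConfigSource { YARN_SITE, SCHEDULER }

    /** What happens to the RM service after the failure (hypothetical). */
    public enum Outcome { EXIT_JVM, STAY_UP_BEST_EFFORT, SERVICE_CANNOT_START }

    /**
     * Assumed behavior, as read from the comment:
     * - a bad scheduler configuration prevents the service from starting,
     *   regardless of the fail-fast flag;
     * - a bad yarn-site.xml only brings the RM down when fail-fast is true;
     *   otherwise the service is kept up on a best-effort basis.
     */
    public static Outcome onRefreshFailure(ConfigSource source, boolean failFast) {
        if (source == ConfigSource.SCHEDULER) {
            // Fail-fast makes no difference here.
            return Outcome.SERVICE_CANNOT_START;
        }
        return failFast ? Outcome.EXIT_JVM : Outcome.STAY_UP_BEST_EFFORT;
    }

    public static void main(String[] args) {
        // Print the outcome for every combination of source and flag.
        for (ConfigSource source : ConfigSource.values()) {
            for (boolean failFast : new boolean[] {true, false}) {
                System.out.println(source + " failFast=" + failFast
                        + " -> " + onRefreshFailure(source, failFast));
            }
        }
    }
}
```

In this reading, the fail-fast flag only changes the outcome for yarn-site.xml
failures, which is why the two exception paths are handled separately.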
> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> ------------------------------------------------------------------------------
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.7.1
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch,
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch,
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Both RMs will then continuously try to become active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
> ./yarn rmadmin -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
> ./yarn rmadmin -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM