subject:"\[jira\] \[Commented\] \(YARN\-3893\) Both RM in active state when Admin#transitionToActive failure from refeshAll\(\)"


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727187#comment-14727187
 ] 

Hudson commented on YARN-3893:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #342 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/342/])
YARN-3893. Both RM in active state when Admin#transitionToActive failure from 
refeshAll() (Bibin A Chundatt via rohithsharmaks) (rohithsharmaks: rev 
7d6687fe76f6152a577ff2298c358dd30fce41fb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java


> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727185#comment-14727185
 ] 

Hudson commented on YARN-3893:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1070 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1070/])
YARN-3893. Both RM in active state when Admin#transitionToActive failure from 
refeshAll() (Bibin A Chundatt via rohithsharmaks) (rohithsharmaks: rev 
7d6687fe76f6152a577ff2298c358dd30fce41fb)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java


> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727346#comment-14727346
 ] 

Hudson commented on YARN-3893:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #335 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/335/])
YARN-3893. Both RM in active state when Admin#transitionToActive failure from 
refeshAll() (Bibin A Chundatt via rohithsharmaks) (rohithsharmaks: rev 
7d6687fe76f6152a577ff2298c358dd30fce41fb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java


> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-09-02 Thread Varun Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727115#comment-14727115
 ] 

Varun Saxena commented on YARN-3893:


+1...lgtm too

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727134#comment-14727134
 ] 

Hudson commented on YARN-3893:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8387 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8387/])
YARN-3893. Both RM in active state when Admin#transitionToActive failure from 
refeshAll() (Bibin A Chundatt via rohithsharmaks) (rohithsharmaks: rev 
7d6687fe76f6152a577ff2298c358dd30fce41fb)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java


> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727444#comment-14727444
 ] 

Hudson commented on YARN-3893:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2284 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2284/])
YARN-3893. Both RM in active state when Admin#transitionToActive failure from 
refeshAll() (Bibin A Chundatt via rohithsharmaks) (rohithsharmaks: rev 
7d6687fe76f6152a577ff2298c358dd30fce41fb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java


> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727397#comment-14727397
 ] 

Hudson commented on YARN-3893:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2264 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2264/])
YARN-3893. Both RM in active state when Admin#transitionToActive failure from 
refeshAll() (Bibin A Chundatt via rohithsharmaks) (rohithsharmaks: rev 
7d6687fe76f6152a577ff2298c358dd30fce41fb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java


> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727419#comment-14727419
 ] 

Hudson commented on YARN-3893:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #325 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/325/])
YARN-3893. Both RM in active state when Admin#transitionToActive failure from 
refeshAll() (Bibin A Chundatt via rohithsharmaks) (rohithsharmaks: rev 
7d6687fe76f6152a577ff2298c358dd30fce41fb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMFatalEventType.java
* hadoop-yarn-project/CHANGES.txt


> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-31 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723297#comment-14723297
 ] 

Rohith Sharma K S commented on YARN-3893:
-

+1 lgtm.. Will commit it tomorrow if there is no objections/comments from other 
folks.. 

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716784#comment-14716784
 ] 

Bibin A Chundatt commented on YARN-3893:


Testcase failures are not related to this patch.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716779#comment-14716779
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 26s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  53m 42s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 44s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752740/0010-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0bf2854 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8929/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8929/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8929/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716642#comment-14716642
 ] 

Naganarasimha G R commented on YARN-3893:
-

Hi [~bibinchundatt]
2 there are test cases related to transition in 
TestRMAdminService.testRMHAWithFileSystemBasedConfiguration but most of it is 
present in TestRMHA so i think it should be fine.

3 Well IMHO it would be better be handled in the later approach i suggested, 
as {{refreshAll}} is just a private method but actual  operation is 
transistionToActive which Failed which is more readable than 
{{ACTIVE_REFRESH_FAIL}}

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716645#comment-14716645
 ] 

Naganarasimha G R commented on YARN-3893:
-

Oops saw this message late !

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716644#comment-14716644
 ] 

Naganarasimha G R commented on YARN-3893:
-

Oops saw this message late !

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716543#comment-14716543
 ] 

Naganarasimha G R commented on YARN-3893:
-

Hi [~bibinchundatt],
Thanks for the patch, test cases ran fine, approach and test case seems to be 
fine but few comments from my side 
# timeout of 90 is on the higher side is that much req or was it for local 
testing ?
# instead of test case in RMHA can we think of adding it to TestRMAdminService 
as the failure is related to transition to Active ? 
# May be while throwing RMFatalEvent better to wrap it with another exception 
wrapping the existing one and with the message that transition to active failed 
so that RM Logs have clear information on what operation it exited. or may be 
eventType instead of having {{ACTIVE_REFRESH_FAIL}} we can have more intuitive 
name {{TRANSITION_TO_ACTIVE_FAILED}}

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716563#comment-14716563
 ] 

Bibin A Chundatt commented on YARN-3893:


Hi Naga

Thnks for looking into patch

{quote}
timeout of 90 is on the higher side is that much req or was it for local 
testing ?
{quote}
will update the same.

{quote}
instead of test case in RMHA can we think of adding it to TestRMAdminService as 
the failure is related to transition to Active ?
{quote}
As i understand all transistiontoActive  HA related testcases are added in 
same class.

3.{{TRANSITION_TO_ACTIVE_FAILED}} is not actually failing its {{refreshAll}} 
rt? Thts the reason it gave specific name.

Points 2 and 3 are not mandatory fix items rt?

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

[
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712619#comment-14712619
]

Varun Saxena commented on YARN-3893:

I do not have any concern for exiting JVM. If fail fast is true(default
behavior), JVM will exit anyways.

I was wondering if it would be semantically appropriate to make JVM exit in
some cases if somebody has explicitly changed the fail fast config to false.
Logs can fill up if yarn-site.xml is wrong on both RMs' too.

I am not sure about the webapp part though. Does it require client rm service
to be initialized ? AFAIK, if RM is standby it will hit the webapp filter and
redirect to other RM(which may be active). Haven't tested UI after applying
previous patches, so maybe Bibin can tell. If there are some issues with
webapp, we will have to exit the JVM if transition to standby fails. Because
there may be no other way out then.
I will discuss further on this with you offline.

Both RM in active state when Admin#transitionToActive failure from refeshAll()
--

Key: YARN-3893
URL: https://issues.apache.org/jira/browse/YARN-3893
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch,
0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch,
yarn-site.xml

Cases that can cause this.
# Capacity scheduler xml is wrongly configured during switch
# Refresh ACL failure due to configuration
# Refresh User group failure due to configuration
Continuously both RM will try to be active
{code}
dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
./yarn rmadmin -getServiceState rm1
15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
active
dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
./yarn rmadmin -getServiceState rm2
15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
active
{code}
# Both Web UI active
# Status shown as active for both RM

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712637#comment-14712637
 ] 

Varun Saxena commented on YARN-3893:


Infact according to me, we can crash RM on all times if config is wrong. 
Because till config is corrected, the RM where config is wrong cannot become 
active(and hence will be unusable). In that case, fail fast config wont even be 
required. So should we change the behavior to keep RM in standby(but up) if 
fail fast is set to false ? Anyways can discuss more in detail face to face. 

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712896#comment-14712896
 ] 

Varun Saxena commented on YARN-3893:


The latest patch, 0008-YARN-3893.patch LGTM.
+1 pending Jenkins.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712745#comment-14712745
 ] 

Sunil G commented on YARN-3893:
---

As I see this, JVM exit is reasonable as proposed by Rohith earlier. Because 
scheduler configurations are wrong mostly, and its not required to switch to 
standby or fail-fast etc. Directly if we can exit JVM, it will be clean and 
there will be enough information available in logs to analyze for config fail 
reasons. 

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712942#comment-14712942
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 15s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 41s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 59s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  52m  9s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  90m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752428/0006-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8914/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8914/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8914/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713001#comment-14713001
 ] 

Bibin A Chundatt commented on YARN-3893:


Test failures are not related to this patch. Have looked into the failed 
testcases 

{{hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens}} - Due Bind 
exception 
{{hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService}}
 - Locally verified its working fine and success
{{hadoop.yarn.server.resourcemanager.TestClientRMService}} -Ran locally in 
eclipse its working fine

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712974#comment-14712974
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 32s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 48s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  53m 19s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 14s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752434/0007-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8915/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8915/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8915/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713002#comment-14713002
 ] 

Bibin A Chundatt commented on YARN-3893:


Above comments are for 
https://builds.apache.org/job/PreCommit-YARN-Build/8915/testReport/

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712999#comment-14712999
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  8s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 31s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  53m 39s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  93m 18s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752437/0008-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8916/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8916/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8916/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712767#comment-14712767
 ] 

Varun Saxena commented on YARN-3893:


Yes I agree. We can exit JVM directly. No need of using fail fast.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711290#comment-14711290
 ] 

Varun Saxena commented on YARN-3893:


Saw your comments above. We cant do what we were doing earlier because as you 
say WebApp should be up even in standby. Let me think if something else can be 
done.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710942#comment-14710942
 ] 

Rohith Sharma K S commented on YARN-3893:
-

Thanks [~bibinchundatt] for updating the patch. The patch mostly reasonable!!
Some comments on the patch
# Does {{isRMActive() }} check is required..? If transitionedToActive is 
success only then refreshAll will be executed!! IAC if you add also then check 
should be common for both i.e *_if_else*
# In the Test, below code expecting transitionToActive to be failed? Is so, 
then it RM state shoud not be in Active state. Why RM will be in Active if 
adminService fails to transition?
{code}
+try {
+  rm.adminService.transitionToActive(requestInfo);
+} catch (Exception e) {
+  assertTrue(Error when transitioning to Active mode.contains(e
+  .getMessage()));
+}
+assertEquals(HAServiceState.ACTIVE, rm.getRMContext().getHAServiceState());
{code}
# Have you verified the test locally? I have doubt that test may be exitted in 
the middle since you are changing the scheduler configuration. Scheduler 
configuration is loaded during transitionedToStandby which fails to load and 
*System.exit* is called.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711036#comment-14711036
 ] 

Varun Saxena commented on YARN-3893:


Few additional comments :

* Below exception block i.e. exception block after call to refreshAll, if 
{{YarnConfiguration.shouldRMFailFast(getConfig())}} is true, we merely post 
fatal event and do not return or throw an exception. This would lead to success 
audit log for transition to active being printed, which doesn't quite look 
correct. Because we are encountering some problem during call to transition. We 
should either return or throw a ServiceFailedException here as well. Although 
both are OK because RM would anyways be down later but I would prefer 
exception. 
{code} 
324 } catch (Exception e) {
325   if (isRMActive()  
YarnConfiguration.shouldRMFailFast(getConfig())) {
326 rmContext.getDispatcher().getEventHandler()
327 .handle(new 
RMFatalEvent(RMFatalEventType.ACTIVE_REFRESH_FAIL, e));
328   }else{
329 rm.handleTransitionToStandBy();
330 throw new ServiceFailedException(
331 Error on refreshAll during transistion to Active, e);
332   }
333 }
334 RMAuditLogger.logSuccess(user.getShortUserName(), 
transitionToActive,
335 RMHAProtocolService);
336   }
{code}

* In TestRMHA, below import is unused. 
{code}
import io.netty.channel.MessageSizeEstimator.Handle;
{code}

* A nit : There should be a space before else.
{code}
328   }else{
329 rm.handleTransitionToStandBy();
{code}

* In the test added, assert is not required in the exception block after first 
call to transitionToActive

* Maybe we can add an assert in test for service state being STANDBY after call 
to transitionToActive with incorrect capacity scheduler config and fail-fast 
being false.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710955#comment-14710955
 ] 

Rohith Sharma K S commented on YARN-3893:
-

To be more clear on the 3rd point, {{handleTransitionToStandBy}} call will exit 
if transitionToStandby fails. This transition may fail because during 
transition, active services are initialized. CS initialization loads the new 
capacity-schduler conf which result in wrong default queue capacity value 
result standby transition failure.
4. Instead of having separate class FatalEventCountDispatcher , can it be made 
inline?

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711164#comment-14711164
 ] 

Varun Saxena commented on YARN-3893:


Moreover, the fail fast configuration doesnt quite work as expected here. If 
capacity scheduler configuration is wrong, initialization will again fail and 
JVM will exit, which in essence is exactly same as the other case. We can 
handle fail fast as true case same way as earlier IMO.

The reason it works in the test(JVM does not exit) is that you have passed 
CapacitySchedulerConfiguration object to MockRM. As 
CapacitySchedulerConfiguration is not instanceof YarnConfiguration, this will 
lead to a new YarnConfiguration object being created and passed to 
ResourceManager.
When you are changing configuration in test and set queue capacity to 200, it 
is not reflecting in the Configuration object in ResourceManager class. That is 
why JVM does not exit when we transition to standby.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711171#comment-14711171
 ] 

Varun Saxena commented on YARN-3893:


Sorry I meant we can handle fail fast config being *false* case same way as we 
were doing in earlier patches. Otherwise checking for fail fast doesnt make any 
difference because both the code paths lead to same result. 

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

[
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711201#comment-14711201
]

Rohith Sharma K S commented on YARN-3893:
-

There are 2 type of refresh can happen i.e. 1. yarn-site.xml refresh, 2.
scheduler configurations refresh. Schduler configurations are reloaded for
every service initialization which is by design. If any issue in the scheduler
configuration, fail-fast configuraton behavior work as same for both true and
false. Fail-fast configuration is useful when admin do mistake in configuring
mistake in yarn-site.xml. With wrong configuration in yarn-site.xml, RM service
can be up whereas with wrong Scheduler configuration , service can NOT be up
at all. *On best effort basis for make service up*, handling exception for
yarn-site.xml and scheduler configuration are different.

BTW, making RM state StandBy would lead to filling up of the logs very soon
because of elector continuous try to make active. Any configuration issue,
better to exit the JVM and notify admin that RM is down so that admin can check
the logs and identify it.

Both RM in active state when Admin#transitionToActive failure from refeshAll()
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

[
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711274#comment-14711274
]

Varun Saxena commented on YARN-3893:

Hmm...my point of view based on the fact that the service cannot be up if
atleast one RM is not active. Standby RM is not going to serve anything
anyways.
Till configurations of this RM are not corrected, whether yarn-site or
scheduler configurations, this RM anyways cant become active (refreshAll will
always fail). And you can say there might be some silly mistake in scheduler
configuration too.

What we were doing before in the patch wont fill up the logs if configuration
is ok on other RM. And if its not Ok on other RM, logs will fill up even even
if refreshAll fails because of something other than scheduler config(and fail
fast is false).
fail fast by default is true, and if admin is making it false, he will know
what to expect.

But, you can say a RM shutting down is a far more alarming thing for an admin
and scheduler configurations more important. I agree with that. Maybe we can
make RM with wrong configuration down at all times. Because till he correct the
config(whether yarn-site or scheduler config), this RM cant become active.

Let us take opinion of couple of others as well on this. We can do whatever is
the consensus.

Both RM in active state when Admin#transitionToActive failure from refeshAll()
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711278#comment-14711278
 ] 

Varun Saxena commented on YARN-3893:


In previous patches, we were delaying reinitialization till attempting 
transition to active again and not attempting it immediately as we have done 
here. Any issues you expect with that ? 

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712308#comment-14712308
 ] 

Rohith Sharma K S commented on YARN-3893:
-

Hi [~varun_saxena], trying to understand your point of statement, my suggestion 
is to exit the RM if any configuration issue during refreshAll during 
{{AdminService#transitionToActive}}. As I given reason for making RM JVM down 
rather than keeping JVM alive in earlier 
[comment|https://issues.apache.org/jira/browse/YARN-3893?focusedCommentId=14711201page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14711201],
 do you have any concern for exiting the RM for configuration issues?

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
 yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-23 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708470#comment-14708470
 ] 

Rohith Sharma K S commented on YARN-3893:
-

I had closer look at either of the solutions as above. One of the potential 
issue in both are
# Moving createAndInitService just before starting activeServices in 
transitionToActive. 
## switch time will be impacted since every transitionToActive initializes 
active services.
## And RMWebApp has dependency on clienRMService for starting webapps. Without 
clientRMService initialization, RMWebapp can not be started. 
# Moving refreshAll before transitionToActive in adminService is same as 
triggering RMAdminCIi on standby node. This call throw StandByException and 
retried to active RM in RMAdminCli. When it comes to  
AdminService#transitionedToActive(), refreshing before 
{{rm.transitionedToActive}} throws an standby exception. 



 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-23 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708471#comment-14708471
 ] 

Rohith Sharma K S commented on YARN-3893:
-

I think for any configuration issues while transitioningToActive, Adminservice 
should not allow JVM to continue. Because if AdminService throws any exception 
back to elector, elector again try to make RM active which goes in loop forever 
filling the logs. 
There could be 2 calls can lead to point of failures i.e first 
{{rm.transitionedToActive}}, second {{refreshAll()}}. 
# If any failures in {{rm.transitionedToActive}} then RM services will be 
stopped and RM will be in STANDBY state.
# If {{refreshAll()}} fails, BOTH RM will be in ACTIVE state as per this 
defect. Continuing RM services with invalid configuration does not good idea. 
Moreover invalid configurations should be notified to user immediately. So it 
would be better to make use of fail-fast configuration to exit the RM JVM. If 
this configuration is set to false , then call {{rm.handleTransitionToStandBy}}.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-23 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708425#comment-14708425
 ] 

Bibin A Chundatt commented on YARN-3893:


Hi [~rohithsharma] 
Any comments on this.? 


 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-23 Thread Varun Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708491#comment-14708491
 ] 

Varun Saxena commented on YARN-3893:


Using fail fast makes sense.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-19 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703197#comment-14703197
 ] 

Sunil G commented on YARN-3893:
---

Hi [~rohithsharma]
Thank you for restarting this thread.

The idea of calling {{createAndInitActiveServices}} from both 
ResourceManager#transitionToActive() and transistionToStandby is good . In this 
case, we can remove the call to {{refreshAll}} from 
AdminService#transistionToStandby.


 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-18 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701416#comment-14701416
 ] 

Bibin A Chundatt commented on YARN-3893:


Hi [~rohithsharma] 
Thank you for your review comments
Will update the same and upload patch soon.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-16 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698908#comment-14698908
 ] 

Rohith Sharma K S commented on YARN-3893:
-

Sorry for coming very late.. This issue has become stale, need to move forward!!
Regarding the patch, 
# Instead of setting boolean flag for reinitActiveServices in AdminService and 
other changes, moving {{createAndInitActiveServices();}} from 
transitionedToStandby to just before starting activeServices would solve such 
issues. And on exception transitioningToActive, handle add method 
stopActiveServices in ResourceManager#transitioningToActive() only. 
# Probably we can remove refreshAll() from AdminService#transitioneToActive if 
the above approach.

Any thoughts?

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-25 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641743#comment-14641743
 ] 

Bibin A Chundatt commented on YARN-3893:


{quote}
Instead of checking for exception message in test, can you check for 
ServiceFailedException
{quote}
Already the same is verified in may testcases using messages.

{quote}
Can you add a verification in the test to check whether active services were 
stopped ?
{quote}

IMO its not required.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-19 Thread Varun Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632793#comment-14632793
 ] 

Varun Saxena commented on YARN-3893:


[~bibinchundatt]
# Instead of checking for exception message in test, can you check for 
{{ServiceFailedException}} ?
# Can you add a verification in the test to check whether active services were 
stopped ?

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-15 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627767#comment-14627767
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 40s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m  1s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 57s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 44s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  51m 58s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  95m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745400/0003-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / edcaae4 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8540/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8540/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8540/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8540/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-15 Thread Varun Saxena (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627950#comment-14627950
]

Varun Saxena commented on YARN-3893:

Thanks for the patch [~bibinchundatt]. Few comments.

# Nit : Should be Exception in state transition
{code}
throw new ServiceFailedException(
Exception in state transistion, re);
{code}
# IMO, no need to throw ServiceFailedException when catching exception while
calling reinitialize. The throw below should suffice. Just set the flag.
According to me, we should retain the original exception.
# Add a comment indicating what the flag does.
# Maybe rename the flag to reinitActiveServices instead of reinitialize.
# The flag according to me, semantically speaking, doesn't quite belong to
AdminService. Can be in ResourceManager or RMContext. Thoughts ?
# Can you add a test to verify the fix ?
# I think instead of relying on transitionToStandby to change state to standby,
we can explicitly change the state in AdminService. Thats because even
stopActiveServices can throw an Exception and if it does, state won't change to
STANDBY. This call to stop should not throw an exception, but as services keep
on getting added you never know how a particular service may behave. We should
be immune to it. Try something like below.
{code}

((RMContextImpl)rmContext).setHAServiceState(HAServiceProtocol.HAServiceState.STANDBY);
{code}
# Just a suggestion. If we do above, maybe call stopActiveServices and
reinitialize directly instead of calling transitonToStandby. This is because as
I said in a comment above, transitionToStandby would print an audit log saying
transition is successful. But reinitialize subsequently may fail. And not
printing this audit log will be consistent with transitionToActive failing
during starting active services. Thoughts ?

Both RM in active state when Admin#transitionToActive failure from refeshAll()
--

Key: YARN-3893
URL: https://issues.apache.org/jira/browse/YARN-3893
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch,
0003-YARN-3893.patch, yarn-site.xml

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-15 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628299#comment-14628299
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 30s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 50s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 54s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 40s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  52m  6s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  95m 19s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745452/0004-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / edcaae4 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8547/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8547/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8547/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626438#comment-14626438
 ] 

Sunil G commented on YARN-3893:
---

Hi [~varun_saxena]
bq.Reinitialization of Active Services is required. 
For this, I think calling {{rm.transitionToStandby(true)}} is not a good idea. 
Because same exception can come while initializing CapacityScheduler (cs config 
file).


 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626437#comment-14626437
 ] 

Sunil G commented on YARN-3893:
---

Hi [~varun_saxena]
bq.Reinitialization of Active Services is required. 
For this, I think calling {{rm.transitionToStandby(true)}} is not a good idea. 
Because same exception can come while initializing CapacityScheduler (cs config 
file).


 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-14 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626664#comment-14626664
 ] 

Xuan Gong commented on YARN-3893:
-

Sorry, should call stopAndReinitiate().

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626688#comment-14626688
 ] 

Varun Saxena commented on YARN-3893:


Yes but we do need to reinitialize services. Otherwise transition to active 
when everything is fine will not happen.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-14 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626661#comment-14626661
 ] 

Xuan Gong commented on YARN-3893:
-

How about first calling rm.transitionToStandby(false), then call 
activeService.reinitiate() (probably need to create this function) ? At least, 
the RM will transit to Standby. Even if the reinitiate() throws exception, the 
leader elector will handle this.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626677#comment-14626677
 ] 

Sunil G commented on YARN-3893:
---

Hi [~xgong]
Yes, we can do that. But I feel now we call 
{{rmContext.setHAServiceState(HAServiceProtocol.HAServiceState.STANDBY);}}
as last statement in {{transitionToStandby}}. SO if an exception happens in 
{{reinitialize}} code flow wont reach to set the state as Standby. So we may 
also need to set the state in context as Standby.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626694#comment-14626694
 ] 

Varun Saxena commented on YARN-3893:


We do need to stop active services because many threads would be spawned on 
attempt to transition to active.
Frankly, we can have a additional flag in RM indicating that reinitialization 
of services is required and attempt them while trying for transition to active. 
We can stop the services beforehand because no point having some threads 
running in standby. Thoughts ?
We can do something like below
{code}
  // Exception was thrown in call to refreshAll.
  if (rmContext.getHAServiceState() ==
  HAServiceProtocol.HAServiceState.ACTIVE) {
  
((RMContextImpl)rmContext).setHAServiceState(HAServiceProtocol.HAServiceState.STANDBY);
try {
  rm.stopActiveServices();
  // set a flag in RM(maybe rm context) indicating reinit of services 
is required on trying for transition to active despite state being standby.
} catch (Exception ex) {
}
{code}


 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

[
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626909#comment-14626909
]

Varun Saxena commented on YARN-3893:

[~sunilg], even I was suggesting earlier in my comment that we reinit only
while transitioning to active.

But then I thought that if we reinit on standby and there is a problem in
initialization, a failure can indicate the admin to correct his config. An
audit log will be printed.
If we do not reinit, a success audit log on transition to standby would be
printed, which may indicate no problem in config to admin.
Thoughts ?

We can procrastinate reiniting till transition to active as well. But its
better to indicate a failure even on standby IMHO. I do not see any harm in it.
I am fine either ways because reiniting really matters when transitioning to
active.

Both RM in active state when Admin#transitionToActive failure from refeshAll()
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626911#comment-14626911
 ] 

Varun Saxena commented on YARN-3893:


I am fine either ways though because as you said reiniting really matters when 
transitioning to active.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627550#comment-14627550
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m  8s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 39s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 46s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 25s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  51m 13s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  89m 13s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12745382/0002-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0a16ee6 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8539/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8539/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8539/console |


This message was automatically generated.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626833#comment-14626833
 ] 

Varun Saxena commented on YARN-3893:


Ok, lets add a flag. According to me, we need to check this flag and do reinit 
even on transitionToStandby even though state is standby.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-14 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626826#comment-14626826
 ] 

Xuan Gong commented on YARN-3893:
-

Thanks for [~varun_saxena] and [~sunilg]. I am fine with adding a new internal 
state although it might be too complex. But if we could handle this correctly, 
I am fine with this.

To this specific issue, I think that at least two things we should do here:
1) stop All ActiveService
2) transit to standby. (basically, set RM state in RMContext as Standby)
But, we also need to reinitiate all the active service to prepare for the 
transitToActive call. 
At least, we should do:
{code}
rm.transitToStandy(false);
reinitiateActiveService();
{code}
Here the reinitiateActiveService() can throw out the same exception. And I can 
see why this does not solve the whole problem.

How about we introduce a new atomicBoolean flag to track whether we need to 
reinitiate active service ? And we could add following into transitToActive 
logic
{code}
if (reinitiateRequired)
   reinitiateActiveService()
{code}
before we start all the active service.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626873#comment-14626873
 ] 

Sunil G commented on YARN-3893:
---

+1 for using atomicBoolean flag.
Do we really need to call {{reinitiateActiveService}} from 
{{transitionToStandby}}. I think it can be done while we invoke 
{{transitionToActive}} when it matters. 

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626716#comment-14626716
 ] 

Sunil G commented on YARN-3893:
---

I remember an earlier suggestion of new HAServiceState. 
Introducing a state as WAITING_FOR_ACTIVE may help to do all reinit or other 
inits when we try to move to ACTIVE. Also as mentioned earlier, this can be 
hidden state internally. It may look more cleaner than flag. So along with 
above solution, could we add this new state also?

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626725#comment-14626725
 ] 

Varun Saxena commented on YARN-3893:


[~xgong], issue with reinitialization is that if exception is thrown during 
initialization then all the active services will be stopped. 
And when we transitiontoactive we will directly attempt to start active 
services which would fail because services are in state STOPPED.

I think we can forcibly set the state to standby and set a flag in RMContext 
indicating reinit is required whenever attempting transition to standby or 
active. This way we will let leader election handle the exception.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()


[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626728#comment-14626728
 ] 

Varun Saxena commented on YARN-3893:


Yeah [~sunilg], a new state can also be introduced as I suggested. This would 
act similar to a flag.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()