[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2016-01-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118116#comment-15118116
 ] 

Bikas Saha commented on YARN-2019:
--

Does this now mean that during a failover the new RM could forget about the 
jobs that failed to get stored by the previous RM?

> Retrospect on decision of making RM crashed if any exception throw in
> ZKRMStateStore
>
>
> Key: YARN-2019
> URL: https://issues.apache.org/jira/browse/YARN-2019
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junping Du
> Assignee: Jian He
> Priority: Critical
> Labels: ha
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch
>
>
> Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal
> exception that crashes the RM. As shown in YARN-1924, the cause can be an
> internal RM HA bug rather than a genuinely fatal condition. We should
> revisit this decision, since the HA feature is designed to protect a key
> component, not to disrupt it.
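
The behavior described above can be pictured with a minimal Java sketch. It is a hypothetical illustration, not the code from the attached patches: the class StoreFailureHandler, the exception FatalStoreException, and the boolean failFast switch are all invented names. With the switch on, every store error is escalated as fatal (the behavior the issue questions); with it off, the error is recorded and the caller keeps running.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the RMStateStore code from the patch.
final class StoreFailureHandler {

    /** Models the RM treating a store error as fatal and shutting down. */
    static final class FatalStoreException extends RuntimeException {
        FatalStoreException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Assumed on/off switch; the name is invented for this sketch.
    private final boolean failFast;
    private final List<String> skipped = new ArrayList<>();

    StoreFailureHandler(boolean failFast) {
        this.failFast = failFast;
    }

    /** Invoked whenever a state-store operation throws. */
    void onStoreFailure(String operation, Exception cause) {
        if (failFast) {
            // Strict mode: any store error is escalated as fatal.
            throw new FatalStoreException("State store operation failed: " + operation, cause);
        }
        // Lenient mode: remember the error and let the caller continue.
        skipped.add(operation + ": " + cause.getMessage());
    }

    List<String> skippedErrors() {
        return skipped;
    }

    public static void main(String[] args) {
        StoreFailureHandler lenient = new StoreFailureHandler(false);
        lenient.onStoreFailure("storeApplication", new IOException("ZK connection lost"));
        System.out.println("skipped: " + lenient.skippedErrors());

        StoreFailureHandler strict = new StoreFailureHandler(true);
        try {
            strict.onStoreFailure("storeApplication", new IOException("ZK connection lost"));
        } catch (FatalStoreException e) {
            System.out.println("fatal: " + e.getMessage());
        }
    }
}
```

In lenient mode the skipped record is never persisted, which is the trade-off raised in the comment above about a new RM forgetting jobs after failover.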



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734299#comment-14734299
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #341 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/341/])
YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: 
rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java


> Retrospect on decision of making RM crashed if any exception throw in
> ZKRMStateStore
>
>
> Key: YARN-2019
> URL: https://issues.apache.org/jira/browse/YARN-2019
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junping Du
> Assignee: Jian He
> Priority: Critical
> Labels: ha
> Fix For: 2.8.0, 2.7.2, 2.6.2
>
> Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch
>
>
> Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal
> exception that crashes the RM. As shown in YARN-1924, the cause can be an
> internal RM HA bug rather than a genuinely fatal condition. We should
> revisit this decision, since the HA feature is designed to protect a key
> component, not to disrupt it.





[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734121#comment-14734121
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #359 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/359/])
YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: 
rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734092#comment-14734092
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8411 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8411/])
YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: 
rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734202#comment-14734202
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2279 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2279/])
Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: 
rev 6a5068970567817af78d6ec2dfe474b09533a8d2)
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734164#comment-14734164
 ] 

Hudson commented on YARN-2019:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2301 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2301/])
Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: 
rev 6a5068970567817af78d6ec2dfe474b09533a8d2)
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734270#comment-14734270
 ] 

Hudson commented on YARN-2019:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2302 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2302/])
YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: 
rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734100#comment-14734100
 ] 

Bikas Saha commented on YARN-2019:
--

Sorry for coming in late on this. There are 2 kinds of state store
operations: reads and writes. Writes may be of 2 kinds: critical and
non-critical. E.g. saving an application submission is critical. Saving node
information is perhaps not critical. It would affect system correctness if
critical write operation errors are allowed to be ignored. We end up with
YARN-4118 and other such potential issues. Essentially we are turning a
write-ahead log into something that is optional. I don't see how the system can
make stable reliability guarantees by making write-ahead log failures non-fatal.
On the other hand, read errors or non-critical write errors should not block RM
progress but do need to be potentially retried. That also does not seem to be
addressed in the patch.
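
The distinction drawn in this comment can be sketched in Java as follows. This is a hypothetical illustration of the proposal, not anything in the committed patch: StoreOpPolicy, Kind, FatalStoreException, and the maxAttempts parameter are invented names. A failed critical write is escalated immediately, while reads and non-critical writes are retried a bounded number of times and then abandoned so the caller can make progress.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical sketch of the critical / non-critical split; all names are invented.
final class StoreOpPolicy {

    enum Kind { READ, CRITICAL_WRITE, NON_CRITICAL_WRITE }

    /** Models a failure that must stop the RM rather than be ignored. */
    static final class FatalStoreException extends RuntimeException {
        FatalStoreException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs a store operation under the policy for its kind.
     * Critical writes: the first failure is fatal.
     * Reads and non-critical writes: retried up to maxAttempts times,
     * then given up on (empty result) so the caller is not blocked.
     */
    static <T> Optional<T> run(Kind kind, int maxAttempts, Callable<T> op) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.ofNullable(op.call());
            } catch (Exception e) {
                if (kind == Kind.CRITICAL_WRITE) {
                    throw new FatalStoreException("Critical write failed", e);
                }
                // Otherwise fall through and try again.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> node = run(Kind.NON_CRITICAL_WRITE, 3, () -> "node info saved");
        System.out.println(node.orElse("gave up"));
    }
}
```

A real implementation would presumably add backoff between attempts and decide whether critical writes deserve a few retries before escalation; the sketch leaves both out to keep the policy visible.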



[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734185#comment-14734185
 ] 

Hudson commented on YARN-2019:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #352 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/352/])
Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: 
rev 6a5068970567817af78d6ec2dfe474b09533a8d2)
* hadoop-yarn-project/CHANGES.txt
YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: 
rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734149#comment-14734149
 ] 

Hudson commented on YARN-2019:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1090 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1090/])
Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: 
rev 6a5068970567817af78d6ec2dfe474b09533a8d2)
* hadoop-yarn-project/CHANGES.txt
YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: 
rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734089#comment-14734089
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #358 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/358/])
Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: 
rev 6a5068970567817af78d6ec2dfe474b09533a8d2)
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734181#comment-14734181
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #340 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/340/])
Modify the CHANGES.txt to move YARN-2019 Commit log from branch 2.8 to (xgong: 
rev 6a5068970567817af78d6ec2dfe474b09533a8d2)
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-09-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734280#comment-14734280
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2280 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2280/])
YARN-4087. Followup fixes after YARN-2019 regarding RM behavior when (xgong: 
rev 9b78e6e33d8c117c1e909df414f20d9db56efe4b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638671#comment-14638671
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #995 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/995/])
YARN-2019. Retrospect on decision of making RM crashed if any exception throw 
in ZKRMStateStore. Contributed by Jian He. (junping_du: rev 
ee98d6354bbbcd0832d3e539ee097f837e5d0e31)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java


> Retrospect on decision of making RM crashed if any exception throw in
> ZKRMStateStore
>
>
> Key: YARN-2019
> URL: https://issues.apache.org/jira/browse/YARN-2019
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junping Du
> Assignee: Jian He
> Priority: Critical
> Labels: ha
> Fix For: 2.8.0
>
> Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch
>
>
> Currently, if anything abnormal happens in ZKRMStateStore, it throws a fatal
> exception that crashes the RM. As shown in YARN-1924, the cause can be an
> internal RM HA bug rather than a genuinely fatal condition. We should
> revisit this decision, since the HA feature is designed to protect a key
> component, not to disrupt it.





[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638779#comment-14638779
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2192 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2192/])
YARN-2019. Retrospect on decision of making RM crashed if any exception throw 
in ZKRMStateStore. Contributed by Jian He. (junping_du: rev 
ee98d6354bbbcd0832d3e539ee097f837e5d0e31)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638693#comment-14638693
 ] 

Hudson commented on YARN-2019:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #265 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/265/])
YARN-2019. Retrospect on decision of making RM crashed if any exception throw 
in ZKRMStateStore. Contributed by Jian He. (junping_du: rev 
ee98d6354bbbcd0832d3e539ee097f837e5d0e31)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638915#comment-14638915
 ] 

Hudson commented on YARN-2019:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #254 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/254/])
YARN-2019. Retrospect on decision of making RM crashed if any exception throw 
in ZKRMStateStore. Contributed by Jian He. (junping_du: rev 
ee98d6354bbbcd0832d3e539ee097f837e5d0e31)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638997#comment-14638997
 ] 

Hudson commented on YARN-2019:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #262 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/262/])
YARN-2019. Retrospect on decision of making RM crashed if any exception throw 
in ZKRMStateStore. Contributed by Jian He. (junping_du: rev 
ee98d6354bbbcd0832d3e539ee097f837e5d0e31)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639021#comment-14639021
 ] 

Hudson commented on YARN-2019:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2211 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2211/])
YARN-2019. Retrospect on decision of making RM crashed if any exception throw 
in ZKRMStateStore. Contributed by Jian He. (junping_du: rev 
ee98d6354bbbcd0832d3e539ee097f837e5d0e31)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637310#comment-14637310
 ] 

Karthik Kambatla commented on YARN-2019:


Sure. A per-daemon config sounds very reasonable. We could have one each for 
resourcemanager, nodemanager, timelineserver, app

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637312#comment-14637312
 ] 

Karthik Kambatla commented on YARN-2019:


And maybe also add a {{yarn.fail-fast}} as a convenience setting that the rest 
would pick up if not explicitly set. 

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637701#comment-14637701
 ] 

Karthik Kambatla commented on YARN-2019:


+1, pending Jenkins.

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637618#comment-14637618
 ] 

Karthik Kambatla commented on YARN-2019:


Comments on the patch:
# Instead of having a separate default for each daemon, can all of them default 
to yarn.fail-fast? The default for yarn.fail-fast could be true? 
# Should we have convenience methods in YarnConfiguration to fetch the 
fail-fast value for individual daemons, e.g. {{shouldRMFailFast()}}?
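
The fallback suggested in these two comments could be sketched roughly as follows. This is only an illustration and not the code in the patch: it uses {{java.util.Properties}} in place of the Hadoop configuration class, the class name {{FailFastSketch}} and the helper {{getBoolean}} are hypothetical, and the default of true is just the value floated above as a question.

```java
import java.util.Properties;

// Hypothetical stand-in for the convenience methods discussed above.
// It is not the actual YarnConfiguration code.
final class FailFastSketch {
  // Key names as mentioned in this thread.
  static final String YARN_FAIL_FAST = "yarn.fail-fast";
  static final String RM_FAIL_FAST = "yarn.resourcemanager.fail-fast";
  static final String NM_FAIL_FAST = "yarn.nodemanager.fail-fast";

  // Assumed default, for illustration only.
  static final boolean DEFAULT_FAIL_FAST = true;

  private FailFastSketch() {}

  // Returns the boolean stored under key, or fallback when the key is unset.
  static boolean getBoolean(Properties conf, String key, boolean fallback) {
    String raw = conf.getProperty(key);
    return raw == null ? fallback : Boolean.parseBoolean(raw.trim());
  }

  // Cluster-wide value that the per-daemon settings fall back to.
  static boolean shouldFailFast(Properties conf) {
    return getBoolean(conf, YARN_FAIL_FAST, DEFAULT_FAIL_FAST);
  }

  // The RM-specific key wins when set; otherwise use yarn.fail-fast.
  static boolean shouldRMFailFast(Properties conf) {
    return getBoolean(conf, RM_FAIL_FAST, shouldFailFast(conf));
  }

  // Same fallback chain for the NM.
  static boolean shouldNMFailFast(Properties conf) {
    return getBoolean(conf, NM_FAIL_FAST, shouldFailFast(conf));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(YARN_FAIL_FAST, "false");
    conf.setProperty(NM_FAIL_FAST, "true");
    System.out.println("RM fail-fast: " + shouldRMFailFast(conf));
    System.out.println("NM fail-fast: " + shouldNMFailFast(conf));
  }
}
```

With this shape, an operator would only need to set the general key, and a per-daemon key would be an override rather than a required setting.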

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch, YARN-2019.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637694#comment-14637694
 ] 

Jian He commented on YARN-2019:
---

Thanks for the review, Karthik!
Addressed both comments.

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637992#comment-14637992
 ] 

Junping Du commented on YARN-2019:
--

+1. Committing it in.

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638019#comment-14638019
 ] 

Hudson commented on YARN-2019:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #8204 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8204/])
YARN-2019. Retrospect on decision of making RM crashed if any exception throw 
in ZKRMStateStore. Contributed by Jian He. (junping_du: rev 
ee98d6354bbbcd0832d3e539ee097f837e5d0e31)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Fix For: 2.8.0

 Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637905#comment-14637905
 ] 

Hadoop QA commented on YARN-2019:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 18m 53s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 49s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 51m 54s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 99m 47s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12746639/YARN-2019.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 06e5dd2 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8623/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8623/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8623/console |


This message was automatically generated.

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637931#comment-14637931
 ] 

Hadoop QA commented on YARN-2019:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 18m 44s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 52s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 57s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 51m 55s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 99m 36s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12746651/YARN-2019.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 06e5dd2 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8624/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8624/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8624/console |


This message was automatically generated.

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch, YARN-2019.patch, YARN-2019.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635301#comment-14635301
 ] 

Karthik Kambatla commented on YARN-2019:


We could have two fail-fast configs - one for daemons and one for apps/containers. 
If we can make do with general fail-fast configs, we should try to avoid adding 
component-specific configs; otherwise, we'll end up making Yarn even harder to 
configure. 

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-21 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635876#comment-14635876
 ] 

Junping Du commented on YARN-2019:
--

If so, I think we should at least differentiate RM and NM policies - a user could 
be conservative about RM state store failures but aggressive about NM state store 
failures. Maybe use yarn.resourcemanager.fail-fast here? Then we can use 
yarn.nodemanager.fail-fast later, and maybe similar keys for other daemons 
(timeline service, etc.).

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634638#comment-14634638
 ] 

Karthik Kambatla commented on YARN-2019:


How about adopting the approach proposed in YARN-3607?

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-21 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634951#comment-14634951
 ] 

Junping Du commented on YARN-2019:
--

+1 on the general idea of YARN-3607. However, users actually have three options 
here when facing a ZKRMStateStore error:
1. aggressive: fail the RM daemon;
2. conservative: only log the errors, without failing the RM daemon or any 
applications;
3. relatively conservative: do not fail the RM, but fail the application in some 
cases (such as when the RM gets restarted).
These choices hint that we may not want to force the handling policy for all 
failures into a single configuration, although I agree we should 
consolidate them as much as possible, as proposed by YARN-3607. 
In this particular case, I would prefer to add a separate configuration (maybe 
something like a boolean value for 
yarn.resourcemanager.state-store.exit-on-error or an enum value for 
yarn.resourcemanager.state-store.policy-on-error?) to let the user choose 
how to handle RM state store failures. The user then keeps other options for 
other failure cases.
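
To make the enum variant of this proposal concrete, here is a rough sketch of how such a policy value could be parsed and mapped to the three options listed above. All names in it ({{StoreErrorPolicySketch}}, {{Policy}}, {{parse}}, {{describe}}) are hypothetical and chosen for illustration; nothing here is taken from the patch or from the YARN code base.

```java
import java.util.Locale;

// Hypothetical sketch; none of these names come from the YARN code base.
final class StoreErrorPolicySketch {

  // One constant per option in the list above.
  enum Policy {
    FAIL_RM,   // option 1: terminate the RM daemon
    LOG_ONLY,  // option 2: log the error, keep the RM and applications running
    FAIL_APP   // option 3: keep the RM running, fail the affected application
  }

  private StoreErrorPolicySketch() {}

  // Parses a config value such as "log-only"; returns the fallback when the
  // value is missing or not recognised.
  static Policy parse(String raw, Policy fallback) {
    if (raw == null || raw.trim().isEmpty()) {
      return fallback;
    }
    String name = raw.trim().toUpperCase(Locale.ROOT).replace('-', '_');
    try {
      return Policy.valueOf(name);
    } catch (IllegalArgumentException e) {
      return fallback;
    }
  }

  // Human-readable summary of what each policy would do.
  static String describe(Policy policy) {
    switch (policy) {
      case FAIL_RM:
        return "terminate the RM daemon";
      case LOG_ONLY:
        return "log the error and keep the RM and applications running";
      case FAIL_APP:
        return "keep the RM running but fail the affected application";
      default:
        throw new IllegalArgumentException("unknown policy: " + policy);
    }
  }

  public static void main(String[] args) {
    Policy policy = parse("log-only", Policy.FAIL_RM);
    System.out.println(policy + ": " + describe(policy));
  }
}
```

An enum leaves room for the third option, which a boolean exit-on-error flag could not express.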

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Jian He
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2015-07-20 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634333#comment-14634333
 ] 

Jian He commented on YARN-2019:
---

I agree that we should provide an option to not crash the RM if an exception 
happens in the state store.  

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2014-06-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037664#comment-14037664
 ] 

Junping Du commented on YARN-2019:
--

[~kasha], sorry that I missed your comments, as my email/company changed during 
that time. My thought on the right behavior is:
If there is any issue on the ZK cluster side (it is distributed and should be 
more robust, but it could still go down due to a bug or bad configuration), we 
can let the active RM continue to run as in the non-HA case. In addition, we 
should report to the admin that HA is not working well, and let the admin decide 
when it is the proper time to bring down the RM and reconfigure HA. Makes sense?

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2014-05-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989487#comment-13989487
 ] 

Junping Du commented on YARN-2019:
--

The bad news could be: the exception could be repeated on the new active RM, 
since ZKRMStateStore is shared. Am I missing anything here?

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Priority: Critical
  Labels: ha



[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2014-05-05 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990037#comment-13990037
 ] 

Tsuyoshi OZAWA commented on YARN-2019:
--

RMStateStore handles the exceptions in ZKRMStateStore like this: 
{code}
try {
  // ZK related operations
  removeRMDTMasterKeyState(delegationKey);
} catch (Exception e) {
  notifyStoreOperationFailed(e);
}
{code}

If it's fenced, RMFatalEventDispatcher handles the exception and the RM goes 
into standby state. However, if STATE_STORE_OP_FAILED occurs, the active RM 
terminates. After failover to the standby RM, the exception could be repeated 
on the new active RM. Maybe this is the case [~djp] mentioned. Please correct me 
if I'm wrong.

{code}
  @Private
  public static class RMFatalEventDispatcher
      implements EventHandler<RMFatalEvent> {
    @Override
    public void handle(RMFatalEvent event) {
      LOG.fatal("Received a " + RMFatalEvent.class.getName() + " of type " +
          event.getType().name() + ". Cause:\n" + event.getCause());

      if (event.getType() == RMFatalEventType.STATE_STORE_FENCED) {
        LOG.info("RMStateStore has been fenced");
        if (rmContext.isHAEnabled()) {
          try {
            // Transition to standby and reinit active services
            LOG.info("Transitioning RM to Standby mode");
            rm.transitionToStandby(true);
            return;
          } catch (Exception e) {
            LOG.fatal("Failed to transition RM to Standby mode.");
          }
        }
      }

      ExitUtil.terminate(1, event.getCause());
    }
  }
{code}



 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Priority: Critical
  Labels: ha



[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2014-05-05 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990050#comment-13990050
 ] 

Tsuyoshi OZAWA commented on YARN-2019:
--

This means that all RMs can terminate when ZK cannot be accessed from the RMs. 
If we should retry until ZK comes up, one solution is handling 
STATE_STORE_OP_FAILED in RMFatalEventDispatcher and going into standby state. 
Please see the attached patch.
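
The idea could be sketched as a small decision function. This is a simplified model and not the attached patch: {{FatalEventSketch}}, {{Action}}, {{decide}} and the {{OTHER}} event type are hypothetical names, and the real dispatcher also has to handle a failed transition to standby, which is left out here.

```java
// Simplified, hypothetical model of the dispatcher decision; not the patch.
final class FatalEventSketch {

  // STATE_STORE_FENCED and STATE_STORE_OP_FAILED are the types named in this
  // thread; OTHER is a placeholder for any non-store fatal event.
  enum EventType { STATE_STORE_FENCED, STATE_STORE_OP_FAILED, OTHER }

  enum Action { TRANSITION_TO_STANDBY, TERMINATE }

  private FatalEventSketch() {}

  // Treats a failed store operation like a fenced store: with HA enabled the
  // RM steps down to standby instead of exiting.
  static Action decide(EventType type, boolean haEnabled) {
    boolean storeEvent = type == EventType.STATE_STORE_FENCED
        || type == EventType.STATE_STORE_OP_FAILED;
    if (storeEvent && haEnabled) {
      return Action.TRANSITION_TO_STANDBY;
    }
    return Action.TERMINATE;
  }

  public static void main(String[] args) {
    for (EventType type : EventType.values()) {
      System.out.println(type + " with HA: " + decide(type, true)
          + ", without HA: " + decide(type, false));
    }
  }
}
```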

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Priority: Critical
  Labels: ha
 Attachments: YARN-2019.1-wip.patch




[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2014-05-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989059#comment-13989059
 ] 

Bikas Saha commented on YARN-2019:
--

That was the initial code, since there were no HA states then. With HA states, 
the RM should move into standby state upon a store error.

 Retrospect on decision of making RM crashed if any exception throw in 
 ZKRMStateStore
 

 Key: YARN-2019
 URL: https://issues.apache.org/jira/browse/YARN-2019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Priority: Critical
  Labels: ha
