[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969519#comment-14969519
 ] 

Xuan Gong commented on YARN-4243:
-

Thanks for the comments, [~kasha]

bq. To be consistent with the other config, can we use zk-retries instead of 
zk.op.retries

Modified

bq. should we make the change to ActiveStandbyElector as a Common JIRA or at 
least create a Common JIRA and close it as part of this one, so the common and 
HDFS devs are aware of this change? They might want to update the way HDFS 
handles the retries situation as well.

Created https://issues.apache.org/jira/browse/HADOOP-12503 and linked the ticket.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM shuts down if the ZK connection is down when the RM does 
> its initialization. We need to add retries to this part.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969804#comment-14969804
 ] 

Xuan Gong commented on YARN-4243:
-

The test case failures and Findbugs warnings are not related to this patch.



[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969576#comment-14969576
 ] 

Junping Du commented on YARN-4243:
--

v5 patch LGTM. +1 pending on Jenkins result.



[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969753#comment-14969753
 ] 

Hadoop QA commented on YARN-4243:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 22m 16s | Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 8m 12s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 42s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 44s | The applied patch generated 1 new checkstyle issues (total was 15, now 16). |
| {color:red}-1{color} | checkstyle | 3m 17s | The applied patch generated 2 new checkstyle issues (total was 211, now 212). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 6m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 7m 33s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 2m 0s | Tests failed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests | 68m 50s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 133m 22s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.logaggregation.TestAggregatedLogsBlock |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768079/YARN-4243.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4c0bae2 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-common.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/diffcheckstylehadoop-common.txt https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9526/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9526/console |


This message was automatically generated.



[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969841#comment-14969841
 ] 

Junping Du commented on YARN-4243:
--

Committing it now.



[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969982#comment-14969982
 ] 

Hudson commented on YARN-4243:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8692 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8692/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java


> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
>
> Right now, the RM shuts down if the ZK connection is down when the RM does 
> its initialization. We need to add retries to this part.





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970179#comment-14970179
 ] 

Hudson commented on YARN-4243:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2517 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2517/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java




[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970269#comment-14970269
 ] 

Hudson commented on YARN-4243:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #528 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/528/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java




[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970148#comment-14970148
 ] 

Hudson commented on YARN-4243:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1307 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1307/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970121#comment-14970121
 ] 

Hudson commented on YARN-4243:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #571 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/571/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml




[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970197#comment-14970197
 ] 

Hudson commented on YARN-4243:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #586 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/586/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java




[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-21 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967324#comment-14967324
 ] 

Junping Du commented on YARN-4243:
--

+1. I will commit the latest patch soon if there are no further comments from the community.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch
>
>
> Right now, the RM shuts down if the ZK connection is down when the RM does 
> its initialization. We need to add retries to this part.





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968343#comment-14968343
 ] 

Karthik Kambatla commented on YARN-4243:


Thanks for the update, Xuan. Sorry for the delay in getting to this. 

Just one nit: To be consistent with the other config, can we use 
zk-retries instead of zk.op.retries? I am +1 otherwise.

One other thing to consider - should we make the change to ActiveStandbyElector 
as a Common JIRA or at least create a Common JIRA and close it as part of this 
one, so the common and HDFS devs are aware of this change? They might want to 
update the way HDFS handles the retries situation as well. 



[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-20 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964618#comment-14964618
 ] 

Xuan Gong commented on YARN-4243:
-

[~kasha] Any comments on the latest patch?



[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-16 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961006#comment-14961006
 ] 

Xuan Gong commented on YARN-4243:
-

[~kasha] Do we have any other comments?



[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-16 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961124#comment-14961124
 ] 

Rohith Sharma K S commented on YARN-4243:
-

I am +1 for the latest patch. The patch also keeps the old behavior and gives 
users the option to fail fast.



[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958972#comment-14958972
 ] 

Karthik Kambatla commented on YARN-4243:


Looking. 

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch
>
>
> Right now, the RM shuts down if the ZK connection is down when the RM does 
> its initialization. We need to add retries to this part.





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958994#comment-14958994
 ] 

Karthik Kambatla commented on YARN-4243:


Looks like we are addressing two issues here:
# Have createConnection() retry connecting to ZK. 
## I am with Rohith on this one - I think changing ActiveStandbyElector 
constructor either to use reestablishConnection or otherwise seems like the 
right approach. Do we know why the HDFS devs don't want connections to be 
retried on init, but are fine with it on reestablishConnection?
# Add a config to be able to set a different number of retries for Yarn. 
## Sounds reasonable. Code comments - can we do the following instead:
{code}
// Prefer the YARN-specific setting; fall back to the common key, then to the common default.
int maxRetryNum =
    conf.getInt(YarnConfiguration.RM_HA_FC_ELECTOR_ZK_OP_RETRIES_KEY,
        conf.getInt(CommonConfigurationKeys.HA_FC_ELECTOR_ZK_OP_RETRIES_KEY,
            CommonConfigurationKeys.HA_FC_ELECTOR_ZK_OP_RETRIES_DEFAULT));
{code}
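
For readers following along, the nested lookup suggested above can be sketched in isolation. This is only an illustration under stated assumptions: it uses java.util.Properties in place of the Hadoop Configuration class, and the class name, key names, and default value below are hypothetical, not the real Hadoop constants.

```java
import java.util.Properties;

/** Hypothetical stand-in for the nested conf.getInt(...) fallback discussed above. */
final class RetryLookup {
    // Illustrative key names and default only; these are NOT the real Hadoop keys.
    static final String YARN_KEY = "example.yarn.elector.zk-retries";
    static final String COMMON_KEY = "example.common.elector.zk.op.retries";
    static final int COMMON_DEFAULT = 3;

    /** Returns the integer stored under key, or fallback when the key is absent. */
    static int getInt(Properties conf, String key, int fallback) {
        String raw = conf.getProperty(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    /** YARN-specific value first, then the common value, then the common default. */
    static int resolveRetries(Properties conf) {
        return getInt(conf, YARN_KEY, getInt(conf, COMMON_KEY, COMMON_DEFAULT));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolveRetries(conf)); // prints 3 (nothing set)
        conf.setProperty(COMMON_KEY, "5");
        System.out.println(resolveRetries(conf)); // prints 5 (common key only)
        conf.setProperty(YARN_KEY, "10");
        System.out.println(resolveRetries(conf)); // prints 10 (YARN key wins)
    }
}
```

The inner lookup is evaluated eagerly and only serves as the default for the outer one, so an explicitly set YARN value always takes precedence.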




[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958998#comment-14958998
 ] 

Karthik Kambatla commented on YARN-4243:


Oh, and Rohith's comment about this patch leading to retrying {{(numRetries * 
numRetries)}} times on {{reestablishConnection}} is a concern too. 
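
To make the multiplication concern concrete, here is a small model of one retry loop wrapped around another. It is a sketch only: the method names mirror the ones mentioned in this thread, but the bodies are hypothetical and simply count simulated failed attempts; they are not the ActiveStandbyElector code.

```java
/** Toy model: an outer retry loop around an inner retry loop, with every attempt failing. */
final class NestedRetryDemo {
    /** Inner loop: up to numRetries connection attempts, all failing in this model. */
    static int createConnection(int numRetries) {
        int attempts = 0;
        for (int i = 0; i < numRetries; i++) {
            attempts++; // one simulated, failed connect call
        }
        return attempts;
    }

    /** Outer loop: invokes the inner routine numRetries times and sums the attempts. */
    static int reestablishConnection(int numRetries) {
        int total = 0;
        for (int i = 0; i < numRetries; i++) {
            total += createConnection(numRetries);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(reestablishConnection(3)); // prints 9
    }
}
```

With a retry count of 3, the model makes 9 attempts, which is why adding a retry loop inside a routine that is already retried by its caller needs care.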



[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959585#comment-14959585
 ] 

Hadoop QA commented on YARN-4243:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 22m 1s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 35s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 35s | The applied patch generated 1 new checkstyle issues (total was 15, now 16). |
| {color:red}-1{color} | checkstyle | 3m 9s | The applied patch generated 2 new checkstyle issues (total was 211, now 212). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 6m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 6m 52s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 57m 5s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 119m 52s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12766837/YARN-4243.4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d2d3eb |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/diffcheckstylehadoop-common.txt https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9460/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9460/console |


This message was automatically generated.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959335#comment-14959335
 ] 

Xuan Gong commented on YARN-4243:
-

Thanks for the suggestion, [~kasha].
Uploaded a new patch that defines a new constructor in ActiveStandbyElector, 
which adds a new parameter: failfast.
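
To make the idea concrete, here is a minimal Java sketch of a constructor overload that takes a fail-fast flag. It is not the code from the patch: ElectorSketch, Connector, the maxRetryNum constructor argument and the attempts() accessor are hypothetical names used only for illustration, and the camel-case spelling failFast is an assumption. The existing constructor keeps the old behavior (one attempt, fail immediately), so callers that do not want retries on initialization are unaffected.

```java
import java.io.IOException;

/** Hypothetical stand-in for an elector; NOT the real ActiveStandbyElector. */
class ElectorSketch {
  /** Hypothetical hook that opens the ZooKeeper session. */
  interface Connector {
    void connect() throws IOException;
  }

  private final Connector connector;
  private final int maxRetryNum;
  private int attempts = 0;

  /** Existing-style constructor: keeps fail-fast behavior for current callers. */
  ElectorSketch(Connector connector, int maxRetryNum) throws IOException {
    this(connector, maxRetryNum, true);
  }

  /** New overload: failFast=false retries the initial connection. */
  ElectorSketch(Connector connector, int maxRetryNum, boolean failFast)
      throws IOException {
    this.connector = connector;
    this.maxRetryNum = maxRetryNum;
    if (failFast) {
      attempts++;
      connector.connect(); // a single attempt; any failure propagates
    } else {
      connectWithRetries();
    }
  }

  /** Tries up to maxRetryNum times, then rethrows with the last failure as cause. */
  private void connectWithRetries() throws IOException {
    IOException last = null;
    for (int i = 0; i < maxRetryNum; i++) {
      attempts++;
      try {
        connector.connect();
        return;
      } catch (IOException e) {
        last = e;
      }
    }
    throw new IOException(
        "Could not connect after " + maxRetryNum + " attempts", last);
  }

  /** Number of connection attempts made so far. */
  int attempts() {
    return attempts;
  }
}
```

In this sketch the two-argument constructor simply delegates with failFast=true, so only callers that opt in see the retry behavior.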

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957530#comment-14957530
 ] 

Hadoop QA commented on YARN-4243:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m 53s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 20s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m  8s | The applied patch generated  2 new checkstyle issues (total was 211, now 212). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |   8m 24s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  3s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  57m 23s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 121m 33s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12766368/YARN-4243.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0d77e85 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9445/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9445/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9445/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9445/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9445/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9445/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9445/console |


This message was automatically generated.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-14 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957903#comment-14957903
 ] 

Xuan Gong commented on YARN-4243:
-

[~kasha] Could you review the latest patch, please?

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, 
> YARN-4243.2.patch, YARN-4243.3.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953332#comment-14953332
 ] 

Xuan Gong commented on YARN-4243:
-

[~rohithsharma] Thanks for the review.
bq. And method reEstablishSession() can be reused rather than duplicating the 
same logic over embedded electors. Instead of overriding createConnection() 
method, reEstablishSession() method can be used in ActiveStandbyElector 
constructor. I'd prefer to make change in hadoop-common rather than in embedded 
elector service.

This would affect the HDFS ZKFC, which does not want the retry on 
initialization.

bq. While initializing Elector service createConnection will retry as per 
configured value i.e maxRetryNum say 10. But if session is closed and 
reestablished then number of retry count will be maxRetryNum * maxRetryNum i.e 
10*10=100 times.

I am not sure I understand this correctly. If we set maxRetryNum to 10, and 
the ZK connect itself also does some retries (10 times), then the total is 
10*10.
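
The 10*10 figure is simply what happens when one retry loop wraps another and every attempt fails. The Java sketch below only illustrates that arithmetic; NestedRetryCount and totalAttempts are hypothetical names, and it makes no claim about which layers in the patch actually retry.

```java
/** Illustration only: counts attempts when one retry loop wraps another. */
class NestedRetryCount {
  /**
   * Worst case, where every attempt fails: the outer layer retries
   * outerRetries times, and each of those runs innerRetries attempts.
   */
  static int totalAttempts(int outerRetries, int innerRetries) {
    int attempts = 0;
    for (int outer = 0; outer < outerRetries; outer++) {
      for (int inner = 0; inner < innerRetries; inner++) {
        attempts++; // one failed connection attempt
      }
    }
    return attempts;
  }
}
```

With both limits set to 10, totalAttempts(10, 10) gives 100, which is the number quoted above.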

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953346#comment-14953346
 ] 

Xuan Gong commented on YARN-4243:
-

Thanks for the review, [~djp].
I will create a new patch to address your comments.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954010#comment-14954010
 ] 

Hadoop QA commented on YARN-4243:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m  9s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   8m 16s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 33s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit |   0m 20s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m 13s | The applied patch generated  2 new checkstyle issues (total was 211, now 212). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m 38s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |   7m 25s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  2s | Tests passed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  62m 25s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 126m 21s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12766161/YARN-4243.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 9849c8b |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9412/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9412/console |


This message was automatically generated.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954300#comment-14954300
 ] 

Hadoop QA commented on YARN-4243:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  24m 46s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   9m 11s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  11m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m 33s | The applied patch generated  4 new checkstyle issues (total was 211, now 214). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 46s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |   8m 12s | Tests failed in hadoop-common. |
| {color:green}+1{color} | yarn tests |   0m 28s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m 12s | Tests passed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   0m 21s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | |  71m 26s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.metrics2.impl.TestMetricsSystemImpl |
| Failed build | hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12766207/YARN-4243.2.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c60a16f |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9415/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9415/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9415/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9415/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9415/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9415/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9415/console |


This message was automatically generated.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, YARN-4243.2.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-12 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954396#comment-14954396
 ] 

Rohith Sharma K S commented on YARN-4243:
-

bq. This will affect the HDFS ZKFC, and they do not want the retry on 
initialization.
Given that ZKFC does not require the ability to retry on initialization by 
default, that is fine.

bq. If we set the maxRetryNum as 10, and *zk connect itself would do some 
retries (10times)*, the total is 10*10.
Sorry, I did not get it. Could you explain a bit more?

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, YARN-4243.2.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-09 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950630#comment-14950630
 ] 

Rohith Sharma K S commented on YARN-4243:
-

Thanks [~xgong] for working on this.
Some comments and suggestions:
# While initializing the Elector service, createConnection will retry as per 
the configured value, i.e. *maxRetryNum*, say 10. But if the session is closed 
and re-established, then the retry count will be *maxRetryNum* * *maxRetryNum*, 
i.e. 10*10=100 times.
# The method {{reEstablishSession()}} can be reused rather than duplicating the 
same logic in the embedded electors. Instead of overriding the 
createConnection() method, reEstablishSession() can be used in the 
ActiveStandbyElector constructor. I'd prefer to make the change in 
hadoop-common rather than in the embedded elector service.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950642#comment-14950642
 ] 

Junping Du commented on YARN-4243:
--

Thanks for reporting the issue and delivering the patch, [~xgong]!
The patch makes sense overall. Some minor comments:
1. I think we are adding a new configuration here, and we may want to add it to 
yarn-default.xml as well. It is only for documentation purposes, so we don't 
have to specify a default value.
2. Do we need to add another configuration for the sleep interval between 
retries? Hard-coding it to 5 seconds seems inflexible.
3. If the connection still fails after the maximum number of retries, shall we 
put the retry count in the error message as well? For example: "Can not 
establish Zookeeper Connection... after retry x times".
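
Points 2 and 3 can be sketched together in Java: the sleep interval becomes a parameter instead of a hard-coded 5 seconds, and the final error message reports how many attempts were made. This is only an illustration under assumptions. ConfigurableRetry, callWithRetries, maxRetries and sleepMs are hypothetical names, the message wording is made up, and how the values would be read from the YARN configuration is left out.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical retry helper; not taken from the patch. */
class ConfigurableRetry {
  /**
   * Runs the action up to maxRetries times, sleeping sleepMs between
   * failed attempts. The final error reports the attempt count.
   */
  static <T> T callWithRetries(Callable<T> action, int maxRetries, long sleepMs)
      throws IOException {
    Exception last = null;
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        last = e; // remember the failure and fall through to the sleep
      }
      if (attempt < maxRetries) {
        try {
          Thread.sleep(sleepMs); // caller-supplied interval, not a fixed 5 s
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while waiting to retry", ie);
        }
      }
    }
    throw new IOException(
        "Cannot establish ZooKeeper connection after " + maxRetries
            + " attempts", last);
  }
}
```

Passing the interval in as an argument also lets a test use 0 ms instead of waiting for real time to pass.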

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950960#comment-14950960
 ] 

Karthik Kambatla commented on YARN-4243:


I would like to review the patch before commit. 

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950990#comment-14950990
 ] 

Junping Du commented on YARN-4243:
--

No worries. Nobody wants to commit it right now, as we have all left concrete 
review/improvement comments.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-08 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949760#comment-14949760
 ] 

Xuan Gong commented on YARN-4243:
-

Overrode createConnection() in EmbeddedElectorService to add retries, and 
created a separate YARN configuration for maxAttempts, because the code 
(ActiveStandbyElector) and its related configuration are shared with HDFS ZKFC.
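
As a rough picture of this approach, the Java sketch below has a subclass override createConnection() and loop over the parent's single-attempt version. None of it is the actual patch: SharedElectorSketch and RetryingElectorSketch are hypothetical stand-ins for the shared elector and the YARN-side service, the simulated failure counter exists only so the example can run, and maxAttempts is passed in directly rather than read from a configuration.

```java
import java.io.IOException;

/** Hypothetical stand-in for the shared elector: one attempt, no retry. */
class SharedElectorSketch {
  private int remainingFailures; // simulated outages, for the example only
  int attempts = 0;

  SharedElectorSketch(int failuresBeforeSuccess) {
    this.remainingFailures = failuresBeforeSuccess;
  }

  /** Single attempt: throws while the simulated ZooKeeper is unreachable. */
  protected void createConnection() throws IOException {
    attempts++;
    if (remainingFailures > 0) {
      remainingFailures--;
      throw new IOException("ZooKeeper unavailable (simulated)");
    }
  }
}

/** Hypothetical YARN-side subclass that adds retries around the parent call. */
class RetryingElectorSketch extends SharedElectorSketch {
  private final int maxAttempts;

  RetryingElectorSketch(int failuresBeforeSuccess, int maxAttempts) {
    super(failuresBeforeSuccess);
    this.maxAttempts = maxAttempts;
  }

  @Override
  protected void createConnection() throws IOException {
    IOException last = null;
    for (int i = 0; i < maxAttempts; i++) {
      try {
        super.createConnection(); // reuse the shared single-attempt logic
        return;
      } catch (IOException e) {
        last = e;
      }
    }
    throw new IOException("Giving up after " + maxAttempts + " attempts", last);
  }
}
```

Because the retry lives only in the subclass, code that uses the parent class directly keeps its single-attempt behavior.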

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part





[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit

2015-10-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949921#comment-14949921
 ] 

Hadoop QA commented on YARN-4243:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  22m 44s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   8m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  11m 53s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit |   0m 22s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m  1s | The applied patch generated  2 new checkstyle issues (total was 211, now 212). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 52s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  19m 18s | Tests failed in hadoop-common. |
| {color:red}-1{color} | yarn tests |   0m 24s | Tests failed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |  62m 59s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 138m  8s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
| Timed out tests | org.apache.hadoop.http.TestHttpServerLifecycle |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12765723/YARN-4243.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e1bf8b3 |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9386/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9386/console |


This message was automatically generated.

> Add retry on establishing Zookeeper conenction in 
> EmbeddedElectorService#serviceInit
> 
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch
>
>
> Right now, the RM would shut down if the zk connection is down when the RM do 
> the initialization. We need to add retry on this part


