[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969519#comment-14969519 ]

Xuan Gong commented on YARN-4243:

Thanks for the comments, [~kasha]

bq. To be consistent with the other config, can we use zk-retries instead of zk.op.retries
Modified.

bq. should we make the change to ActiveStandbyElector as a Common JIRA or at least create a Common JIRA and close it as part of this one, so the common and HDFS devs are aware of this change? They might want to update the way HDFS handles the retries situation as well.
Created https://issues.apache.org/jira/browse/HADOOP-12503 and linked the ticket.

> Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
> Right now, the RM would shut down if the zk connection is down when the RM does the initialization. We need to add retry on this part.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969804#comment-14969804 ]

Xuan Gong commented on YARN-4243:

The test case failures and Findbugs warnings are not related to this patch.
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969576#comment-14969576 ]

Junping Du commented on YARN-4243:

v5 patch LGTM. +1 pending on Jenkins result.
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969753#comment-14969753 ]

Hadoop QA commented on YARN-4243:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 22m 16s | Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 8m 12s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 42s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 44s | The applied patch generated 1 new checkstyle issues (total was 15, now 16). |
| {color:red}-1{color} | checkstyle | 3m 17s | The applied patch generated 2 new checkstyle issues (total was 211, now 212). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 43s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 6m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 7m 33s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 2m 0s | Tests failed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests | 68m 50s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 133m 22s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.logaggregation.TestAggregatedLogsBlock |
| | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768079/YARN-4243.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4c0bae2 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-common.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/diffcheckstylehadoop-common.txt https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9526/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9526/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9526/console |

This message was automatically generated.
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969841#comment-14969841 ]

Junping Du commented on YARN-4243:

Committing it now.
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969982#comment-14969982 ]

Hudson commented on YARN-4243:

FAILURE: Integrated in Hadoop-trunk-Commit #8692 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8692/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java

> Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Fix For: 2.8.0
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch, YARN-4243.5.patch
>
> Right now, the RM would shut down if the zk connection is down when the RM does the initialization. We need to add retry on this part.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970179#comment-14970179 ]

Hudson commented on YARN-4243:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2517 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2517/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970269#comment-14970269 ]

Hudson commented on YARN-4243:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #528 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/528/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970148#comment-14970148 ]

Hudson commented on YARN-4243:

SUCCESS: Integrated in Hadoop-Yarn-trunk #1307 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1307/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970121#comment-14970121 ]

Hudson commented on YARN-4243:

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #571 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/571/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970197#comment-14970197 ]

Hudson commented on YARN-4243:

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #586 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/586/])
YARN-4243. Add retry on establishing Zookeeper conenction in (junping_du: rev 0fce5f9a496925f0d53ea6c14318c9b513de9882)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967324#comment-14967324 ]

Junping Du commented on YARN-4243:

+1. I will commit the latest patch soon if there are no further comments from the community.

> Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch
>
> Right now, the RM would shut down if the zk connection is down when the RM does the initialization. We need to add retry on this part.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968343#comment-14968343 ]

Karthik Kambatla commented on YARN-4243:

Thanks for the update, Xuan. Sorry for the delay in getting to this.

Just one nit: To be consistent with the other config, can we use zk-retries instead of zk.op.retries? I am +1 otherwise.

One other thing to consider - should we make the change to ActiveStandbyElector as a Common JIRA, or at least create a Common JIRA and close it as part of this one, so the Common and HDFS devs are aware of this change? They might want to update the way HDFS handles the retries situation as well.
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964618#comment-14964618 ]

Xuan Gong commented on YARN-4243:

[~kasha] Any comments on the latest patch?
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961006#comment-14961006 ]

Xuan Gong commented on YARN-4243:

[~kasha] Do we have any other comments?
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961124#comment-14961124 ]

Rohith Sharma K S commented on YARN-4243:

I am +1 for the latest patch. The patch also keeps the old behavior and gives users the option to fail fast.
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958972#comment-14958972 ]

Karthik Kambatla commented on YARN-4243:

Looking.

> Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
>
> Key: YARN-4243
> URL: https://issues.apache.org/jira/browse/YARN-4243
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, YARN-4243.2.patch, YARN-4243.3.patch
>
> Right now, the RM would shut down if the zk connection is down when the RM does the initialization. We need to add retry on this part.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958994#comment-14958994 ]

Karthik Kambatla commented on YARN-4243:

Looks like we are addressing two issues here:
# Have createConnection() retry connecting to ZK.
## I am with Rohith on this one - I think changing the ActiveStandbyElector constructor, either to use reestablishConnection or otherwise, seems like the right approach. Do we know why the HDFS devs don't want connections to be retried on init, but are fine with it on reestablishConnection?
# Add a config to be able to set a different number of retries for Yarn.
## Sounds reasonable.

Code comments - can we do the following instead:
{code}
int maxRetryNum = conf.getInt(
    YarnConfiguration.RM_HA_FC_ELECTOR_ZK_OP_RETRIES_KEY,
    conf.getInt(
        CommonConfigurationKeys.HA_FC_ELECTOR_ZK_OP_RETRIES_KEY,
        CommonConfigurationKeys.HA_FC_ELECTOR_ZK_OP_RETRIES_DEFAULT));
{code}
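
The nested lookup suggested in the comment above resolves the retry count in three steps: a YARN-specific value if set, otherwise the common value, otherwise the common default. A minimal sketch of that fallback order follows. It does not use Hadoop's classes: `SimpleConf`, the key strings, and the default of 3 are placeholders invented for illustration, not the constants from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop-style Configuration; not the real class.
final class SimpleConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    // Returns the integer stored under key, or defaultValue when the key is unset.
    int getInt(String key, int defaultValue) {
        String raw = values.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}

final class RetryConfigSketch {
    // Placeholder key names and default, chosen for illustration only.
    static final String YARN_RETRIES_KEY = "example.yarn.elector.zk-retries";
    static final String COMMON_RETRIES_KEY = "example.common.elector.zk.op.retries";
    static final int COMMON_RETRIES_DEFAULT = 3;

    // The YARN-specific value wins; otherwise the common value; otherwise the default.
    static int resolveMaxRetries(SimpleConf conf) {
        return conf.getInt(YARN_RETRIES_KEY,
            conf.getInt(COMMON_RETRIES_KEY, COMMON_RETRIES_DEFAULT));
    }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        System.out.println(resolveMaxRetries(conf)); // nothing set: prints 3

        conf.set(COMMON_RETRIES_KEY, "5");
        System.out.println(resolveMaxRetries(conf)); // common only: prints 5

        conf.set(YARN_RETRIES_KEY, "10");
        System.out.println(resolveMaxRetries(conf)); // YARN override: prints 10
    }
}
```

The inner `getInt` is evaluated first and supplies the default for the outer one, so a cluster that only sets the common key keeps its existing behavior.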
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958998#comment-14958998 ]

Karthik Kambatla commented on YARN-4243:

Oh, and Rohith's comment about this patch leading to retrying {{(numRetries * numRetries)}} times on {{reestablishConnection}} is a concern too.
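
To see where the {{(numRetries * numRetries)}} figure comes from, consider a retry loop that calls a method which itself retries. The sketch below only simulates this with a counter. The method names echo the ones in the thread, but the bodies are invented for illustration and are not the code from ActiveStandbyElector or from any of the patches.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class NestedRetrySketch {
    // Inner loop: tries to connect up to maxRetries times, counting each attempt.
    // Every attempt fails here, to simulate ZooKeeper being unreachable.
    static boolean createConnectionWithRetry(int maxRetries, AtomicInteger attempts) {
        for (int i = 0; i < maxRetries; i++) {
            attempts.incrementAndGet();
        }
        return false;
    }

    // Outer loop: also retries up to maxRetries times, invoking the inner loop each time.
    static boolean reestablishWithRetry(int maxRetries, AtomicInteger attempts) {
        for (int i = 0; i < maxRetries; i++) {
            if (createConnectionWithRetry(maxRetries, attempts)) {
                return true;
            }
        }
        return false;
    }

    // Total connection attempts made when ZooKeeper never comes back.
    static int countAttempts(int maxRetries) {
        AtomicInteger attempts = new AtomicInteger();
        reestablishWithRetry(maxRetries, attempts);
        return attempts.get();
    }

    public static void main(String[] args) {
        System.out.println(countAttempts(3)); // prints 9, i.e. 3 * 3
    }
}
```

With a limit of 3 the caller waits through 9 attempts instead of 3, which is why retrying inside the connection method and again in its caller needs care.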
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959585#comment-14959585 ]

Hadoop QA commented on YARN-4243:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 22m 1s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 35s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 35s | The applied patch generated 1 new checkstyle issues (total was 15, now 16). |
| {color:red}-1{color} | checkstyle | 3m 9s | The applied patch generated 2 new checkstyle issues (total was 211, now 212). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 6m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 6m 52s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 57m 5s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 119m 52s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12766837/YARN-4243.4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 8d2d3eb |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/diffcheckstylehadoop-common.txt https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9460/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9460/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9460/console |

This message was automatically generated.
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959335#comment-14959335 ] Xuan Gong commented on YARN-4243: - Thanks for the suggestion, [~kasha]. Uploaded a new patch that defines a new constructor in ActiveStandbyElector with an additional parameter: failfast. > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, > YARN-4243.2.patch, YARN-4243.3.patch, YARN-4243.4.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
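The comment above describes adding a constructor with a failfast parameter. A minimal sketch of that idea follows. It is not the actual patch: ElectorSketch, Connector, and attemptsMade are hypothetical names, and the choice that the old constructor delegates with failFast = true (so existing callers keep their behavior) is an assumption.

```java
import java.io.IOException;

/** Hypothetical stand-in for an elector; this is not the real ActiveStandbyElector. */
class ElectorSketch {
  /** Hypothetical connection source so the sketch runs without ZooKeeper. */
  interface Connector {
    void connect() throws IOException;
  }

  private final Connector connector;
  private final int maxRetryNum;
  int attemptsMade = 0;

  /** Old-style constructor: assumed to delegate with failFast = true. */
  ElectorSketch(Connector connector, int maxRetryNum) throws IOException {
    this(connector, maxRetryNum, true);
  }

  /** New constructor: failFast = false retries the initial connection. */
  ElectorSketch(Connector connector, int maxRetryNum, boolean failFast)
      throws IOException {
    if (maxRetryNum < 0) {
      throw new IllegalArgumentException("maxRetryNum must be >= 0");
    }
    this.connector = connector;
    this.maxRetryNum = maxRetryNum;
    if (failFast) {
      attempt(); // one attempt only; any IOException propagates to the caller
    } else {
      connectWithRetry();
    }
  }

  private void attempt() throws IOException {
    attemptsMade++;
    connector.connect();
  }

  /** One initial attempt plus up to maxRetryNum retries. */
  private void connectWithRetry() throws IOException {
    IOException last = null;
    for (int i = 0; i <= maxRetryNum; i++) {
      try {
        attempt();
        return;
      } catch (IOException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last; // non-null here: the loop ran at least once and never returned
  }
}
```

With this shape, a caller that wants fail-fast behavior keeps using the two-argument constructor, while a caller that wants retries on initialization passes false explicitly.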
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957530#comment-14957530 ] Hadoop QA commented on YARN-4243: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 21m 53s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 20s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 8s | The applied patch generated 2 new checkstyle issues (total was 211, now 212). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 8m 24s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 3s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 57m 23s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 121m 33s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12766368/YARN-4243.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0d77e85 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9445/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9445/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9445/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9445/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9445/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9445/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9445/console | This message was automatically generated. > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, > YARN-4243.2.patch, YARN-4243.3.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957903#comment-14957903 ] Xuan Gong commented on YARN-4243: - [~kasha] Could you review the latest patch, please? > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, > YARN-4243.2.patch, YARN-4243.3.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953332#comment-14953332 ] Xuan Gong commented on YARN-4243: - [~rohithsharma] Thanks for the review. bq. And method reEstablishSession() can be reused rather duplicating same logic over embedded electors. Instead of overriding createConnection() method, reEstablishSession() method can be used in ActiveStandbyElector constructor. I'd prefer to make change in hadoop-common rather in embedded elector service. That would affect the HDFS ZKFC, which does not want retries on initialization. bq. While initializing Elector service createConnection will retry as per configured value i.e maxRetryNum say 10. But if session is closed and reestablished then number of retry count will be maxRetryNum * maxRetryNum i.e 10*10=100 times. I am not sure I understand this correctly. If we set maxRetryNum to 10 and the ZooKeeper connect itself also retries (10 times), the total is 10*10. > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953346#comment-14953346 ] Xuan Gong commented on YARN-4243: - Thanks for the review, [~djp]. I will create a new patch to address your comments. > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954010#comment-14954010 ] Hadoop QA commented on YARN-4243: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 22m 9s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 16s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 33s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 20s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 13s | The applied patch generated 2 new checkstyle issues (total was 211, now 212). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 38s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 7m 25s | Tests passed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 2s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 62m 25s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 126m 21s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12766161/YARN-4243.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 9849c8b | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9412/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9412/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9412/console | This message was automatically generated. 
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954300#comment-14954300 ] Hadoop QA commented on YARN-4243: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 24m 46s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 9m 11s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 33s | The applied patch generated 4 new checkstyle issues (total was 211, now 214). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 46s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 7m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 8m 12s | Tests failed in hadoop-common. | | {color:green}+1{color} | yarn tests | 0m 28s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 12s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 0m 21s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 71m 26s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.metrics2.impl.TestMetricsSystemImpl | | Failed build | hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12766207/YARN-4243.2.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / c60a16f | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9415/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9415/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9415/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9415/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9415/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9415/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9415/console | This message was automatically generated. > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, YARN-4243.2.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. 
We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954396#comment-14954396 ] Rohith Sharma K S commented on YARN-4243: - bq. This will affect the HDFS ZKFC, and they do not want the retry on initialization. Given that ZKFC does not need retry on initialization by default, that is fine. bq. If we set the maxRetryNum as 10, and *zk connect itself would do some retries (10 times)*, the total is 10*10. Sorry, I did not get it. Could you explain a bit more? > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch, YARN-4243.2.1.patch, YARN-4243.2.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950630#comment-14950630 ] Rohith Sharma K S commented on YARN-4243: - Thanks [~xgong] for working on this. Some comments and suggestions: # While initializing the elector service, createConnection will retry up to the configured value, i.e. *maxRetryNum*, say 10. But if the session is closed and re-established, the retry count becomes *maxRetryNum* * *maxRetryNum*, i.e. 10*10=100 times. # The method {{reEstablishSession()}} can be reused rather than duplicating the same logic in the embedded elector. Instead of overriding the createConnection() method, reEstablishSession() can be used in the ActiveStandbyElector constructor. I'd prefer to make the change in hadoop-common rather than in the embedded elector service. > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
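The multiplication concern in the first point above can be illustrated with a small counting sketch. It assumes, purely for illustration, that an outer retry loop wraps an inner one, that both are bounded by the same value, and that every attempt fails. NestedRetryCount and worstCaseAttempts are hypothetical names, not code from the patch.

```java
final class NestedRetryCount {
  /**
   * Counts raw connection attempts when an outer retry loop wraps an
   * inner retry loop and every single attempt fails.
   */
  static int worstCaseAttempts(int outerRetries, int innerRetries) {
    int attempts = 0;
    for (int outer = 0; outer < outerRetries; outer++) {
      for (int inner = 0; inner < innerRetries; inner++) {
        attempts++; // each inner iteration is one failed connection attempt
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    // Both loops bounded by the same hypothetical maxRetryNum of 10.
    System.out.println(worstCaseAttempts(10, 10)); // prints 100
  }
}
```

Whether the real code paths nest in this way is exactly what the thread goes on to discuss; the sketch only shows why nesting two loops with the same bound multiplies the worst case.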
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950642#comment-14950642 ] Junping Du commented on YARN-4243: -- Thanks for reporting the issue and delivering the patch, [~xgong]! The patch makes sense overall. Some minor comments: 1. I think we are adding a new configuration here, and we may want to add it to yarn-default.xml as well. It is only for documentation purposes, so we don't have to specify a default value. 2. Do we need another configuration for the sleep interval between retries? Hard-coding 5 seconds seems inflexible. 3. If the connection still fails after the maximum number of retries, shall we put the retry count in the error message as well? For example: "Can not establish Zookeeper Connection... after retry x times". > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
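Points 2 and 3 above (a configurable sleep interval and a retry count in the error message) could look roughly like the sketch below. The property names, the default of 3 retries, and the RetrySettingsSketch class are all invented for illustration; only the 5-second interval comes from the review comment. It uses java.util.Properties rather than Hadoop's Configuration so that it runs standalone.

```java
import java.io.IOException;
import java.util.Properties;

final class RetrySettingsSketch {
  // Hypothetical property names, invented for this sketch.
  static final String MAX_RETRIES_KEY = "example.zk.connect.max-retries";
  static final String RETRY_INTERVAL_MS_KEY = "example.zk.connect.retry-interval-ms";

  final int maxRetries;
  final long retryIntervalMs;

  RetrySettingsSketch(Properties conf) {
    // Default of 3 retries is an assumption; 5000 ms mirrors the
    // hard-coded 5 seconds mentioned in the review comment.
    maxRetries = Integer.parseInt(conf.getProperty(MAX_RETRIES_KEY, "3"));
    retryIntervalMs = Long.parseLong(conf.getProperty(RETRY_INTERVAL_MS_KEY, "5000"));
  }

  /** Builds the final error so that it reports how many retries were made. */
  IOException giveUp(IOException cause) {
    return new IOException(
        "Cannot establish ZooKeeper connection after retrying "
            + maxRetries + " times", cause);
  }
}
```

Reading both values from configuration keeps the retry policy tunable per cluster, and including the count in the message tells an operator that the failure was not a single transient error.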
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950960#comment-14950960 ] Karthik Kambatla commented on YARN-4243: I would like to review the patch before commit. > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950990#comment-14950990 ] Junping Du commented on YARN-4243: -- No worries. Nobody wants to commit it right now, as we have all left concrete review/improvement comments. > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949760#comment-14949760 ] Xuan Gong commented on YARN-4243: - Overrode createConnection() in EmbeddedElectorService to add retries, and created a YARN configuration for maxAttempts, because the code (ActiveStandbyElector) and the related configuration are shared with HDFS ZKFC. > Add retry on establishing Zookeeper conenction in > EmbeddedElectorService#serviceInit > > > Key: YARN-4243 > URL: https://issues.apache.org/jira/browse/YARN-4243 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4243.1.patch > > > Right now, the RM would shut down if the zk connection is down when the RM do > the initialization. We need to add retry on this part -- This message was sent by Atlassian JIRA (v6.3.4#6332)
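The approach described in the comment above, overriding createConnection() in a subclass so that the shared base code stays untouched, can be sketched as follows. This is a standalone illustration under stated assumptions: BaseElectorSketch and RetryingElectorSketch are hypothetical classes, the base class simulates a ZooKeeper outage instead of connecting, and the String return value is a placeholder for a real connection object.

```java
import java.io.IOException;

/** Hypothetical stand-in for the shared elector code; connects in one attempt. */
class BaseElectorSketch {
  int rawAttempts = 0;
  private final int failuresBeforeSuccess;

  BaseElectorSketch(int failuresBeforeSuccess) {
    this.failuresBeforeSuccess = failuresBeforeSuccess;
  }

  /** Single attempt; simulated to fail for the first N calls. */
  protected String createConnection() throws IOException {
    if (rawAttempts++ < failuresBeforeSuccess) {
      throw new IOException("ZooKeeper unavailable (simulated)");
    }
    return "connected";
  }
}

/** Subclass that adds retries without touching the shared base class. */
class RetryingElectorSketch extends BaseElectorSketch {
  private final int maxAttempts;
  private final long sleepMillis;

  RetryingElectorSketch(int failuresBeforeSuccess, int maxAttempts, long sleepMillis) {
    super(failuresBeforeSuccess);
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.maxAttempts = maxAttempts;
    this.sleepMillis = sleepMillis;
  }

  @Override
  protected String createConnection() throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return super.createConnection();
      } catch (IOException e) {
        last = e; // keep the most recent failure as the cause
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepMillis); // pause before the next attempt
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while waiting to retry", ie);
        }
      }
    }
    throw new IOException("Giving up after " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) throws IOException {
    // Two simulated failures, then success on the third attempt.
    RetryingElectorSketch elector = new RetryingElectorSketch(2, 5, 10L);
    System.out.println(elector.createConnection());
  }
}
```

Keeping the retry in the subclass means other users of the base class, such as a failover controller that prefers to fail on startup, see no change in behavior.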
[jira] [Commented] (YARN-4243) Add retry on establishing Zookeeper conenction in EmbeddedElectorService#serviceInit
[ https://issues.apache.org/jira/browse/YARN-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949921#comment-14949921 ] Hadoop QA commented on YARN-4243: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 22m 44s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 59s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 53s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 22s | The applied patch generated 1 release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 1s | The applied patch generated 2 new checkstyle issues (total was 211, now 212). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 52s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 19m 18s | Tests failed in hadoop-common. | | {color:red}-1{color} | yarn tests | 0m 24s | Tests failed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 62m 59s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 138m 8s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | Timed out tests | org.apache.hadoop.http.TestHttpServerLifecycle | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12765723/YARN-4243.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e1bf8b3 | | Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9386/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9386/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9386/console | This message was automatically generated. 