[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088392#comment-15088392 ]

Junping Du commented on YARN-4180:
----------------------------------

Thanks [~kasha] for the help on this.

> AMLauncher does not retry on failures when talking to NM
> --------------------------------------------------------
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.1
> Reporter: Anubhav Dhoot
> Assignee: Anubhav Dhoot
> Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch,
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>
> We see issues with the RM trying to launch a container while an NM is restarting,
> and we get exceptions like NMNotReadyException. While YARN-3842 added retries
> for other clients of the NM (mainly AMs), they are not used by AMLauncher in the RM, so
> these intermittent errors cause job failures. This can manifest during
> a rolling restart of NMs.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088346#comment-15088346 ]

Karthik Kambatla commented on YARN-4180:
----------------------------------------

Cherry-picked to 2.6.4 as well. Thanks for the ping, Junping.
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080551#comment-15080551 ]

Junping Du commented on YARN-4180:
----------------------------------

Hi [~adhoot] and [~ka...@cloudera.com], per Sangjin's comment above, should this fix be backported to branch-2.6? Thanks!
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15019099#comment-15019099 ]

Sangjin Lee commented on YARN-4180:
-----------------------------------

Does this issue exist in 2.6.x? Should this be backported to branch-2.6?
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934627#comment-14934627 ]

Hudson commented on YARN-4180:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2372 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2372/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) (adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934339#comment-14934339 ]

Hudson commented on YARN-4180:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #8535 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8535/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) (adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934504#comment-14934504 ]

Hudson commented on YARN-4180:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #1194 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1194/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) (adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934506#comment-14934506 ]

Hudson commented on YARN-4180:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #431 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/431/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) (adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934368#comment-14934368 ]

Hudson commented on YARN-4180:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #455 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/455/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) (adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934453#comment-14934453 ]

Hudson commented on YARN-4180:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #462 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/462/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) (adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934516#comment-14934516 ]

Hudson commented on YARN-4180:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2399 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2399/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) (adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907473#comment-14907473 ]

Hadoop QA commented on YARN-4180:
---------------------------------

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 25s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 8m 4s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 24s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 50s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 56m 56s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 97m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12762273/YARN-4180.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d1b9b85 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9262/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9262/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9262/console |

This message was automatically generated.
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907283#comment-14907283 ]

Hadoop QA commented on YARN-4180:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 18m 12s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 8m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 11m 31s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 29s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 56m 48s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 100m 21s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/1276/YARN-4180.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d1b9b85 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9260/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9260/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9260/console |

This message was automatically generated.
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903217#comment-14903217 ]

Anubhav Dhoot commented on YARN-4180:
-------------------------------------

The test failure looks unrelated.
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903096#comment-14903096 ]

Robert Kanter commented on YARN-4180:
-------------------------------------

+1 after doing those.
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876268#comment-14876268 ]

Hadoop QA commented on YARN-4180:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 43s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 7m 54s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 49s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 54m 26s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 93m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761162/YARN-4180.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 88d89267 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9215/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9215/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9215/console |

This message was automatically generated.
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876583#comment-14876583 ]

Robert Kanter commented on YARN-4180:
-------------------------------------

Looks good. Two minor things:
- Can you look into the test failure to see if it's related?
- Instead of the {{// Exposed for testing}} comment, you can put {{@VisibleForTesting}}
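
For readers unfamiliar with the second review point, here is a minimal sketch of the difference between a free-form comment and an annotation. It is only an illustration: {{Launcher}}, {{TestLauncher}} and {{getProxyName}} are hypothetical names, not code from the patch, and the nested {{VisibleForTesting}} annotation is a local stand-in so the snippet compiles without Guava (in Hadoop the annotation would normally come from com.google.common.annotations).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class VisibleForTestingSketch {

  /** Local stand-in for Guava's VisibleForTesting annotation (assumption: Guava is not on the classpath here). */
  @Retention(RetentionPolicy.CLASS)
  @Target({ElementType.METHOD, ElementType.FIELD, ElementType.TYPE, ElementType.CONSTRUCTOR})
  @interface VisibleForTesting {}

  /** Hypothetical launcher class; it is NOT the real AMLauncher. */
  static class Launcher {
    public String launch() {
      return "launched via " + getProxyName();
    }

    // The annotation replaces a "// Exposed for testing" comment: it documents
    // that the wider visibility exists only so that tests can override the method.
    @VisibleForTesting
    protected String getProxyName() {
      return "real-proxy";
    }
  }

  /** A test double that overrides the method widened for testing. */
  static class TestLauncher extends Launcher {
    @Override
    protected String getProxyName() {
      return "mock-proxy";
    }
  }

  public static void main(String[] args) {
    System.out.println(new Launcher().launch());
    System.out.println(new TestLauncher().launch());
  }
}
```

The annotation has no runtime effect; its value is that tools and reviewers can find such members mechanically, which a plain comment does not allow.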
[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804619#comment-14804619 ]

Anubhav Dhoot commented on YARN-4180:
-------------------------------------

I propose using retries in the ContainerManagement proxy obtained via AMLauncher#getContainerMgrProxy.
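
To make the proposal concrete, the general idea of a retrying proxy can be sketched with plain JDK dynamic proxies. This is not the YARN-4180 patch and it does not use Hadoop's own retry utilities (such as org.apache.hadoop.io.retry.RetryProxy). Every name below ({{ContainerManager}}, {{NotReadyException}}, {{withRetries}}, {{FlakyNodeManager}}) is a hypothetical stand-in, and the fixed-sleep, bounded-retry policy is an assumption chosen for brevity.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

class RetryProxySketch {

  /** Hypothetical stand-in for the NM-facing protocol; not a Hadoop interface. */
  interface ContainerManager {
    String startContainer(String containerId);
  }

  /** Hypothetical stand-in for a transient "NM not ready" failure (unchecked for brevity). */
  static class NotReadyException extends RuntimeException {
    NotReadyException(String message) {
      super(message);
    }
  }

  /**
   * Wraps target in a dynamic proxy that retries a call when it fails with
   * NotReadyException, at most maxRetries times, sleeping sleepMillis between attempts.
   * Any other failure is rethrown immediately.
   */
  static <T> T withRetries(Class<T> iface, T target, int maxRetries, long sleepMillis) {
    InvocationHandler handler = (proxy, method, args) -> {
      int retriesUsed = 0;
      while (true) {
        try {
          return method.invoke(target, args);
        } catch (InvocationTargetException e) {
          Throwable cause = e.getCause();
          if (!(cause instanceof NotReadyException) || retriesUsed >= maxRetries) {
            throw cause;
          }
          retriesUsed++;
          try {
            Thread.sleep(sleepMillis);
          } catch (InterruptedException interrupted) {
            // Stop retrying if interrupted; surface the original failure.
            Thread.currentThread().interrupt();
            throw cause;
          }
        }
      }
    };
    return iface.cast(
        Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
  }

  /** Fake NM that rejects the first few calls, imitating a restart in progress. */
  static class FlakyNodeManager implements ContainerManager {
    private int remainingFailures;
    int calls = 0;

    FlakyNodeManager(int failures) {
      this.remainingFailures = failures;
    }

    @Override
    public String startContainer(String containerId) {
      calls++;
      if (remainingFailures > 0) {
        remainingFailures--;
        throw new NotReadyException("NM not ready yet");
      }
      return "started " + containerId;
    }
  }

  public static void main(String[] args) {
    FlakyNodeManager nm = new FlakyNodeManager(2);
    ContainerManager proxy = withRetries(ContainerManager.class, nm, 3, 10L);
    System.out.println(proxy.startContainer("container_1") + " after " + nm.calls + " calls");
  }
}
```

The point of wrapping the proxy rather than the call site is that the launcher code stays unchanged: it still invokes the protocol method once, and the transient failure during an NM restart is absorbed behind the interface.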