[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596867#comment-14596867 ]

Hadoop QA commented on YARN-3842:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | pre-patch | 17m 16s | Pre-patch trunk compilation is healthy. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | javac | 7m 38s | There were no new javac warning messages. |
| +1 | javadoc | 9m 37s | There were no new javadoc warning messages. |
| +1 | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| +1 | checkstyle | 1m 28s | There were no new checkstyle issues. |
| +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| +1 | install | 1m 35s | mvn install still works. |
| +1 | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| +1 | findbugs | 2m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| +1 | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. |
| +1 | yarn tests | 6m 27s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 49m 43s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12741154/YARN-3842.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 077250d |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8317/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8317/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8317/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8317/console |

This message was automatically generated.

> NM restarts could lead to app failures
> --------------------------------------
>
> Key: YARN-3842
> URL: https://issues.apache.org/jira/browse/YARN-3842
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Karthik Kambatla
> Assignee: Robert Kanter
> Priority: Critical
> Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, YARN-3842.001.patch, YARN-3842.002.patch
>
> Consider the following scenario:
> 1. RM assigns a container on node N to an app A.
> 2. Node N is restarted.
> 3. A tries to launch the container on node N.
>
> Step 3 could fail with an NMNotYetReadyException, depending on whether the NM on node N has registered with the RM yet. In MR, this is counted as a task attempt failure, and a few of these could lead to a task or job failure.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
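The race in steps 2 and 3 of the issue description can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not NodeManager code: `FakeNodeManager`, `NotYetReadyException`, `restart()`, `registerWithRM()` and `tryLaunch()` are hypothetical names. The only behaviour taken from the issue is that a restarted NM rejects container launches until it has registered with the RM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class NmRestartRaceSketch {

  // Hypothetical stand-in for NMNotYetReadyException; not the YARN class.
  static class NotYetReadyException extends Exception {
    NotYetReadyException(String msg) {
      super(msg);
    }
  }

  // Hypothetical, heavily simplified NM: after a restart it rejects
  // container launches until it has registered with the RM again.
  static class FakeNodeManager {
    private final AtomicBoolean blockNewContainerRequests = new AtomicBoolean(false);

    void restart() {
      blockNewContainerRequests.set(true);
    }

    void registerWithRM() {
      blockNewContainerRequests.set(false);
    }

    String startContainer(String containerId) throws NotYetReadyException {
      if (blockNewContainerRequests.get()) {
        throw new NotYetReadyException(
            "Rejecting " + containerId + ": NM not yet registered with RM");
      }
      return "started " + containerId;
    }
  }

  // Launches once and reports the outcome. In MR, the "failed" case
  // corresponds to a failed task attempt.
  static String tryLaunch(FakeNodeManager nm, String containerId) {
    try {
      return nm.startContainer(containerId);
    } catch (NotYetReadyException e) {
      return "failed: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    FakeNodeManager nm = new FakeNodeManager();
    // 1. The RM has assigned container_1 on this node (implicit here).
    // 2. Node N restarts and has not yet registered with the RM.
    nm.restart();
    // 3. The app tries to launch the container and is rejected.
    System.out.println(tryLaunch(nm, "container_1"));
    // Once the NM has registered, the same launch succeeds.
    nm.registerWithRM();
    System.out.println(tryLaunch(nm, "container_1"));
  }
}
```

Whether step 3 fails depends only on the ordering of the launch and the registration, which is why the failure is intermittent in practice.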
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596783#comment-14596783 ]

Karthik Kambatla commented on YARN-3842:

+1, pending Jenkins. Thanks for your review, [~jianhe]. I'll go ahead and commit this if Jenkins is fine with it.
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596776#comment-14596776 ]

Jian He commented on YARN-3842:

I think the latest patch is safe for 2.7.1. +1
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596765#comment-14596765 ]

Robert Kanter commented on YARN-3842:

I had just split {{startContainers}} into two sections (one for each part of the test), but this is a lot more concise. I'll do that.
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596756#comment-14596756 ]

Karthik Kambatla commented on YARN-3842:

Thanks for the quick turnaround on this, Robert. One nit-pick on the test: would the following be more concise?

{code}
if (retryCount < 5) {
  retryCount++;
  if (isExpectingNMNotYetReadyException) {
    // Blocking new container requests should make super.startContainers
    // reject the request with an NMNotYetReadyException
    containerManager.setBlockNewContainerRequests(true);
  } else {
    // Simulate an NM that cannot be reached
    throw new java.net.ConnectException("start container exception");
  }
} else {
  // After five injected failures, let the request through
  containerManager.setBlockNewContainerRequests(false);
}
return super.startContainers(requests);
{code}
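The retry behaviour that the suggested test exercises can be sketched outside the YARN test harness. In this hypothetical model, `FlakyLauncher` plays the role of the overridden `startContainers`, failing the first five calls with either a not-yet-ready error or a `java.net.ConnectException`, and `startWithRetries` is an assumed client-side loop that treats both as transient. Neither class is part of the patch or of the YARN API, and a real retry policy would also back off between attempts.

```java
import java.io.IOException;
import java.net.ConnectException;

public class RetryOnNotReadySketch {

  // Hypothetical stand-in for NMNotYetReadyException; not the YARN class.
  static class NotYetReadyException extends IOException {
    NotYetReadyException(String msg) {
      super(msg);
    }
  }

  // Fault injector modeled on the test snippet: the first `failures` calls
  // throw (not-yet-ready or connection refused), later calls succeed.
  static class FlakyLauncher {
    private final int failures;
    private final boolean expectNotYetReady;
    private int retryCount = 0;

    FlakyLauncher(int failures, boolean expectNotYetReady) {
      this.failures = failures;
      this.expectNotYetReady = expectNotYetReady;
    }

    String startContainer() throws IOException {
      if (retryCount < failures) {
        retryCount++;
        if (expectNotYetReady) {
          throw new NotYetReadyException("NM not yet registered with RM");
        }
        throw new ConnectException("start container exception");
      }
      return "started";
    }

    int injectedFailures() {
      return retryCount;
    }
  }

  // Hypothetical client-side loop that treats both errors as transient.
  // Backoff between attempts is omitted to keep the sketch short.
  static String startWithRetries(FlakyLauncher launcher, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return launcher.startContainer();
      } catch (NotYetReadyException | ConnectException e) {
        last = e; // retriable: try again
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    FlakyLauncher notReady = new FlakyLauncher(5, true);
    System.out.println(startWithRetries(notReady, 10)
        + " after " + notReady.injectedFailures() + " injected failures");
    FlakyLauncher refused = new FlakyLauncher(5, false);
    System.out.println(startWithRetries(refused, 10)
        + " after " + refused.injectedFailures() + " injected failures");
  }
}
```

With `maxAttempts` set to 10, both runs succeed after five injected failures. With fewer attempts than injected failures, the last exception is rethrown, which is the analogue of the task attempt failure described in the issue.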
[jira] [Commented] (YARN-3842) NM restarts could lead to app failures
[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596733#comment-14596733 ]

Hadoop QA commented on YARN-3842:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | pre-patch | 17m 20s | Pre-patch trunk compilation is healthy. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | javac | 7m 46s | There were no new javac warning messages. |
| +1 | javadoc | 9m 42s | There were no new javadoc warning messages. |
| +1 | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| +1 | checkstyle | 1m 30s | There were no new checkstyle issues. |
| +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| +1 | install | 1m 33s | mvn install still works. |
| +1 | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| +1 | findbugs | 2m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| +1 | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. |
| +1 | yarn tests | 6m 5s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 49m 40s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12741131/YARN-3842.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8314/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8314/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8314/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8314/console |

This message was automatically generated.