[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596867#comment-14596867 ]

Hadoop QA commented on YARN-3842:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 16s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 37s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 28s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   6m 27s | Tests passed in hadoop-yarn-server-nodemanager. |
| | |  49m 43s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12741154/YARN-3842.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 077250d |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8317/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8317/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8317/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8317/console |


This message was automatically generated.

 NM restarts could lead to app failures
 --

 Key: YARN-3842
 URL: https://issues.apache.org/jira/browse/YARN-3842
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
Priority: Critical
 Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
 YARN-3842.001.patch, YARN-3842.002.patch


 Consider the following scenario:
 1. RM assigns a container on node N to an app A.
 2. Node N is restarted.
 3. A tries to launch the container on node N.
 Step 3 could fail with an NMNotYetReadyException, depending on whether NM N has 
 registered with the RM yet. In MR, this is considered a task attempt failure, and a 
 few of these could lead to a task/job failure.
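The scenario above hinges on a transient condition: a restarted NM rejects container launches until it has registered with the RM. One way to keep that from counting as a task attempt failure is for the launching client to treat the "not yet ready" error as retryable and back off between attempts. The Java sketch below illustrates only that control flow, under stated assumptions: `NotYetReadyException` is a local stand-in for YARN's `NMNotYetReadyException`, and `callWithRetry`, the retry budget, and the sleep interval are hypothetical, not Hadoop's actual API or defaults.

```java
import java.util.concurrent.Callable;

public class NotReadyRetryDemo {

  // Hypothetical stand-in for YARN's NMNotYetReadyException (not the real Hadoop class).
  public static class NotYetReadyException extends Exception {
    public NotYetReadyException(String msg) {
      super(msg);
    }
  }

  /**
   * Invokes {@code call}, retrying up to {@code maxRetries} times (at most
   * maxRetries + 1 attempts in total) when it fails with NotYetReadyException.
   * Any other exception propagates immediately and is not retried.
   */
  public static <T> T callWithRetry(Callable<T> call, int maxRetries, long retryIntervalMs)
      throws Exception {
    int retries = 0;
    while (true) {
      try {
        return call.call();
      } catch (NotYetReadyException e) {
        if (retries >= maxRetries) {
          throw e; // retry budget exhausted: surface the failure to the caller
        }
        retries++;
        // Give the restarted NM time to register with the RM before trying again.
        Thread.sleep(retryIntervalMs);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulated NM that rejects the first three launch requests.
    int[] calls = {0};
    String result = callWithRetry(() -> {
      calls[0]++;
      if (calls[0] <= 3) {
        throw new NotYetReadyException("NM has not yet registered with the RM");
      }
      return "container launched";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " calls");
  }
}
```

With the simulated NM rejecting the first three requests and a budget of five retries, `main` prints `container launched after 4 calls`. In Hadoop itself, client-side retries are normally expressed through the `org.apache.hadoop.io.retry` framework rather than a hand-written loop; this sketch is only meant to show why a transient "not yet ready" error need not fail the attempt.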



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Jian He (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596776#comment-14596776 ]

Jian He commented on YARN-3842:
---

I think the latest patch is safe for 2.7.1. +1



[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Karthik Kambatla (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596783#comment-14596783 ]

Karthik Kambatla commented on YARN-3842:


+1, pending Jenkins. 

Thanks for your review, [~jianhe]. I'll go ahead and commit this if Jenkins is fine 
with it. 



[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596733#comment-14596733 ]

Hadoop QA commented on YARN-3842:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 20s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 30s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   6m  5s | Tests passed in hadoop-yarn-server-nodemanager. |
| | |  49m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12741131/YARN-3842.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 445b132 |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8314/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8314/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8314/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8314/console |


This message was automatically generated.



[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Karthik Kambatla (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596756#comment-14596756 ]

Karthik Kambatla commented on YARN-3842:


Thanks for the quick turnaround on this, Robert. 

One nit-pick on the test: would the following be more concise? 

{code}
if (retryCount < 5) {
  retryCount++;
  if (isExpectingNMNotYetReadyException) {
    // Simulate an NM that has restarted but not yet registered with the RM.
    containerManager.setBlockNewContainerRequests(true);
  } else {
    throw new java.net.ConnectException("start container exception");
  }
} else {
  // After five simulated failures, unblock the NM so the next attempt succeeds.
  containerManager.setBlockNewContainerRequests(false);
}
return super.startContainers(requests);
{code}
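The snippet above depends on test fixtures (`containerManager`, `isExpectingNMNotYetReadyException`, the overridden `startContainers`) that are not shown in this thread. As a self-contained illustration of the same fail-N-times-then-succeed pattern, here is a hedged Java sketch. `FlakyNodeManagerStub`, its fields, and the local `NotYetReadyException` are hypothetical stand-ins, and the blocked-request check in `doStartContainers` is an assumed model of an NM rejecting launches before registration, not the real `ContainerManagerImpl` code.

```java
import java.net.ConnectException;

public class FlakyNodeManagerStub {

  // Hypothetical stand-in for YARN's NMNotYetReadyException (not the real Hadoop class).
  public static class NotYetReadyException extends Exception {
    public NotYetReadyException(String msg) {
      super(msg);
    }
  }

  private final boolean expectNotYetReady; // which failure mode to simulate
  private final int failuresBeforeSuccess;
  private int retryCount = 0;
  private boolean blockNewContainerRequests = false;

  public FlakyNodeManagerStub(boolean expectNotYetReady, int failuresBeforeSuccess) {
    this.expectNotYetReady = expectNotYetReady;
    this.failuresBeforeSuccess = failuresBeforeSuccess;
  }

  // Same shape as the reviewed snippet: fail the first N calls, then succeed.
  public String startContainers() throws NotYetReadyException, ConnectException {
    if (retryCount < failuresBeforeSuccess) {
      retryCount++;
      if (expectNotYetReady) {
        // Model an NM that restarted but has not yet registered with the RM.
        blockNewContainerRequests = true;
      } else {
        throw new ConnectException("start container exception");
      }
    } else {
      // The NM is considered registered again, so launches are accepted.
      blockNewContainerRequests = false;
    }
    return doStartContainers();
  }

  // Stand-in for the container manager: rejects launches while requests are blocked.
  private String doStartContainers() throws NotYetReadyException {
    if (blockNewContainerRequests) {
      throw new NotYetReadyException("Rejecting new containers: NM not yet registered");
    }
    return "started";
  }

  public static void main(String[] args) {
    FlakyNodeManagerStub stub = new FlakyNodeManagerStub(true, 5);
    int attempts = 0;
    String result = null;
    while (result == null) {
      attempts++;
      try {
        result = stub.startContainers();
      } catch (NotYetReadyException | ConnectException e) {
        // A retrying client would back off here before the next attempt.
      }
    }
    System.out.println(result + " on attempt " + attempts);
  }
}
```

Running `main` simulates five rejected launches followed by a successful sixth attempt (it prints `started on attempt 6`), which is the behavior the `retryCount < 5` branch is meant to exercise. Keeping both failure modes behind one counter is what makes the reviewed version more concise than splitting the method into two sections.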



[jira] [Commented] (YARN-3842) NM restarts could lead to app failures

2015-06-22 Thread Robert Kanter (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596765#comment-14596765 ]

Robert Kanter commented on YARN-3842:
-

I had sort of just split {{startContainers}} into two sections (one for each 
part of the test), but this is a lot more concise.  I'll do that.
