[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2016-01-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088392#comment-15088392
 ] 

Junping Du commented on YARN-4180:
--

Thanks [~kasha] for the help on this.

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>
> We see issues with the RM trying to launch a container while an NM is 
> restarting, and we get exceptions like NMNotReadyException. While YARN-3842 
> added retries for other clients of the NM (mainly AMs), they are not used by 
> AMLauncher in the RM, so these intermittent errors cause job failures. This 
> can manifest during a rolling restart of NMs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2016-01-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088346#comment-15088346
 ] 

Karthik Kambatla commented on YARN-4180:


Cherry-picked to 2.6.4 as well. Thanks for the ping, Junping. 

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Fix For: 2.7.2, 2.6.4
>
> Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2016-01-03 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15080551#comment-15080551
 ] 

Junping Du commented on YARN-4180:
--

Hi [~adhoot] and [~ka...@cloudera.com], per Sangjin's comment above, should 
this fix be backported to branch-2.6? Thanks!

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-11-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15019099#comment-15019099
 ] 

Sangjin Lee commented on YARN-4180:
---

Does this issue exist in 2.6.x? Should this be backported to branch-2.6?

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Fix For: 2.7.2
>
> Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934627#comment-14934627
 ] 

Hudson commented on YARN-4180:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2372 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2372/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) 
(adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934339#comment-14934339
 ] 

Hudson commented on YARN-4180:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8535 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8535/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) 
(adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java


> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch, YARN-4180.002.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934504#comment-14934504
 ] 

Hudson commented on YARN-4180:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1194 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1194/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) 
(adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java


> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934506#comment-14934506
 ] 

Hudson commented on YARN-4180:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #431 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/431/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) 
(adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java


> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934368#comment-14934368
 ] 

Hudson commented on YARN-4180:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #455 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/455/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) 
(adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java


> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch, YARN-4180.002.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934453#comment-14934453
 ] 

Hudson commented on YARN-4180:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #462 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/462/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) 
(adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/CHANGES.txt


> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934516#comment-14934516
 ] 

Hudson commented on YARN-4180:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2399 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2399/])
YARN-4180. AMLauncher does not retry on failures when talking to NM. (adhoot) 
(adhoot: rev 9735afe967a660f356e953348cb6c34417f41055)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
* hadoop-yarn-project/CHANGES.txt


> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907473#comment-14907473
 ] 

Hadoop QA commented on YARN-4180:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 25s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 8m 4s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 24s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 50s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 56m 56s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 97m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12762273/YARN-4180.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d1b9b85 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9262/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9262/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9262/console |


This message was automatically generated.

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch, YARN-4180.002.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907283#comment-14907283
 ] 

Hadoop QA commented on YARN-4180:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 18m 12s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 8m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 11m 31s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 29s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 56m 48s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 100m 21s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/1276/YARN-4180.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d1b9b85 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9260/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9260/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9260/console |


This message was automatically generated.

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch, YARN-4180.002.patch, 
> YARN-4180.002.patch, YARN-4180.002.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-22 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903217#comment-14903217
 ] 

Anubhav Dhoot commented on YARN-4180:
-

The test failure looks unrelated. 

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-22 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903096#comment-14903096
 ] 

Robert Kanter commented on YARN-4180:
-

+1 after doing those.

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876268#comment-14876268
 ] 

Hadoop QA commented on YARN-4180:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 43s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 7m 54s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 49s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 54m 26s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 93m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761162/YARN-4180.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 88d89267 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9215/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9215/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9215/console |


This message was automatically generated.

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-18 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876583#comment-14876583
 ] 

Robert Kanter commented on YARN-4180:
-

Looks good.  Two minor things:
- Can you look into the test failure to see whether it's related?
- Instead of the {{// Exposed for testing}} comment, you can use the 
{{@VisibleForTesting}} annotation.
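
To illustrate the second point, here is a minimal sketch of swapping the comment for an annotation. It is not taken from the patch: LauncherSketch and its members are hypothetical, and the locally declared VisibleForTesting annotation is only a stand-in so the snippet compiles without dependencies. Real code would presumably import Guava's com.google.common.annotations.VisibleForTesting instead.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in (assumption) so the sketch has no external dependencies.
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.METHOD, ElementType.FIELD, ElementType.CONSTRUCTOR, ElementType.TYPE})
@interface VisibleForTesting {}

// Hypothetical class; it only shows where the annotation goes.
class LauncherSketch {
  private int attempts;

  void launch() {
    attempts++;
  }

  // Previously marked with a plain "// Exposed for testing" comment.
  // The annotation records the same intent in a form tools can find.
  @VisibleForTesting
  int getAttempts() {
    return attempts;
  }
}
```

The annotation does not change visibility by itself; it documents why a member is less private than it otherwise would be.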


> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
> Attachments: YARN-4180.001.patch
>
>





[jira] [Commented] (YARN-4180) AMLauncher does not retry on failures when talking to NM

2015-09-17 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804619#comment-14804619
 ] 

Anubhav Dhoot commented on YARN-4180:
-

Proposal: use retries in the ContainerManagement proxy used by 
AMLauncher#getContainerMgrProxy.
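
As a rough illustration of the idea, here is a minimal, self-contained sketch of wrapping a client interface in a retrying proxy. It deliberately avoids Hadoop classes: ContainerLauncherClient, NodeNotReadyException, and withRetries are hypothetical stand-ins, and the fixed-sleep retry policy is an assumption. The actual patch may construct the proxy quite differently.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for the NM-facing protocol; not the Hadoop interface.
interface ContainerLauncherClient {
  String startContainer(String containerId) throws Exception;
}

// Hypothetical stand-in for a transient "NM is not ready yet" failure.
class NodeNotReadyException extends Exception {
  NodeNotReadyException(String message) {
    super(message);
  }
}

final class RetryProxySketch {
  // Wraps target so calls failing with NodeNotReadyException are retried
  // up to maxAttempts times, sleeping sleepMillis between attempts.
  static <T> T withRetries(Class<T> iface, T target, int maxAttempts, long sleepMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    InvocationHandler handler = (proxy, method, args) -> {
      NodeNotReadyException last = null;
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return method.invoke(target, args);
        } catch (InvocationTargetException e) {
          Throwable cause = e.getCause();
          if (!(cause instanceof NodeNotReadyException)) {
            throw cause; // not a transient failure: fail immediately
          }
          last = (NodeNotReadyException) cause;
          if (attempt < maxAttempts) {
            Thread.sleep(sleepMillis);
          }
        }
      }
      throw last; // every attempt hit the transient failure
    };
    return iface.cast(
        Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated NM that rejects the first two calls while it restarts.
    ContainerLauncherClient flaky = id -> {
      if (++calls[0] < 3) {
        throw new NodeNotReadyException("NM still starting");
      }
      return "started " + id;
    };
    ContainerLauncherClient client =
        withRetries(ContainerLauncherClient.class, flaky, 5, 10L);
    System.out.println(client.startContainer("container_1"));
  }
}
```

Putting the retries in the proxy keeps the launch logic unchanged: the caller still makes a single call, and transient failures during an NM restart are absorbed below it.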

> AMLauncher does not retry on failures when talking to NM 
> -
>
> Key: YARN-4180
> URL: https://issues.apache.org/jira/browse/YARN-4180
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>


