[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-04-22 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507255#comment-14507255
 ] 

sandflee commented on YARN-3387:


This looks like a bug in LaunchAM in MockRM.java. LaunchAM does the following:
1, waits for the app to become ACCEPTED; by this point the appAttempt has been created
2, sends a node heartbeat
3, waits for the appAttempt to become ALLOCATED

If the node heartbeat is handled before the appAttempt becomes SCHEDULED, the 
appAttempt state never reaches ALLOCATED unless another NM heartbeat arrives.
This is what happened in the failed cases:
https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testShouldNotCountFailureToMaxAttemptRetry/
https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testPreemptedAMRestartOnRMRestart/
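
The ordering problem described above can be illustrated with a toy model. This is only a sketch: ToyRm, its string events, and both helper methods are hypothetical and are not the real MockRM or scheduler code. It assumes a single-threaded dispatcher in which a node heartbeat allocates a container only for an attempt that is already SCHEDULED.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of the ordering problem; not the real MockRM or scheduler. */
public class LaunchAmRaceSketch {
    enum AttemptState { NEW, SCHEDULED, ALLOCATED }

    /** Hypothetical stand-in for an RM that handles events one at a time. */
    static class ToyRm {
        AttemptState state = AttemptState.NEW;
        private final Queue<String> events = new ArrayDeque<>();

        void enqueue(String event) {
            events.add(event);
        }

        /** Drain the queue in order, as a single-threaded dispatcher would. */
        void drain() {
            while (!events.isEmpty()) {
                String event = events.poll();
                if (event.equals("SCHEDULE") && state == AttemptState.NEW) {
                    state = AttemptState.SCHEDULED;
                } else if (event.equals("NODE_HEARTBEAT")
                        && state == AttemptState.SCHEDULED) {
                    // Assumption: a heartbeat allocates only for a scheduled attempt.
                    state = AttemptState.ALLOCATED;
                }
                // A heartbeat seen while the attempt is still NEW has no effect.
            }
        }
    }

    /** The heartbeat is handled before the attempt is scheduled. */
    static AttemptState heartbeatFirst() {
        ToyRm rm = new ToyRm();
        rm.enqueue("NODE_HEARTBEAT");
        rm.enqueue("SCHEDULE");
        rm.drain();
        return rm.state;
    }

    /** Wait for SCHEDULED first, then send the heartbeat. */
    static AttemptState waitForScheduledThenHeartbeat() {
        ToyRm rm = new ToyRm();
        rm.enqueue("SCHEDULE");
        rm.drain(); // stands in for "wait until the attempt is SCHEDULED"
        rm.enqueue("NODE_HEARTBEAT");
        rm.drain();
        return rm.state;
    }

    public static void main(String[] args) {
        System.out.println("heartbeat first: " + heartbeatFirst());
        System.out.println("wait, then heartbeat: " + waitForScheduledThenHeartbeat());
    }
}
```

In this model the first ordering leaves the attempt in SCHEDULED with no further heartbeat to move it forward, while waiting for SCHEDULED before the heartbeat reaches ALLOCATED. One possible way for a test helper to avoid the hang is therefore to wait for SCHEDULED before sending the heartbeat.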

 container complete message couldn't pass to am if am restarted and rm changed
 -

 Key: YARN-3387
 URL: https://issues.apache.org/jira/browse/YARN-3387
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: sandflee
Priority: Critical
  Labels: patch
 Attachments: YARN-3387.001.patch, YARN-3387.002.patch


 Suppose AM work-preserving restart and RM HA are enabled.
 The container-complete message is passed to appAttempt.justFinishedContainers in 
 the RM. In the normal situation, all attempts of one app share the same 
 justFinishedContainers, but after the RM changes, every attempt has its own 
 justFinishedContainers. So in the situation below, the container-complete message 
 cannot be passed to the AM:
 1, the AM restarts
 2, the RM changes
 3, a container launched by the first AM completes
 The container-complete message is passed to appAttempt1, not appAttempt2, but 
 the AM pulls finished containers from appAttempt2 (currentAppAttempt).
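
 The scenario above can be reproduced in a toy model. This is a minimal sketch, not the RM code: ToyAttempt, transferStateFrom and pullAfterCompletion are hypothetical names, and only the justFinishedContainers field is modeled. It assumes that sharing means both attempts hold a reference to the same list.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of shared versus per-attempt justFinishedContainers. */
public class FinishedContainersSketch {
    /** Hypothetical stand-in for an app attempt. */
    static class ToyAttempt {
        List<String> justFinishedContainers = new ArrayList<>();

        /** Share the previous attempt's list so both attempts see later additions. */
        void transferStateFrom(ToyAttempt previous) {
            this.justFinishedContainers = previous.justFinishedContainers;
        }
    }

    /** Returns what the AM would pull from the current attempt. */
    static List<String> pullAfterCompletion(boolean attemptsShareState) {
        ToyAttempt attempt1 = new ToyAttempt();
        ToyAttempt attempt2 = new ToyAttempt(); // current attempt after the AM restart
        if (attemptsShareState) {
            attempt2.transferStateFrom(attempt1);
        }
        // The container was launched by the first AM, so its completion
        // is recorded on attempt 1.
        attempt1.justFinishedContainers.add("container_01");
        // The AM pulls finished containers from the current attempt only.
        return new ArrayList<>(attempt2.justFinishedContainers);
    }

    public static void main(String[] args) {
        System.out.println("separate lists: " + pullAfterCompletion(false));
        System.out.println("shared list: " + pullAfterCompletion(true));
    }
}
```

 With separate lists the AM pulls nothing, which matches the lost message in the report. With a shared list the completion recorded on the first attempt is visible through the current attempt.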



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-04-22 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508167#comment-14508167
 ] 

Anubhav Dhoot commented on YARN-3387:
-

Thanks [~sandflee] for reporting the issue. I have opened YARN-3533 to fix this.



[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-04-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504109#comment-14504109
 ] 

Hadoop QA commented on YARN-3387:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12726708/YARN-3387.002.patch
  against trunk revision c92f6f3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7410//console

This message is automatically generated.



[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-04-12 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491469#comment-14491469
 ] 

sandflee commented on YARN-3387:


Jian He, thanks for the review.
Yes, they're the same right now, and I'll add a test case later.



[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-04-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490484#comment-14490484
 ] 

Jian He commented on YARN-3387:
---

[~sandflee], thanks for the patch!
Is the newly added shareStateWithCurrentAttempt the same as 
transferStateFromPreviousAttempt? If so, we can just use the latter and maybe 
rename it to transferStateFromAttempt. 

Could you add a test case too? TestWorkPreservingRMRestart has some example 
tests. 



[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-03-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387934#comment-14387934
 ] 

Hadoop QA commented on YARN-3387:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12707348/YARN-3387.001.patch
  against trunk revision 1a495fb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7160//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7160//console

This message is automatically generated.



[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-03-23 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376640#comment-14376640
 ] 

Karthik Kambatla commented on YARN-3387:


Does this imply our work-preserving AM restart is broken on an RM failover? 



[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed

2015-03-23 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377019#comment-14377019
 ] 

sandflee commented on YARN-3387:


yes
