[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507255#comment-14507255 ]

sandflee commented on YARN-3387:

This looks like a bug in LaunchAM in MockRM.java. LaunchAM does the following:
1. Wait for the app to become ACCEPTED; the appAttempt is created after this.
2. Send a node heartbeat.
3. Wait for the appAttempt to become ALLOCATED.

If the node heartbeat is handled before the appAttempt becomes SCHEDULED, the appAttempt will never reach ALLOCATED unless another NM heartbeat arrives. This is exactly what happened in the failed cases:
https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testShouldNotCountFailureToMaxAttemptRetry/
https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testPreemptedAMRestartOnRMRestart/

container complete message couldn't pass to am if am restarted and rm changed
Key: YARN-3387
URL: https://issues.apache.org/jira/browse/YARN-3387
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: sandflee
Priority: Critical
Labels: patch
Attachments: YARN-3387.001.patch, YARN-3387.002.patch

Suppose work-preserving AM restart and RM HA are enabled. In the RM, a container-complete message is passed to the app attempt's justFinishedContainers. Normally, all attempts of one app share the same justFinishedContainers, but after an RM failover every attempt has its own justFinishedContainers. So in the following situation the container-complete message cannot reach the AM:
1. The AM restarts.
2. The RM fails over.
3. A container launched by the first AM completes.

The container-complete message is passed to appAttempt1, not appAttempt2, but the AM pulls finished containers from appAttempt2 (the currentAppAttempt).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
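The three-step scenario in the issue description can be reproduced with a toy model. This is a minimal sketch, assuming only what the description states: completions are recorded on the attempt that launched the container, and the AM drains the current attempt's list. AppAttempt, simulate and the container id are hypothetical names; this is not RMAppAttemptImpl and not the code in either attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the YARN-3387 scenario. AppAttempt is NOT the
// real RM class; it only keeps a justFinishedContainers list.
public class JustFinishedContainersSketch {

    static final class AppAttempt {
        List<String> justFinishedContainers = new ArrayList<>();

        // Models the AM's allocate call draining the attempt's finished containers.
        List<String> pullJustFinishedContainers() {
            List<String> drained = new ArrayList<>(justFinishedContainers);
            justFinishedContainers.clear();
            return drained;
        }
    }

    // Returns what the restarted AM (attempt 2) sees after steps 1-3 of the issue.
    // shareAfterFailover = true models attempts sharing one list, as they do when the
    // RM does not fail over; false models the state rebuilt after a failover.
    public static List<String> simulate(boolean shareAfterFailover) {
        // Steps 1 and 2 already happened: the AM restarted, then the RM failed over
        // and rebuilt both attempts.
        AppAttempt attempt1 = new AppAttempt();
        AppAttempt attempt2 = new AppAttempt();
        if (shareAfterFailover) {
            attempt2.justFinishedContainers = attempt1.justFinishedContainers;
        }

        // Step 3: a container launched by the first AM completes; the RM records the
        // completion on attempt 1, the attempt that launched it.
        attempt1.justFinishedContainers.add("container_from_attempt_1");

        // The AM pulls finished containers from the current attempt (attempt 2).
        return attempt2.pullJustFinishedContainers();
    }

    public static void main(String[] args) {
        System.out.println("separate lists: " + simulate(false)); // []
        System.out.println("shared list:    " + simulate(true));  // [container_from_attempt_1]
    }
}
```

With separate lists the completion sits on attempt 1 forever, which matches the reported symptom; with a shared list the current attempt sees it.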
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508167#comment-14508167 ]

Anubhav Dhoot commented on YARN-3387:

Thanks [~sandflee] for reporting the issue. I have opened YARN-3533 to fix this.
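The LaunchAM ordering problem sandflee described can be illustrated with a small event-ordering model. This is a sketch under stated assumptions: the three states, the heartbeat handler and the "wait for SCHEDULED before heartbeating" option are hypothetical simplifications chosen to match the comment, not MockRM's API or the actual YARN-3533 change.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of the LaunchAM ordering problem; the states and
// handlers are illustrative and do not reproduce MockRM or the real scheduler.
public class LaunchAmRaceSketch {

    public enum AttemptState { SUBMITTED, SCHEDULED, ALLOCATED }

    static final class Attempt {
        AttemptState state = AttemptState.SUBMITTED;
    }

    // The AM container is allocated only by a node heartbeat that arrives while the
    // attempt is SCHEDULED; an earlier heartbeat allocates nothing.
    static void onNodeHeartbeat(Attempt attempt) {
        if (attempt.state == AttemptState.SCHEDULED) {
            attempt.state = AttemptState.ALLOCATED;
        }
    }

    // heartbeatFirst: the dispatcher handles the heartbeat before the attempt's
    // SCHEDULED transition (the bad interleaving).
    // waitForScheduled: the test waits for SCHEDULED before sending the heartbeat,
    // which rules out that interleaving.
    public static AttemptState launchAm(boolean heartbeatFirst, boolean waitForScheduled) {
        Attempt attempt = new Attempt();
        Runnable scheduled = () -> attempt.state = AttemptState.SCHEDULED;
        Runnable heartbeat = () -> onNodeHeartbeat(attempt);

        Deque<Runnable> events = new ArrayDeque<>();
        if (heartbeatFirst && !waitForScheduled) {
            events.add(heartbeat);
            events.add(scheduled);
        } else {
            events.add(scheduled);
            events.add(heartbeat);
        }
        while (!events.isEmpty()) {
            events.poll().run();
        }
        // No further NM heartbeats arrive, so this state is final: a test waiting for
        // ALLOCATED would wait forever if the attempt is still SCHEDULED.
        return attempt.state;
    }

    public static void main(String[] args) {
        System.out.println(launchAm(true, false));  // SCHEDULED (stuck)
        System.out.println(launchAm(false, false)); // ALLOCATED
        System.out.println(launchAm(true, true));   // ALLOCATED
    }
}
```

The point of the model is that the test outcome depends on dispatcher interleaving unless the test itself enforces "SCHEDULED before heartbeat".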
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504109#comment-14504109 ]

Hadoop QA commented on YARN-3387:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12726708/YARN-3387.002.patch
against trunk revision c92f6f3.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7410//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7410//console

This message is automatically generated.
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491469#comment-14491469 ]

sandflee commented on YARN-3387:

Jian He, thanks for the review. Yes, they're the same right now, and I'll add a test case later.
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490484#comment-14490484 ]

Jian He commented on YARN-3387:

[~sandflee], thanks for the patch! Is the newly added shareStateWithCurrentAttempt the same as transferStateFromPreviousAttempt? If so, we can just use the latter and maybe rename it to transferStateFromAttempt. Could you add a test case too? TestWorkPreservingRMRestart has some example tests.
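Whatever the method ends up being called, the important property is that the current attempt shares the previous attempt's list object rather than copying it, because the completion in this bug arrives after recovery. The sketch below contrasts the two choices. It is a hypothetical model: Attempt is not RMAppAttemptImpl, transferStateFromAttempt only borrows the name proposed in the review, and its body is an assumption rather than the patch's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; Attempt is not RMAppAttemptImpl. transferStateFromAttempt
// borrows the name proposed in the review; its body is an assumption.
public class TransferStateSketch {

    static final class Attempt {
        private List<String> justFinishedContainers = new ArrayList<>();

        // Adopt the other attempt's list object, so both attempts see later updates.
        void transferStateFromAttempt(Attempt other) {
            this.justFinishedContainers = other.justFinishedContainers;
        }

        // Contrast: copy a snapshot, which misses completions recorded afterwards.
        void copyStateFromAttempt(Attempt other) {
            this.justFinishedContainers = new ArrayList<>(other.justFinishedContainers);
        }

        void containerCompleted(String containerId) {
            justFinishedContainers.add(containerId);
        }

        List<String> pullJustFinishedContainers() {
            List<String> drained = new ArrayList<>(justFinishedContainers);
            justFinishedContainers.clear();
            return drained;
        }
    }

    // Models recovery after an RM failover: both attempts are rebuilt, then the
    // current attempt takes state from the previous one.
    public static List<String> seenByCurrentAttempt(boolean shareReference) {
        Attempt previous = new Attempt();
        Attempt current = new Attempt();
        if (shareReference) {
            current.transferStateFromAttempt(previous);
        } else {
            current.copyStateFromAttempt(previous);
        }
        // A container launched by the previous AM completes after recovery.
        previous.containerCompleted("container_from_attempt_1");
        return current.pullJustFinishedContainers();
    }

    public static void main(String[] args) {
        System.out.println("shared reference: " + seenByCurrentAttempt(true));  // [container_from_attempt_1]
        System.out.println("snapshot copy:    " + seenByCurrentAttempt(false)); // []
    }
}
```

A test along the lines Jian He suggests would check the same thing end to end: after an AM restart and an RM failover, a completion for a container from the first attempt should show up in the current attempt's allocate response.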
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387934#comment-14387934 ]

Hadoop QA commented on YARN-3387:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12707348/YARN-3387.001.patch
against trunk revision 1a495fb.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7160//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7160//console

This message is automatically generated.
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376640#comment-14376640 ]

Karthik Kambatla commented on YARN-3387:

Does this imply our work-preserving AM restart is broken on an RM failover?
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377019#comment-14377019 ]

sandflee commented on YARN-3387:

Yes.