[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156349#comment-14156349 ] Hudson commented on YARN-2630: -- FAILURE: Integrated in Hadoop-Yarn-trunk #698 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/698/]) YARN-2630. Prevented previous AM container status from being acquired by the current restarted AM. Contributed by Jian He. (zjshen: rev 52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/CHANGES.txt TestDistributedShell#testDSRestartWithPreviousRunningContainers fails - Key: YARN-2630 URL: https://issues.apache.org/jira/browse/YARN-2630 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, YARN-2630.4.patch The problem is that after YARN-1372, in a work-preserving AM restart, the re-launched AM also receives the completed status of the previous, failed AM container. The DistributedShell logic does not expect this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
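To illustrate the behaviour described in the issue and in the commit message ("Prevented previous AM container status from being acquired by the current restarted AM"), here is a minimal sketch. It is not the code from the YARN-2630 patch: the class `PreviousAmContainerFilter`, the nested `FinishedContainer` type, the method `visibleToRestartedAm`, and the container ID strings are all hypothetical names chosen for illustration. The only idea it shows is that the finished AM container of the previous attempt is left out of the list handed to the restarted AM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; class and method names are illustrative and are not
// taken from the YARN code base or from the YARN-2630 patch.
final class PreviousAmContainerFilter {

    // Minimal stand-in for a completed-container report.
    static final class FinishedContainer {
        final String containerId;
        final int exitStatus;

        FinishedContainer(String containerId, int exitStatus) {
            this.containerId = containerId;
            this.exitStatus = exitStatus;
        }
    }

    // Returns the IDs of finished containers that the restarted AM should see,
    // leaving out the AM container of the previous attempt.
    static List<String> visibleToRestartedAm(
            List<FinishedContainer> justFinished, String previousAmContainerId) {
        List<String> visible = new ArrayList<>();
        for (FinishedContainer c : justFinished) {
            if (!c.containerId.equals(previousAmContainerId)) {
                visible.add(c.containerId);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<FinishedContainer> justFinished = Arrays.asList(
                new FinishedContainer("attempt1_container1", 1),  // previous AM, failed
                new FinishedContainer("attempt1_container2", 0)); // an ordinary task container
        System.out.println(visibleToRestartedAm(justFinished, "attempt1_container1"));
    }
}
```

With a filter of this kind in place, an AM such as DistributedShell that counts completed containers would only count the task containers it is tracking, which is the expectation the failing test relies on.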
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156427#comment-14156427 ] Hudson commented on YARN-2630: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/]) YARN-2630. Prevented previous AM container status from being acquired by the current restarted AM. Contributed by Jian He. (zjshen: rev 52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14156543#comment-14156543 ] Hudson commented on YARN-2630: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-2630. Prevented previous AM container status from being acquired by the current restarted AM. Contributed by Jian He. (zjshen: rev 52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155136#comment-14155136 ] Zhijie Shen commented on YARN-2630: --- Makes sense. +1
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155227#comment-14155227 ] Zhijie Shen commented on YARN-2630: --- Would you please check finishedContainersPulledByAM is completely replaced in the code base?
{code}
-if (this.finishedContainersPulledByAM != null) {
+if (this.containersToBeRemovedFromNM != null) {
   addFinishedContainersPulledByAMToProto();
 }
{code}
{code}
-  public void addFinishedContainersPulledByAM(
+  public void addContainersToBeRemovedFromNM(
       final List<ContainerId> finishedContainersPulledByAM) {
     if (finishedContainersPulledByAM == null)
       return;
     initFinishedContainersPulledByAM();
-    this.finishedContainersPulledByAM.addAll(finishedContainersPulledByAM);
+    this.containersToBeRemovedFromNM.addAll(finishedContainersPulledByAM);
{code}
{code}
-    nhResponse.addFinishedContainersPulledByAM(finishedContainersPulledByAM);
+    nhResponse.addContainersToBeRemovedFromNM(finishedContainersPulledByAM);
{code}
{code}
-      response.addFinishedContainersPulledByAM(
+      response.addContainersToBeRemovedFromNM(
           new ArrayList<ContainerId>(this.finishedContainersPulledByAM));
{code}
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155318#comment-14155318 ] Hadoop QA commented on YARN-2630: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672368/YARN-2630.3.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5199//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5199//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5199//console This message is automatically generated. 
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155352#comment-14155352 ] Hadoop QA commented on YARN-2630: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672374/YARN-2630.4.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5201//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5201//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5201//console This message is automatically generated. 
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155565#comment-14155565 ] Hadoop QA commented on YARN-2630: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672374/YARN-2630.4.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5204//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5204//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5204//console This message is automatically generated. 
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155702#comment-14155702 ] Hudson commented on YARN-2630: -- FAILURE: Integrated in Hadoop-trunk-Commit #6170 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6170/]) YARN-2630. Prevented previous AM container status from being acquired by the current restarted AM. Contributed by Jian He. (zjshen: rev 52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14154098#comment-14154098 ] Hadoop QA commented on YARN-2630: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672165/YARN-2630.1.patch against trunk revision 14d60da. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart {color:green}+1 contrib tests{color}. 
The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5189//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5189//console This message is automatically generated.
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14154247#comment-14154247 ] Hadoop QA commented on YARN-2630: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672219/YARN-2630.2.patch against trunk revision 9e9e9cf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5192//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5192//console This message is automatically generated.
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154388#comment-14154388 ] Zhijie Shen commented on YARN-2630: --- Is it correct to only notify NM when keepContainersAcrossApplicationAttempts is set? Logically, whether or not we keep the containers across attempts, we should let NM clean up the cached finished containers, right? It seems that pullJustFinishedContainers doesn't need this check.
{code}
if (!appAttempt.getSubmissionContext()
    .getKeepContainersAcrossApplicationAttempts()) {
  appAttempt.sendFinishedContainersToNM();
}
{code}
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154395#comment-14154395 ] Jian He commented on YARN-2630: ---
bq. Is it correct to only notify NM when keepContainersAcrossApplicationAttempts is set?
I added this check because in a work-preserving AM restart, the 2nd AM needs to know about the previous AM's finished containers. So we should not prematurely make the NM remove the containers, in case the RM restarts.
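The guard discussed in this exchange can be sketched roughly as follows. This is only an illustration of the reasoning in the comment above, not the patch itself: `FinishedContainerCleanup` and `containersNmMayForget` are hypothetical names, and container IDs are modelled as plain strings. The assumption is that the NM is told to drop its cached finished containers only when containers are not kept across application attempts, so that a later attempt can still learn about them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the guard; names are illustrative and not from YARN.
final class FinishedContainerCleanup {

    // Returns the finished containers the NM may be told to forget right now.
    // When containers are kept across attempts, nothing is released yet,
    // because the next attempt may still need to see these statuses.
    static List<String> containersNmMayForget(
            boolean keepContainersAcrossAttempts, List<String> finishedContainers) {
        if (keepContainersAcrossAttempts) {
            return Collections.emptyList();
        }
        return new ArrayList<>(finishedContainers);
    }

    public static void main(String[] args) {
        List<String> finished = Arrays.asList("container_a", "container_b");
        System.out.println("keep=true  -> " + containersNmMayForget(true, finished));
        System.out.println("keep=false -> " + containersNmMayForget(false, finished));
    }
}
```

The trade-off raised in the review still applies to a sketch like this: holding the statuses on the NM longer protects the restarted AM, but the cached entries then need some other point at which they are eventually cleaned up.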