[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203391#comment-14203391 ] Hudson commented on YARN-2825: -- FAILURE: Integrated in Hadoop-Yarn-trunk #737 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/737/]) YARN-2825. Container leak on NM. Contributed by Jian He (jlowe: rev c3d475070a1ec54c4b05923f4782cef204effd2c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Fix For: 2.6.0 Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch, YARN-2825.3.patch, YARN-2825.4.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203436#comment-14203436 ] Hudson commented on YARN-2825: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1927 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1927/]) YARN-2825. Container leak on NM. Contributed by Jian He (jlowe: rev c3d475070a1ec54c4b05923f4782cef204effd2c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/CHANGES.txt Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Fix For: 2.6.0 Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch, YARN-2825.3.patch, YARN-2825.4.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202287#comment-14202287 ] Jason Lowe commented on YARN-2825: -- Thanks for the patch, Jian! Is there a reason we need to cast to ContainerImpl? I think calling context.getContainers().get(containerId).getContainerState() == org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerState.DONE would be equivalent and cleaner since we wouldn't assume the container implementation. Or we could get the container status and check for COMPLETE which is what other parts of the code are doing. Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202422#comment-14202422 ] Jian He commented on YARN-2825: --- Thanks Jason for reviewing the patch ! bq. Or we could get the container status and check for COMPLETE which is what other parts of the code are doing the existing method is doing {{cloneAndGetContainerStatus}}, I just wanted to avoid the clone part. To conform with the rest of the code and also avoid doing the casting, I exposed the getCurrentState in the interface, as I saw there's another caller needs to do the same. Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202521#comment-14202521 ] Jason Lowe commented on YARN-2825: -- Curious, what's the reasoning to avoid checking for the DONE state? That's utilizing an existing interface rather than adding a new one. It's confusing to have two get state methods (current vs. container, but it _is_ a container so...), although I realize it derives from the undesirable situation of having two ContainerState types. I suppose we could also just expose an isComplete() predicate method since callers are only checking for COMPLETE. That being said I'm not totally against adding the yarn.api.records state in a new method, just think it's not really necessary. If we continue along that route then ContainerImpl.getCurrentState needs the Override decorator. Also ContainerImpl is now an usused import, and should continue to be unnecessary regardless of which route we pursue. Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202530#comment-14202530 ] Jian He commented on YARN-2825: --- bq. what's the reasoning to avoid checking for the DONE state? The only reason is to consistent with the check in {{NodeStatusUpdaterImpl#getContainerStatuses}} where it's checking ContainerState.COMPLETE rather than DONE. i can just change it check DONE instead of adding a new method. updating the patch. Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202578#comment-14202578 ] Hadoop QA commented on YARN-2825: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680218/YARN-2825.2.patch against trunk revision 2ac1be7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1219 javac compiler warnings (more than the trunk's current 153 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 9 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/5782//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestRMNMRPCResponseId The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5782//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5782//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5782//console This message is automatically generated. Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202684#comment-14202684 ] Jian He commented on YARN-2825: --- [~jlowe], mind take a look at the new patch ? Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch, YARN-2825.3.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202765#comment-14202765 ] Hadoop QA commented on YARN-2825: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680267/YARN-2825.3.patch against trunk revision 06b7979. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5792//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5792//console This message is automatically generated. Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch, YARN-2825.3.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202905#comment-14202905 ] Jason Lowe commented on YARN-2825: -- +1 lgtm. Committing this. Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch, YARN-2825.3.patch, YARN-2825.4.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202919#comment-14202919 ] Hudson commented on YARN-2825: -- FAILURE: Integrated in Hadoop-trunk-Commit #6487 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6487/]) YARN-2825. Container leak on NM. Contributed by Jian He (jlowe: rev c3d475070a1ec54c4b05923f4782cef204effd2c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch, YARN-2825.3.patch, YARN-2825.4.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202935#comment-14202935 ] Jian He commented on YARN-2825: --- Jason, thanks for review and committing! Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Fix For: 2.6.0 Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch, YARN-2825.3.patch, YARN-2825.4.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202899#comment-14202899 ] Hadoop QA commented on YARN-2825: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680289/YARN-2825.4.patch against trunk revision 4cfd5bc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5795//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5795//console This message is automatically generated. Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch, YARN-2825.2.patch, YARN-2825.3.patch, YARN-2825.4.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2825) Container leak on NM
[ https://issues.apache.org/jira/browse/YARN-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201530#comment-14201530 ] Hadoop QA commented on YARN-2825: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680068/YARN-2825.1.patch against trunk revision ba0a42c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5769//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5769//console This message is automatically generated. Container leak on NM Key: YARN-2825 URL: https://issues.apache.org/jira/browse/YARN-2825 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Priority: Critical Attachments: YARN-2825.1.patch, YARN-2825.1.patch Caused by YARN-1372. thanks [~vinodkv] for pointing this out. The problem is that in YARN-1372 we changed the behavior to remove containers from NMContext only after the containers are acknowledged by AM. But in the {{NodeStatusUpdaterImpl#removeCompletedContainersFromContext}} call, we didn't check whether the container is really completed or not. If the container is stilll running, we shouldn't remove the container from the context -- This message was sent by Atlassian JIRA (v6.3.4#6332)