[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735987#comment-14735987 ]

Xianyin Xin commented on YARN-4133:
---

Of course, we can also address these problems one by one in separate JIRAs. If you prefer that, please ignore my previous comment.

> Containers to be preempted leaks in FairScheduler preemption logic.
> ---
>
> Key: YARN-4133
> URL: https://issues.apache.org/jira/browse/YARN-4133
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.1
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: YARN-4133.000.patch
>
>
> Containers to be preempted leak in the FairScheduler preemption logic. This may
> cause preemption to be missed, because containers are wrongly removed from
> {{warnedContainers}}. The problem is in {{preemptResources}}.
> Two issues can cause containers to be wrongly removed from
> {{warnedContainers}}.
> First, the container state {{RMContainerState.ACQUIRED}} is missing from the
> condition check:
> {code}
> (container.getState() == RMContainerState.RUNNING ||
>     container.getState() == RMContainerState.ALLOCATED)
> {code}
> Second, if {{isResourceGreaterThanNone(toPreempt)}} returns false, we
> shouldn't remove the container from {{warnedContainers}}. We should remove a
> container from {{warnedContainers}} only if it is in none of the states
> {{RMContainerState.RUNNING}}, {{RMContainerState.ALLOCATED}}, or
> {{RMContainerState.ACQUIRED}}.
> {code}
> if ((container.getState() == RMContainerState.RUNNING ||
>     container.getState() == RMContainerState.ALLOCATED) &&
>     isResourceGreaterThanNone(toPreempt)) {
>   warnOrKillContainer(container);
>   Resources.subtractFrom(toPreempt,
>       container.getContainer().getResource());
> } else {
>   warnedIter.remove();
> }
> {code}
> Also, once containers are wrongly removed from {{warnedContainers}}, they will
> never be preempted, because they are already in
> {{FSAppAttempt#preemptionMap}} and {{FSAppAttempt#preemptContainer}} won't
> return containers that are in {{FSAppAttempt#preemptionMap}}.
> {code}
> public RMContainer preemptContainer() {
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("App " + getName() + " is going to preempt a running " +
>         "container");
>   }
>   RMContainer toBePreempted = null;
>   for (RMContainer container : getLiveContainers()) {
>     if (!getPreemptionContainers().contains(container) &&
>         (toBePreempted == null ||
>         comparator.compare(toBePreempted, container) > 0)) {
>       toBePreempted = container;
>     }
>   }
>   return toBePreempted;
> }
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
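For readers of the archive, the two points in the issue description can be sketched in isolation. The following is a simplified model under stated assumptions, not the attached patch and not the Hadoop classes: {{PreemptionLeakSketch}}, {{Container}}, {{State}}, {{processWarned}}, {{pickCandidate}} and the memory-only resource count are hypothetical stand-ins, and string ordering replaces the real comparator. It shows a warned-container pass that removes an entry only once the container has left the RUNNING, ALLOCATED and ACQUIRED states, and a selection loop that, like {{preemptContainer}}, skips anything already in the preemption map.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Simplified model of the logic described in YARN-4133; not the Hadoop classes.
class PreemptionLeakSketch {
  // Reduced stand-in for RMContainerState, for illustration only.
  enum State { ALLOCATED, ACQUIRED, RUNNING, COMPLETED }

  // Hypothetical stand-in for RMContainer: an id, a state and a memory size.
  static final class Container {
    final String id;
    final State state;
    final int memoryMb;

    Container(String id, State state, int memoryMb) {
      this.id = id;
      this.state = state;
      this.memoryMb = memoryMb;
    }
  }

  // A container can still be preempted while it is in any of these states.
  static boolean isPreemptable(State s) {
    return s == State.RUNNING || s == State.ALLOCATED || s == State.ACQUIRED;
  }

  // Proposed behaviour: drop a warned container only once it has left the
  // preemptable states. Returns the ids that were warned or killed this pass.
  static List<String> processWarned(List<Container> warned, int toPreemptMb) {
    List<String> acted = new ArrayList<>();
    Iterator<Container> it = warned.iterator();
    while (it.hasNext()) {
      Container c = it.next();
      if (!isPreemptable(c.state)) {
        it.remove();                // no longer preemptable: stop tracking it
      } else if (toPreemptMb > 0) {
        acted.add(c.id);            // stands in for warnOrKillContainer
        toPreemptMb -= c.memoryMb;  // stands in for Resources.subtractFrom
      }
      // Otherwise: still preemptable but nothing left to reclaim; keep it.
    }
    return acted;
  }

  // Mirrors the selection loop in preemptContainer: anything already in the
  // preemption map is skipped, so it is never handed out a second time.
  static String pickCandidate(List<String> live, Set<String> preemptionMap) {
    String toBePreempted = null;
    for (String id : live) {
      if (!preemptionMap.contains(id)
          && (toBePreempted == null || toBePreempted.compareTo(id) > 0)) {
        toBePreempted = id;
      }
    }
    return toBePreempted;
  }

  public static void main(String[] args) {
    List<Container> warned = new ArrayList<>();
    warned.add(new Container("c1", State.RUNNING, 1024));
    warned.add(new Container("c2", State.ACQUIRED, 1024));
    warned.add(new Container("c3", State.COMPLETED, 1024));

    List<String> acted = processWarned(warned, 1024);
    System.out.println("acted on " + acted + ", still tracking " + warned.size());

    Set<String> preemptionMap = new HashSet<>();
    preemptionMap.add("c2");
    List<String> live = new ArrayList<>();
    live.add("c2");
    System.out.println("next candidate: " + pickCandidate(live, preemptionMap));
  }
}
```

In this model, c2 stays in the warned list even though the preemption budget was used up by c1, which matters because {{pickCandidate}} returns null for a container that is already in the preemption map. If c2 had been dropped from the warned list instead, nothing would select it again.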
[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735959#comment-14735959 ]

Xianyin Xin commented on YARN-4133:
---

Hi [~zxu], it seems the current preemption logic has many problems. I just described one in [https://issues.apache.org/jira/browse/YARN-4120?focusedCommentId=14735952=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14735952]. I think the logic needs a refactor. What do you think?
[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736052#comment-14736052 ]

Hadoop QA commented on YARN-4133:
---

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 41s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 25s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 54m 10s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 92m 6s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12754780/YARN-4133.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d9c1fab |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9049/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9049/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9049/console |

This message was automatically generated.
[jira] [Commented] (YARN-4133) Containers to be preempted leaks in FairScheduler preemption logic.
[ https://issues.apache.org/jira/browse/YARN-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736177#comment-14736177 ]

Hadoop QA commented on YARN-4133:
---

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 57s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 8m 7s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 20s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 54s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 54m 34s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 94m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12754810/YARN-4133.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a153b96 |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9052/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9052/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9052/console |

This message was automatically generated.