[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157311#comment-14157311 ]
Hudson commented on YARN-2628:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #6183 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6183/])
YARN-2628. Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free. Contributed by Varun Vasudev (jianhe: rev 054f28552687e9b9859c0126e16a2066e20ead3f)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt


> Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2628
>                 URL: https://issues.apache.org/jira/browse/YARN-2628
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.5.1
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>             Fix For: 2.6.0
>
>         Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch
>
>
> We've noticed that if you run the CapacityScheduler with the
> DominantResourceCalculator, sometimes apps will end up with containers in a
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>       node.getAvailableResource(), minimumAllocation)) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>           ", available: " + node.getAvailableResource());
>     }
>     root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() +
>       " is reserved by application " +
>       node.getReservedContainer().getContainerId().getApplicationAttemptId()
>       );
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers.
> Since it uses the greaterThanOrEqual function, we end up in a situation where
> greaterThanOrEqual returns true, even though we may not have enough CPU or
> memory to actually run the container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
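
To illustrate the behaviour described in the issue, here is a minimal standalone sketch. It does not use the Hadoop classes and is not taken from the committed patch. The Res class, the dominantGreaterThanOrEqual and fitsIn helpers, and all of the numbers are hypothetical, and the comparison is a simplified dominant-share check (it ignores tie-breaking on the non-dominant resource). It shows how a node with free memory but no free vcores can still compare as "greater than or equal to" the minimum allocation, while a per-resource check rejects it.

```java
public class DominantCompareSketch {

    /** Hypothetical stand-in for a YARN Resource: memory in MB and virtual cores. */
    static final class Res {
        final long memoryMb;
        final long vcores;

        Res(long memoryMb, long vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Largest fraction of the cluster that r occupies across both dimensions. */
    static double dominantShare(Res cluster, Res r) {
        return Math.max((double) r.memoryMb / cluster.memoryMb,
                        (double) r.vcores / cluster.vcores);
    }

    /** Simplified dominant-share comparison (assumption: ties are not broken further). */
    static boolean dominantGreaterThanOrEqual(Res cluster, Res lhs, Res rhs) {
        return dominantShare(cluster, lhs) >= dominantShare(cluster, rhs);
    }

    /** Per-resource check: the request must fit in every dimension. */
    static boolean fitsIn(Res request, Res available) {
        return request.memoryMb <= available.memoryMb
            && request.vcores <= available.vcores;
    }

    public static void main(String[] args) {
        Res cluster = new Res(16384, 16);
        Res available = new Res(8192, 0);          // free memory, but no free vcores
        Res minimumAllocation = new Res(1024, 1);

        // Dominant shares are 0.5 (available) and 0.0625 (minimum allocation).
        System.out.println(dominantGreaterThanOrEqual(cluster, available, minimumAllocation)); // true
        // One vcore is requested and none is free, so the container cannot run.
        System.out.println(fitsIn(minimumAllocation, available)); // false
    }
}
```

With these made-up numbers the first check passes and the second fails, which matches the symptom in the report: the scheduler goes on to try an assignment on a node that cannot actually host the container.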