Tao Jie created YARN-4477:
-----------------------------

             Summary: FairScheduler: encounter infinite loop in attemptScheduling
                 Key: YARN-4477
                 URL: https://issues.apache.org/jira/browse/YARN-4477
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
            Reporter: Tao Jie

This problem was introduced by YARN-4270, which adds a limit on reservations.
In FSAppAttempt.reserve():
{code}
    if (!reservationExceedsThreshold(node, type)) {
      LOG.info("Making reservation: node=" + node.getNodeName() +
          " app_id=" + getApplicationId());
      if (!alreadyReserved) {
        getMetrics().reserveResource(getUser(), container.getResource());
        RMContainer rmContainer =
            super.reserve(node, priority, null, container);
        node.reserveResource(this, priority, rmContainer);
        setReservation(node);
      } else {
        RMContainer rmContainer = node.getReservedContainer();
        super.reserve(node, priority, rmContainer, container);
        node.reserveResource(this, priority, rmContainer);
        setReservation(node);
      }
    }
{code}
If the reservation exceeds the threshold, no reservation is set on the current node.
But in attemptScheduling in FairScheduler:
{code}
      while (node.getReservedContainer() == null) {
        boolean assignedContainer = false;
        if (!queueMgr.getRootQueue().assignContainer(node).equals(
            Resources.none())) {
          assignedContainers++;
          assignedContainer = true;
        }
        if (!assignedContainer) { break; }
        if (!assignMultiple) { break; }
        if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
      }
{code}
assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not equal to Resources.none(). The loop therefore counts the call as a successful assignment, while node.getReservedContainer() stays null.
As a result, if multiple assignment is enabled and maxAssign is unlimited, this while loop never breaks.

I suppose that assignContainer(node) should return Resources.none() rather than CONTAINER_RESERVED when the attempt does not take the reservation because of the limit.
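
To make the failure mode easier to see, here is a minimal standalone sketch of the loop's control flow. It is not the real scheduler code: ReservationLoopSketch, Node, runLoop, assignWhenReservationRefused and the iterationCap guard are all hypothetical names invented for this illustration, and plain strings stand in for Resources.none() and FairScheduler.CONTAINER_RESERVED. A maxAssign of -1 is assumed to mean "unlimited", which is what the `maxAssign > 0` check in the loop implies.

```java
import java.util.function.Supplier;

/** Toy model of the attemptScheduling loop; none of these types are the real YARN classes. */
public class ReservationLoopSketch {
    // Stand-ins for Resources.none() and FairScheduler.CONTAINER_RESERVED.
    static final String NONE = "NONE";
    static final String CONTAINER_RESERVED = "CONTAINER_RESERVED";

    /** Hypothetical node that can hold at most one reserved container. */
    static class Node {
        Object reservedContainer = null;
    }

    /**
     * Same break conditions as the loop quoted in the report. The extra
     * iterationCap guard exists only so that this demo terminates; the
     * real loop has no such guard. Returns the number of iterations run.
     */
    static int runLoop(Node node, Supplier<String> assignContainer,
                       boolean assignMultiple, int maxAssign, int iterationCap) {
        int assignedContainers = 0;
        int iterations = 0;
        while (node.reservedContainer == null && iterations < iterationCap) {
            iterations++;
            boolean assignedContainer = false;
            if (!assignContainer.get().equals(NONE)) {
                assignedContainers++;
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return iterations;
    }

    /**
     * Models an assignment attempt whose reservation was refused by the
     * threshold check, so the node's reservedContainer is never set.
     * fixed == false: reported behaviour (returns CONTAINER_RESERVED).
     * fixed == true:  proposed behaviour (returns NONE).
     */
    static String assignWhenReservationRefused(boolean fixed) {
        return fixed ? NONE : CONTAINER_RESERVED;
    }

    public static void main(String[] args) {
        int cap = 1000;
        int buggy = runLoop(new Node(),
            () -> assignWhenReservationRefused(false), true, -1, cap);
        int fixed = runLoop(new Node(),
            () -> assignWhenReservationRefused(true), true, -1, cap);
        System.out.println("buggy iterations: " + buggy); // 1000, stopped only by the cap
        System.out.println("fixed iterations: " + fixed); // 1
    }
}
```

With the reported return value the loop stops only because of the artificial cap. With the proposed return value it exits on the first pass through the `!assignedContainer` check.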