Tao Jie created YARN-4477:
-----------------------------
Summary: FairScheduler: encounter infinite loop in attemptScheduling
Key: YARN-4477
URL: https://issues.apache.org/jira/browse/YARN-4477
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Reporter: Tao Jie
This problem was introduced by YARN-4270, which added a limit on reservations.
In FSAppAttempt.reserve():
{code}
if (!reservationExceedsThreshold(node, type)) {
  LOG.info("Making reservation: node=" + node.getNodeName() +
      " app_id=" + getApplicationId());
  if (!alreadyReserved) {
    getMetrics().reserveResource(getUser(), container.getResource());
    RMContainer rmContainer =
        super.reserve(node, priority, null, container);
    node.reserveResource(this, priority, rmContainer);
    setReservation(node);
  } else {
    RMContainer rmContainer = node.getReservedContainer();
    super.reserve(node, priority, rmContainer, container);
    node.reserveResource(this, priority, rmContainer);
    setReservation(node);
  }
}
{code}
If the reservation exceeds the threshold, no reservation is set on the current node.
But in attemptScheduling in FairScheduler:
{code}
while (node.getReservedContainer() == null) {
  boolean assignedContainer = false;
  if (!queueMgr.getRootQueue().assignContainer(node).equals(
      Resources.none())) {
    assignedContainers++;
    assignedContainer = true;
  }
  if (!assignedContainer) { break; }
  if (!assignMultiple) { break; }
  if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
}
{code}
assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is
not equal to Resources.none(), so the loop counts the rejected reservation as an
assigned container while node.getReservedContainer() remains null.
As a result, if multiple assignment is enabled and maxAssign is unlimited, this
while loop never breaks.
I suppose that assignContainer(node) should return Resources.none() rather than
CONTAINER_RESERVED when the attempt does not take the reservation because of the
limit.
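
To make the loop behaviour easier to see, here is a minimal standalone sketch. It is not the FairScheduler code: Node, NONE, CONTAINER_RESERVED, assignContainer and runLoop are simplified, hypothetical stand-ins, and an iteration cap is added only so that the demo terminates. It assumes the reservation is always rejected by the threshold check, and compares the current return value with the proposed one.

```java
// Simplified stand-ins for the real YARN classes; every name here is hypothetical.
class ReservationLoopSketch {
    static final int NONE = 0;                 // stands in for Resources.none()
    static final int CONTAINER_RESERVED = -1;  // stands in for FairScheduler.CONTAINER_RESERVED

    /** Minimal node model: only tracks whether a reservation was recorded. */
    static class Node {
        boolean reserved = false;
    }

    /**
     * Models an attempt whose reservation is rejected by the threshold check.
     * The node is never marked reserved; only the return value differs.
     */
    static int assignContainer(Node node, boolean returnNoneWhenRejected) {
        // Reservation declined: node.reserved stays false in both cases.
        return returnNoneWhenRejected ? NONE : CONTAINER_RESERVED;
    }

    /**
     * Mirrors the shape of the while loop in attemptScheduling with
     * assignMultiple enabled and maxAssign unlimited (<= 0).
     * Returns the number of iterations, capped at maxIterations.
     */
    static int runLoop(boolean returnNoneWhenRejected, int maxIterations) {
        Node node = new Node();
        boolean assignMultiple = true;
        int maxAssign = -1;
        int assignedContainers = 0;
        int iterations = 0;
        while (!node.reserved && iterations < maxIterations) {
            iterations++;
            boolean assignedContainer = false;
            if (assignContainer(node, returnNoneWhenRejected) != NONE) {
                assignedContainers++;
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("current behaviour:  " + runLoop(false, 1000) + " iterations");
        System.out.println("proposed behaviour: " + runLoop(true, 1000) + " iterations");
    }
}
```

With the current return value the sketch only stops at the artificial cap. With the proposed return value it leaves the loop after the first iteration through the !assignedContainer check.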