Tao Jie created YARN-4477:

             Summary: FairScheduler: encounter infinite loop in 
                 Key: YARN-4477
                 URL: https://issues.apache.org/jira/browse/YARN-4477
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
            Reporter: Tao Jie

This problem was introduced by YARN-4270, which adds a limit on reservations.
In FSAppAttempt.reserve():
if (!reservationExceedsThreshold(node, type)) {
      LOG.info("Making reservation: node=" + node.getNodeName() +
              " app_id=" + getApplicationId());
      if (!alreadyReserved) {
        getMetrics().reserveResource(getUser(), container.getResource());
        RMContainer rmContainer =
                super.reserve(node, priority, null, container);
        node.reserveResource(this, priority, rmContainer);
      } else {
        RMContainer rmContainer = node.getReservedContainer();
        super.reserve(node, priority, rmContainer, container);
        node.reserveResource(this, priority, rmContainer);
      }
    }
If the reservation exceeds the threshold, no reservation is set on the current node.
But in attemptScheduling() in FairScheduler:
      while (node.getReservedContainer() == null) {
        boolean assignedContainer = false;
        if (!queueMgr.getRootQueue().assignContainer(node).equals(
            Resources.none())) {
          assignedContainer = true;
        }
        if (!assignedContainer) { break; }
        if (!assignMultiple) { break; }
        if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
      }
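assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not
equal to Resources.none(), so assignedContainer is set to true even though the node
has no reserved container.
As a result, if assignMultiple is enabled and maxAssign is unlimited, none of the
break conditions is ever met and this while loop never exits.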
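
The interaction can be reproduced in isolation. The following is a minimal,
self-contained sketch, not Hadoop code: Node, assignContainer, schedule and the
guard iteration cap are hypothetical stand-ins, and resources are modeled as plain
ints. It assumes the threshold check always rejects the reservation, so the node
never gets a reserved container. The assignedContainers increment is also an
assumption, since the excerpt above does not show it.

```java
public class ReservationLoopSketch {
    // Hypothetical stand-ins for Resources.none() and FairScheduler.CONTAINER_RESERVED.
    static final int NONE = 0;
    static final int CONTAINER_RESERVED = -1;

    /** Hypothetical node whose reservation is never set (threshold exceeded). */
    static class Node {
        Object reservedContainer = null;
        Object getReservedContainer() { return reservedContainer; }
    }

    /** Mimics the reported behavior: the sentinel comes back although nothing was reserved. */
    static int assignContainer(Node node) {
        return CONTAINER_RESERVED;
    }

    /**
     * Same shape as the loop quoted above. Returns the number of iterations started.
     * The guard exists only so that this sketch terminates.
     */
    static int schedule(Node node, boolean assignMultiple, int maxAssign, int guard) {
        int assignedContainers = 0;
        int iterations = 0;
        while (node.getReservedContainer() == null) {
            if (++iterations >= guard) { break; } // not part of the real loop
            boolean assignedContainer = false;
            if (assignContainer(node) != NONE) {
                assignedContainers++; // assumed; not shown in the excerpt
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return iterations;
    }

    public static void main(String[] args) {
        // assignMultiple enabled, maxAssign unlimited: only the guard stops the loop.
        System.out.println(schedule(new Node(), true, -1, 1000));  // prints 1000
        // assignMultiple disabled: the loop exits after a single pass.
        System.out.println(schedule(new Node(), false, -1, 1000)); // prints 1
    }
}
```

In this sketch the first call stops only because of the artificial guard, which
corresponds to the hang described above. The second call shows why the problem
needs assignMultiple to be enabled.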

I suppose that assignContainer(node) should return Resources.none() rather than 
CONTAINER_RESERVED when the attempt doesn't take the reservation because of the 

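A minimal sketch of the suggested behavior, again using hypothetical stand-ins
rather than Hadoop classes (Node, reserve, assignContainer, schedule and the int
constants are all illustrative): the assignment reports the reserved sentinel only
when a reservation was actually set on the node, and reports "none" otherwise, so
the scheduling loop can exit through its !assignedContainer check.

```java
public class ReservationFixSketch {
    // Hypothetical stand-ins for Resources.none() and FairScheduler.CONTAINER_RESERVED.
    static final int NONE = 0;
    static final int CONTAINER_RESERVED = -1;

    /** Hypothetical node holding at most one reserved container. */
    static class Node {
        Object reservedContainer = null;
        Object getReservedContainer() { return reservedContainer; }
    }

    /** Sets the reservation unless the threshold is exceeded; reports whether it was set. */
    static boolean reserve(Node node, boolean exceedsThreshold) {
        if (exceedsThreshold) {
            return false;
        }
        node.reservedContainer = new Object();
        return true;
    }

    /** Returns the sentinel only if the reservation was really made, NONE otherwise. */
    static int assignContainer(Node node, boolean exceedsThreshold) {
        return reserve(node, exceedsThreshold) ? CONTAINER_RESERVED : NONE;
    }

    /** Scheduling loop with assignMultiple enabled and maxAssign unlimited. */
    static int schedule(Node node, boolean exceedsThreshold, int guard) {
        int iterations = 0;
        while (node.getReservedContainer() == null) {
            if (++iterations >= guard) { break; } // safety cap for the sketch only
            boolean assignedContainer = false;
            if (assignContainer(node, exceedsThreshold) != NONE) {
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Reservation refused: NONE is returned and the loop exits after one pass.
        System.out.println(schedule(new Node(), true, 1000));  // prints 1
        // Reservation accepted: the node is now reserved, so the while condition ends the loop.
        System.out.println(schedule(new Node(), false, 1000)); // prints 1
    }
}
```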