Adam Antal created YARN-10243:
---------------------------------

             Summary: Rack-only localization constraint for MR AM is broken for 
CapacityScheduler
                 Key: YARN-10243
                 URL: https://issues.apache.org/jira/browse/YARN-10243
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacity scheduler, capacityscheduler
    Affects Versions: 3.2.0
            Reporter: Adam Antal


Reproduction: Start an MR sleep job with strict locality configured for the AM 
({{-Dmapreduce.job.am.strict-locality=/rack1}}, for instance). If 
CapacityScheduler is in use, the job hangs (stuck in the SCHEDULED state). 

Root cause: if no other resources are requested (such as node locality or 
another constraint), the scheduling opportunities counter is never incremented, 
so the following piece of code always returns false. The rack-local constraint 
is therefore skipped on every pass, resulting in an infinite loop:
{code:java}
    // If we are here, we do need containers on this rack for RACK_LOCAL req
    if (type == NodeType.RACK_LOCAL) {
      // 'Delay' rack-local just a little bit...
      long missedOpportunities =
          application.getSchedulingOpportunities(schedulerKey);
      return getActualNodeLocalityDelay() < missedOpportunities;
    }
{code}
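
To illustrate why the check above can never pass when the counter stays at zero, here is a minimal standalone sketch. It is not the CapacityScheduler code: the class {{RackLocalDelaySketch}}, the method names, and the delay value of 40 are hypothetical and chosen only for illustration. It models just the comparison from the snippet and a counter that either is or is not incremented between scheduling passes.

```java
public class RackLocalDelaySketch {

    // Mirrors the comparison in the quoted snippet:
    // rack-local assignment is allowed only once the number of missed
    // opportunities exceeds the node locality delay.
    static boolean canAssignRackLocal(long nodeLocalityDelay,
                                      long missedOpportunities) {
        return nodeLocalityDelay < missedOpportunities;
    }

    // Simulates up to 'passes' scheduling passes (hypothetical helper).
    // Returns the 1-based pass at which the check first succeeds,
    // or -1 if it never does.
    static int firstAllowedPass(long nodeLocalityDelay,
                                boolean counterIncrements,
                                int passes) {
        long missedOpportunities = 0;
        for (int pass = 1; pass <= passes; pass++) {
            if (canAssignRackLocal(nodeLocalityDelay, missedOpportunities)) {
                return pass;
            }
            // Assumption for this sketch: in the healthy case the counter
            // grows by one per missed pass; in the reported case it stays
            // at its initial value.
            if (counterIncrements) {
                missedOpportunities++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Counter incremented: the check eventually passes.
        System.out.println(firstAllowedPass(40, true, 1000));
        // Counter never incremented: the check fails on every pass.
        System.out.println(firstAllowedPass(40, false, 1000));
    }
}
```

With a positive delay and a counter that never moves, the second call reports that no pass is ever allowed, which matches the hang described in the reproduction.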

Workaround: set {{yarn.scheduler.capacity.node-locality-delay}} to zero so that 
this rule is processed immediately.
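
As a sketch of the workaround, assuming the property is set in the usual place for CapacityScheduler settings ({{capacity-scheduler.xml}}), the entry could look like this:

```xml
<property>
  <!-- Workaround from this issue: disable the locality delay -->
  <name>yarn.scheduler.capacity.node-locality-delay</name>
  <value>0</value>
</property>
```

Note that this changes the delay for all applications on the scheduler, not only for the MR AM request.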



--
This message was sent by Atlassian Jira
(v8.3.4#803005)