[ https://issues.apache.org/jira/browse/YARN-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shilun Fan updated YARN-10243:
------------------------------
    Target Version/s: 3.5.0  (was: 3.4.0)

> Rack-only localization constraint for MR AM is broken for CapacityScheduler
> ---------------------------------------------------------------------------
>
>                 Key: YARN-10243
>                 URL: https://issues.apache.org/jira/browse/YARN-10243
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 3.2.0
>            Reporter: Adam Antal
>            Assignee: Bilwa S T
>            Priority: Major
>
> Reproduction: Start an MR sleep job with strict locality configured for the AM
> ({{-Dmapreduce.job.am.strict-locality=/rack1}} for instance). If
> CapacityScheduler is used, the job hangs (stuck in SCHEDULED state).
> Root cause: if no other resources are requested (such as node locality or
> another constraint), the scheduling opportunities counter is never
> incremented, so the following piece of code always returns false (we
> always skip this constraint), resulting in an infinite loop:
> {code:java}
> // If we are here, we do need containers on this rack for RACK_LOCAL req
> if (type == NodeType.RACK_LOCAL) {
>   // 'Delay' rack-local just a little bit...
>   long missedOpportunities =
>       application.getSchedulingOpportunities(schedulerKey);
>   return getActualNodeLocalityDelay() < missedOpportunities;
> }
> {code}
> Workaround: set {{yarn.scheduler.capacity.node-locality-delay}} to zero so
> that this rule is processed immediately.
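
To make the root cause above concrete, here is a minimal standalone sketch of the comparison in the quoted snippet. It is not CapacityScheduler code: the class RackLocalDelaySketch, both of its methods, and the delay value of 40 are hypothetical and chosen only for illustration. It assumes the only thing that changes between scheduling passes is the missed-opportunities counter.

```java
public class RackLocalDelaySketch {

    // Hypothetical stand-in for the RACK_LOCAL branch quoted in the report:
    // the request is allowed only once the counter exceeds the delay.
    static boolean canAssignRackLocal(long nodeLocalityDelay, long missedOpportunities) {
        return nodeLocalityDelay < missedOpportunities;
    }

    // Returns the 1-based scheduling pass on which the check first succeeds,
    // or -1 if it never succeeds within maxAttempts passes.
    // incrementPerMiss models how much the counter grows after a failed pass;
    // 0 models the reported case where the counter is never incremented.
    static int firstPassingAttempt(long delay, int maxAttempts, long incrementPerMiss) {
        long missed = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (canAssignRackLocal(delay, missed)) {
                return attempt;
            }
            missed += incrementPerMiss;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Counter stuck at zero: the check fails on every pass.
        System.out.println(firstPassingAttempt(40, 1000, 0)); // prints -1
        // Counter advancing by one per missed pass: the check eventually passes.
        System.out.println(firstPassingAttempt(40, 1000, 1)); // prints 42
    }
}
```

With a counter that never advances, the comparison cannot become true for any positive delay, which matches the hang described in the report. The sketch does not attempt to model how the zero-delay workaround takes effect inside the real scheduler.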