[ https://issues.apache.org/jira/browse/YARN-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bilwa S T reassigned YARN-10243:
--------------------------------
Assignee: Bilwa S T
> Rack-only localization constraint for MR AM is broken for CapacityScheduler
> ---------------------------------------------------------------------------
>
> Key: YARN-10243
> URL: https://issues.apache.org/jira/browse/YARN-10243
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 3.2.0
> Reporter: Adam Antal
> Assignee: Bilwa S T
> Priority: Major
>
> Reproduction: start an MR sleep job with strict locality configured for the AM
> (for instance, {{-Dmapreduce.job.am.strict-locality=/rack1}}). If
> CapacityScheduler is used, the job hangs in the SCHEDULED state.
> Root cause: if no other resources are requested (such as node-local requests
> or other constraints), the scheduling-opportunities counter is never
> incremented, so the following code always returns false. The rack-local
> constraint is therefore skipped on every attempt, resulting in an infinite loop:
> {code:java}
> // If we are here, we do need containers on this rack for RACK_LOCAL req
> if (type == NodeType.RACK_LOCAL) {
>   // 'Delay' rack-local just a little bit...
>   long missedOpportunities =
>       application.getSchedulingOpportunities(schedulerKey);
>   // Allow rack-local assignment only after more missed scheduling
>   // opportunities than the node-locality delay
>   return getActualNodeLocalityDelay() < missedOpportunities;
> }
> {code}
> Workaround: set {{yarn.scheduler.capacity.node-locality-delay}} to zero so
> that this rule is processed immediately.
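
The following is a minimal, self-contained model of the rack-local check quoted above; it is not CapacityScheduler code. The class and method names ({{RackLocalDelaySketch}}, {{canAssignRackLocal}}, {{heartbeatsUntilAssigned}}) are hypothetical, and the sketch assumes the missed-opportunities counter would normally grow by one per node heartbeat. It only illustrates that if the counter stays at zero, the comparison never succeeds for a positive delay; it does not model the rest of the allocator or why the workaround helps.

{code:java}
// Hypothetical model of the quoted rack-local delay check (not Hadoop code).
final class RackLocalDelaySketch {

    /** Same comparison as the quoted snippet: rack-local is allowed only
     *  once missed opportunities exceed the node-locality delay. */
    static boolean canAssignRackLocal(int nodeLocalityDelay, long missedOpportunities) {
        return nodeLocalityDelay < missedOpportunities;
    }

    /**
     * Simulates up to maxHeartbeats heartbeats for a request that can only be
     * satisfied rack-locally. Returns the 0-based heartbeat index at which the
     * container would be assigned, or -1 if it is never assigned.
     *
     * @param incrementCounter assumption: whether one missed opportunity is
     *                         recorded on every heartbeat
     */
    static int heartbeatsUntilAssigned(int nodeLocalityDelay,
                                       boolean incrementCounter,
                                       int maxHeartbeats) {
        long missedOpportunities = 0;
        for (int hb = 0; hb < maxHeartbeats; hb++) {
            if (incrementCounter) {
                missedOpportunities++; // assumed per-heartbeat bookkeeping
            }
            if (canAssignRackLocal(nodeLocalityDelay, missedOpportunities)) {
                return hb;
            }
        }
        return -1; // counter never exceeded the delay: request starves
    }

    public static void main(String[] args) {
        // Counter incremented: assigned once missed opportunities exceed 40.
        System.out.println(heartbeatsUntilAssigned(40, true, 1000));  // prints 40
        // Counter never incremented (the reported situation): never assigned.
        System.out.println(heartbeatsUntilAssigned(40, false, 1000)); // prints -1
    }
}
{code}

With a delay of 40 (an example value only), the incremented case is assigned on the 41st heartbeat, while the non-incremented case is never assigned within the simulated window, which mirrors the hang described above.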