Steven Rand created YARN-7911:
---------------------------------

             Summary: Method identifyContainersToPreempt uses ResourceRequest#getRelaxLocality incorrectly
                 Key: YARN-7911
                 URL: https://issues.apache.org/jira/browse/YARN-7911
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler, resourcemanager
    Affects Versions: 3.1.0
            Reporter: Steven Rand
            Assignee: Steven Rand


After YARN-7655, {{identifyContainersToPreempt}} expands the search space to 
all nodes when three conditions hold: we had previously considered only a 
subset of nodes in order to satisfy a {{NODE_LOCAL}} or {{RACK_LOCAL}} RR, 
doing so would preempt AM containers, and the RR allows locality to be relaxed:

{code}
        // Don't preempt AM containers just to satisfy local requests if relax
        // locality is enabled.
        if (bestContainers != null
                && bestContainers.numAMContainers > 0
                && !ResourceRequest.isAnyLocation(rr.getResourceName())
                && rr.getRelaxLocality()) {
          bestContainers = identifyContainersToPreemptForOneContainer(
                  scheduler.getNodeTracker().getAllNodes(), rr);
        }
{code}

This turns out to be based on a misunderstanding of what 
{{rr.getRelaxLocality}} means. I had believed that it means that locality can 
be relaxed _from_ that level. However, it actually means that locality can be 
relaxed _to_ that level: 
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java#L450.

For example, suppose we have {{relaxLocality}} set to {{true}} at the node 
level, but {{false}} at the rack and {{ANY}} levels. This says that we 
cannot relax locality to the rack level. However, the behavior after 
YARN-7655 is to interpret {{relaxLocality}} being {{true}} at the node level as 
permission to satisfy the request on any node.

What we should do instead is check whether relaxLocality is enabled for the 
corresponding RR at the next level. So if we're considering a node-level RR, we 
should find the corresponding rack-level RR and check whether relaxLocality is 
enabled for it. And similarly, if we're considering a rack-level RR, we should 
check the corresponding any-level RR.
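
The next-level check described above could look roughly like the following. This is a minimal, self-contained sketch and not the actual patch: the class {{RelaxLocalitySketch}}, the method {{canRelaxBeyond}}, and the two maps are hypothetical stand-ins for the scheduler's node-to-rack topology and for the app's outstanding RRs keyed by resource name. Treating a missing next-level RR as "cannot relax" is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class RelaxLocalitySketch {
  // Same value as ResourceRequest.ANY in the YARN API.
  static final String ANY = "*";

  /**
   * Hypothetical helper: decides whether a request for resourceName may be
   * satisfied at a wider locality level by looking at relaxLocality on the
   * request one level up (node -> rack, rack -> ANY), not on the request itself.
   *
   * @param nodeToRack  assumed topology lookup: node name -> rack name
   * @param relaxByName assumed view of the app's RRs: resource name -> relaxLocality
   */
  static boolean canRelaxBeyond(String resourceName,
                                Map<String, String> nodeToRack,
                                Map<String, Boolean> relaxByName) {
    if (ANY.equals(resourceName)) {
      return false; // ANY is already the widest level
    }
    // A name found in the topology map is a node; anything else is taken to be a rack.
    String nextLevel = nodeToRack.containsKey(resourceName)
        ? nodeToRack.get(resourceName)
        : ANY;
    // Assumption: if there is no RR at the next level, do not relax.
    return relaxByName.getOrDefault(nextLevel, false);
  }

  public static void main(String[] args) {
    Map<String, String> nodeToRack = new HashMap<>();
    nodeToRack.put("node1", "/rack1");

    // The example from this report: true at the node level, false at rack and ANY.
    Map<String, Boolean> relaxByName = new HashMap<>();
    relaxByName.put("node1", true);
    relaxByName.put("/rack1", false);
    relaxByName.put(ANY, false);

    System.out.println(canRelaxBeyond("node1", nodeToRack, relaxByName));  // false
    System.out.println(canRelaxBeyond("/rack1", nodeToRack, relaxByName)); // false
  }
}
```

With a check of this shape, the node-level RR in the example would no longer cause the search to widen to all nodes, because the rack-level RR has {{relaxLocality}} set to {{false}}.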

It may also be better to use {{FSAppAttempt#getAllowedLocalityLevel}} instead 
of explicitly checking {{relaxLocality}}, but I'm not sure which is correct.



