[
https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290073#comment-16290073
]
Steven Rand commented on YARN-7655:
-----------------------------------
One issue I'm having with the test in the patch is that preemption works as
expected, but the starved app doesn't have any containers allocated to it. I
think the series of events that causes this is:
* For purposes of the test, I'm only interested in requesting resources on a
particular node. But as discussed in YARN-7561, this requires me to also make a
rack-local request and a request for any node at the same priority.
* To make sure that the RR that we consider for preemption is the node-local
one, I made the other two RRs too big to be satisfied, so that
{{getStarvedResourceRequests}} skips them.
* However, when we go to allocate the preempted resources to the starving app,
it turns out that {{FSAppAttempt#hasContainerForNode}} only looks at the
capacity of the off-switch ask:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L1071.
This causes it to decide that the starving app can't be allocated resources on
the node, since I intentionally made the off-switch RR too big to fit on any of
the test nodes. The fact that the node-local request (for the other node) is
small enough to fit on this node gets ignored.
I'm having trouble figuring out what to do about this. I had assumed that if
relaxLocality was true for an RR, then it could be satisfied on node B even
though it asked for node A. Is this not correct? Or should
{{FSAppAttempt#hasContainerForNode}} be modified to also check the sizes of the
asks at the rack and node levels (if those exist)?
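To make the question concrete, here is a minimal, self-contained sketch of the two behaviors. It is not the actual {{FSAppAttempt}} code: the {{Ask}} class, both method names, and the memory-only comparison are simplifications invented for illustration, and the second method encodes only one possible reading of relaxLocality (that a relaxed node-level ask may be placed on a different node), which is exactly the assumption I'm unsure about.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the size check; not the real scheduler code.
class LocalityFitSketch {
    static final String ANY = "*"; // same literal as ResourceRequest.ANY

    /** Stand-in for a ResourceRequest, reduced to memory only. */
    static final class Ask {
        final String resourceName; // node name, rack name, or ANY
        final long memoryMb;
        final boolean relaxLocality;

        Ask(String resourceName, long memoryMb, boolean relaxLocality) {
            this.resourceName = resourceName;
            this.memoryMb = memoryMb;
            this.relaxLocality = relaxLocality;
        }
    }

    /** Behavior described above: only the off-switch ask's size is compared. */
    static boolean offSwitchAskFits(List<Ask> asks, long availableMb) {
        for (Ask ask : asks) {
            if (ANY.equals(ask.resourceName)) {
                return ask.memoryMb <= availableMb;
            }
        }
        return false; // no off-switch ask at this priority
    }

    /**
     * Possible alternative (an assumption, not a confirmed fix): accept the
     * node if any ask that could apply to it is small enough to fit.
     */
    static boolean anyApplicableAskFits(
            List<Ask> asks, String node, String rack, long availableMb) {
        for (Ask ask : asks) {
            boolean applies = ANY.equals(ask.resourceName)
                    || node.equals(ask.resourceName)
                    || rack.equals(ask.resourceName)
                    || ask.relaxLocality; // assumed: relaxed asks may land elsewhere
            if (applies && ask.memoryMb <= availableMb) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors the test setup: a small node-local ask for nodeA, plus
        // rack and off-switch asks that are too big for any test node.
        List<Ask> asks = Arrays.asList(
                new Ask("nodeA", 1024, true),
                new Ask("rack1", 1_000_000, true),
                new Ask(ANY, 1_000_000, true));
        System.out.println(offSwitchAskFits(asks, 8192));
        System.out.println(anyApplicableAskFits(asks, "nodeB", "rack1", 8192));
    }
}
```

With the asks from the test, the first check rejects nodeB because the oversized off-switch ask doesn't fit, while the second accepts it on the strength of the small node-local ask. Whether the second behavior is actually desirable depends on the answer to the relaxLocality question above.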
> avoid AM preemption caused by RRs for specific nodes or racks
> -------------------------------------------------------------
>
> Key: YARN-7655
> URL: https://issues.apache.org/jira/browse/YARN-7655
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Affects Versions: 3.0.0
> Reporter: Steven Rand
> Assignee: Steven Rand
> Attachments: YARN-7655-001.patch
>
>
> We frequently see AM preemptions when
> {{starvedApp.getStarvedResourceRequests()}} in
> {{FSPreemptionThread#identifyContainersToPreempt}} includes one or more RRs
> that request containers on a specific node. Since this causes us to only
> consider one node to preempt containers on, the really good work that was
> done in YARN-5830 doesn't save us from AM preemption. Even though there might
> be multiple nodes on which we could preempt enough non-AM containers to
> satisfy the app's starvation, we often wind up preempting one or more AM
> containers on the single node that we're considering.
> A proposed solution is that if we're going to preempt one or more AM
> containers for an RR that specifies a node or rack, then we should instead
> expand the search space to consider all nodes. That way we take advantage of
> YARN-5830, and only preempt AMs if there's no alternative. I've attached a
> patch with an initial implementation of this. We've been running it on a few
> clusters, and have seen AM preemptions drop from double-digit occurrences on
> many days to zero.
> Of course, the tradeoff is some loss of locality, since the starved app is
> less likely to be allocated resources at the most specific locality level
> that it asked for. My opinion is that this tradeoff is worth it, but I'm
> interested to hear what others think as well.