[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290073#comment-16290073 ]
Steven Rand commented on YARN-7655:
-----------------------------------

One issue I'm having with the test in the patch is that preemption works as expected, but the starved app doesn't have any containers allocated to it. I think the series of events that causes this is:

* For the purposes of the test, I'm only interested in requesting resources on a particular node. But as discussed in YARN-7561, this requires me to also make a rack-local request and a request for any node at the same priority.
* To make sure that the RR we consider for preemption is the node-local one, I made the other two RRs too big to be satisfied, so that {{getStarvedResourceRequests}} skips them.
* However, when we go to allocate the preempted resources to the starving app, it turns out that {{FSAppAttempt#hasContainerForNode}} only looks at the capacity of the off-switch ask: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L1071. This causes it to decide that the starving app can't be allocated resources on the node, since I intentionally made the off-switch RR too big to fit on any of the test nodes. The fact that the node-local request (for the other node) is small enough to fit on this node gets ignored.

I'm having trouble figuring out what to do about this. I had assumed that if relaxLocality was true for an RR, then it could be satisfied on node B even though it asked for node A. Is this not correct? Or should {{FSAppAttempt#hasContainerForNode}} be modified to check the sizes of the asks at the rack and node levels (if those exist)?
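To make the masking behavior concrete, here is a standalone toy model of the check. This is not the real Hadoop code; all class and method names are simplified stand-ins. It contrasts the current behavior (only the off-switch/ANY ask's size is consulted) with the alternative asked about above (also consult the node- and rack-level asks, if they exist):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified model, not the actual FSAppAttempt API.
public class HasContainerForNodeSketch {
    /** Minimal stand-in for a resource amount (memory only, for brevity). */
    static class Resource {
        final int memoryMb;
        Resource(int memoryMb) { this.memoryMb = memoryMb; }
        boolean fitsWithin(Resource other) { return memoryMb <= other.memoryMb; }
    }

    /** Asks keyed by resource name: a node name, a rack name, or "*" (ANY). */
    private final Map<String, Resource> asks = new HashMap<>();

    void addAsk(String resourceName, Resource r) { asks.put(resourceName, r); }

    /**
     * Current behavior per the comment above: only the ANY (off-switch) ask's
     * size is consulted, so an oversized ANY ask hides smaller asks that fit.
     */
    boolean hasContainerForNodeAnyOnly(Resource nodeAvailable) {
        Resource any = asks.get("*");
        return any != null && any.fitsWithin(nodeAvailable);
    }

    /**
     * Sketch of the alternative asked about: also consider the node- and
     * rack-level asks, so a small ask is not masked by an oversized ANY ask.
     */
    boolean hasContainerForNode(String node, String rack, Resource nodeAvailable) {
        for (String name : new String[] {node, rack, "*"}) {
            Resource ask = asks.get(name);
            if (ask != null && ask.fitsWithin(nodeAvailable)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        HasContainerForNodeSketch app = new HasContainerForNodeSketch();
        // Node-local ask is small; rack and ANY asks are deliberately too big,
        // mirroring the test setup described in the comment.
        app.addAsk("node1", new Resource(1024));
        app.addAsk("rack1", new Resource(100000));
        app.addAsk("*", new Resource(100000));
        Resource available = new Resource(4096);

        System.out.println(app.hasContainerForNodeAnyOnly(available));          // false
        System.out.println(app.hasContainerForNode("node1", "rack1", available)); // true
    }
}
```

In this toy setup the ANY-only check rejects the node even though a node-local ask fits, which is the allocation failure described above.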
> avoid AM preemption caused by RRs for specific nodes or racks
> -------------------------------------------------------------
>
>                 Key: YARN-7655
>                 URL: https://issues.apache.org/jira/browse/YARN-7655
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.0.0
>            Reporter: Steven Rand
>            Assignee: Steven Rand
>        Attachments: YARN-7655-001.patch
>
> We frequently see AM preemptions when {{starvedApp.getStarvedResourceRequests()}} in {{FSPreemptionThread#identifyContainersToPreempt}} includes one or more RRs that request containers on a specific node. Since this causes us to only consider one node to preempt containers on, the really good work that was done in YARN-5830 doesn't save us from AM preemption. Even though there might be multiple nodes on which we could preempt enough non-AM containers to satisfy the app's starvation, we often wind up preempting one or more AM containers on the single node that we're considering.
>
> A proposed solution is that if we're going to preempt one or more AM containers for an RR that specifies a node or rack, then we should instead expand the search space to consider all nodes. That way we take advantage of YARN-5830, and only preempt AMs if there's no alternative. I've attached a patch with an initial implementation of this. We've been running it on a few clusters, and have seen AM preemptions drop from double-digit occurrences on many days to zero.
>
> Of course, the tradeoff is some loss of locality, since the starved app is less likely to be allocated resources at the most specific locality level that it asked for. My opinion is that this tradeoff is worth it, but I'm interested to hear what others think as well.
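The fallback the quoted description proposes can be sketched in isolation. This is a hypothetical simplification, not the actual FSPreemptionThread code; the class and method names below are invented for illustration. The idea: first look for non-AM victims on the nodes the RR names, and only if that fails, widen to all nodes before falling back to an AM container as a last resort:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical simplified model, not the actual FSPreemptionThread code.
public class PreemptionSearchSketch {
    static class Container {
        final String node;
        final boolean isAm;
        Container(String node, boolean isAm) { this.node = node; this.isAm = isAm; }
    }

    /**
     * Pass 1: restrict to the nodes the RR names, non-AM containers only.
     * Pass 2: widen to all nodes, still preferring non-AM containers
     * (the expansion the patch proposes, so an AM on the single requested
     * node is no longer the forced choice).
     * Last resort: preempt an AM container only if nothing else exists.
     */
    static Container identifyContainerToPreempt(List<Container> all, Set<String> requestedNodes) {
        for (Container c : all) {
            if (requestedNodes.contains(c.node) && !c.isAm) return c;
        }
        for (Container c : all) {
            if (!c.isAm) return c;
        }
        return all.isEmpty() ? null : all.get(0);
    }

    public static void main(String[] args) {
        List<Container> cluster = Arrays.asList(
            new Container("node1", true),    // only an AM on the requested node
            new Container("node2", false));  // a non-AM elsewhere in the cluster
        Container victim = identifyContainerToPreempt(cluster, Set.of("node1"));
        System.out.println(victim.node + " am=" + victim.isAm); // node2 am=false
    }
}
```

With a node-restricted search, the AM on node1 would be the only candidate; widening the search preempts the non-AM container on node2 instead, at the cost of locality as the description notes.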
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)