[ https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343803#comment-16343803 ]
Yufei Gu commented on YARN-7655: -------------------------------- Yes, let's move forward with this patch. We are trying to introduce node labeling or any similar idea into scheduler recently. Node labeling or its whatever alternative behaves in a way like data locality. If we want to let AM preemption to get around locality somehow, we need do the same thing to node labeling. I will try to review your patch soon. > avoid AM preemption caused by RRs for specific nodes or racks > ------------------------------------------------------------- > > Key: YARN-7655 > URL: https://issues.apache.org/jira/browse/YARN-7655 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler > Affects Versions: 3.0.0 > Reporter: Steven Rand > Assignee: Steven Rand > Priority: Major > Attachments: YARN-7655-001.patch > > > We frequently see AM preemptions when > {{starvedApp.getStarvedResourceRequests()}} in > {{FSPreemptionThread#identifyContainersToPreempt}} includes one or more RRs > that request containers on a specific node. Since this causes us to only > consider one node to preempt containers on, the really good work that was > done in YARN-5830 doesn't save us from AM preemption. Even though there might > be multiple nodes on which we could preempt enough non-AM containers to > satisfy the app's starvation, we often wind up preempting one or more AM > containers on the single node that we're considering. > A proposed solution is that if we're going to preempt one or more AM > containers for an RR that specifies a node or rack, then we should instead > expand the search space to consider all nodes. That way we take advantage of > YARN-5830, and only preempt AMs if there's no alternative. I've attached a > patch with an initial implementation of this. We've been running it on a few > clusters, and have seen AM preemptions drop from double-digit occurrences on > many days to zero. > Of course, the tradeoff is some loss of locality, since the starved app is > less likely to be allocated resources at the most specific locality level > that it asked for. My opinion is that this tradeoff is worth it, but > interested to hear what others think as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org