[ 
https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325928#comment-16325928
 ] 

Steven Rand edited comment on YARN-7655 at 1/15/18 6:49 AM:
------------------------------------------------------------

Thanks [~yufeigu] for taking a look. The cluster sizes and node specs should be 
pretty reasonable: for the three clusters I have in mind, the nodes are AWS 
EC2 instances with around 120 GB of RAM and around 20 vcores, and the clusters 
range in size from double digits to low triple digits of nodes.

That said, these clusters have some non-default configuration that could 
explain high rates of AM preemption. Specifically (a rough sketch of the 
corresponding allocation file settings follows the list):
 * The default max AM share is set to -1, which disables the limit. The max AM 
share feature seems totally reasonable as far as I can tell, but it was causing 
a good deal of confusion when apps hit the limit and failed to start for no 
apparent reason, so we disabled it in the hope that having one less variable 
would make the scheduler's behavior easier to understand.
 * The default fair share preemption threshold is set to 1.0. This was also an 
attempt to reduce confusion, as failure to preempt while below fair share (but 
above fair share * the threshold) was commonly misinterpreted as a bug.
 * The preemption timeouts for fair share and min share are also non-default – 
they're set to one second each.
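
For concreteness, the overrides above look roughly like this in the fair 
scheduler allocation file. This is only a sketch using the documented 
{{queueMaxAMShareDefault}}, {{defaultFairSharePreemptionThreshold}}, 
{{defaultFairSharePreemptionTimeout}}, and {{defaultMinSharePreemptionTimeout}} 
elements with the values described above, not a verbatim copy of our config:

{code:xml}
<?xml version="1.0"?>
<!-- Approximate allocation file settings matching the overrides described
     above; not a verbatim copy of our config. -->
<allocations>
  <!-- Disable the max AM share limit cluster-wide (-1.0 turns the check off). -->
  <queueMaxAMShareDefault>-1.0</queueMaxAMShareDefault>

  <!-- Treat a queue as starved whenever it's below its full fair share,
       rather than below a fraction of it. -->
  <defaultFairSharePreemptionThreshold>1.0</defaultFairSharePreemptionThreshold>

  <!-- Preempt after one second of fair share or min share starvation. -->
  <defaultFairSharePreemptionTimeout>1</defaultFairSharePreemptionTimeout>
  <defaultMinSharePreemptionTimeout>1</defaultMinSharePreemptionTimeout>
</allocations>
{code}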

It's possible that these configuration overrides, combined with access patterns 
in which apps frequently start up or increase their demand via Spark's dynamic 
allocation feature, are the real issue here, in which case we don't need to 
pursue this JIRA further. Data on whether other YARN deployments experience 
this issue would be useful, though not easy to come by, since I had to add 
custom logging to identify NODE_LOCAL requests as the cause of most AM 
preemptions on these clusters.



> avoid AM preemption caused by RRs for specific nodes or racks
> -------------------------------------------------------------
>
>                 Key: YARN-7655
>                 URL: https://issues.apache.org/jira/browse/YARN-7655
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.0.0
>            Reporter: Steven Rand
>            Assignee: Steven Rand
>            Priority: Major
>         Attachments: YARN-7655-001.patch
>
>
> We frequently see AM preemptions when 
> {{starvedApp.getStarvedResourceRequests()}} in 
> {{FSPreemptionThread#identifyContainersToPreempt}} includes one or more RRs 
> that request containers on a specific node. Since this causes us to only 
> consider one node to preempt containers on, the really good work that was 
> done in YARN-5830 doesn't save us from AM preemption. Even though there might 
> be multiple nodes on which we could preempt enough non-AM containers to 
> satisfy the app's starvation, we often wind up preempting one or more AM 
> containers on the single node that we're considering.
> A proposed solution is that if we're going to preempt one or more AM 
> containers for an RR that specifies a node or rack, then we should instead 
> expand the search space to consider all nodes. That way we take advantage of 
> YARN-5830, and only preempt AMs if there's no alternative. I've attached a 
> patch with an initial implementation of this. We've been running it on a few 
> clusters, and have seen AM preemptions drop from double-digit occurrences on 
> many days to zero.
> Of course, the tradeoff is some loss of locality, since the starved app is 
> less likely to be allocated resources at the most specific locality level 
> that it asked for. My opinion is that this tradeoff is worth it, but I'm 
> interested to hear what others think as well.
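
To make the proposal above concrete, here is a toy sketch of the fallback idea. 
It uses made-up container and node types rather than the real 
{{FSPreemptionThread}} code, and it is not the logic of the attached patch; it 
just illustrates the ordering we're after: preserve locality without touching 
AMs if possible, then give up locality to spare AMs, and only preempt AM 
containers as a last resort.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Toy sketch of the fallback idea described above. These are made-up types,
 * not the real FairScheduler classes, and this is not the attached patch.
 */
public class AmFriendlyPreemptionSketch {

  static class Container {
    final boolean isAM;
    final int memoryMb;
    Container(boolean isAM, int memoryMb) {
      this.isAM = isAM;
      this.memoryMb = memoryMb;
    }
  }

  static class Node {
    final List<Container> preemptableContainers = new ArrayList<>();
  }

  /**
   * Greedily pick containers on the given nodes until neededMb is covered,
   * always taking non-AM containers first (the YARN-5830 preference).
   * Returns null if the demand cannot be covered under the given constraint.
   */
  static List<Container> pickContainers(List<Node> nodes, int neededMb, boolean allowAMs) {
    List<Container> picked = new ArrayList<>();
    int covered = 0;
    // First pass: non-AM containers only.
    for (Node node : nodes) {
      for (Container c : node.preemptableContainers) {
        if (covered >= neededMb) {
          return picked;
        }
        if (!c.isAM) {
          picked.add(c);
          covered += c.memoryMb;
        }
      }
    }
    if (covered >= neededMb) {
      return picked;
    }
    if (!allowAMs) {
      return null;
    }
    // Second pass: still short, so fall back to AM containers.
    for (Node node : nodes) {
      for (Container c : node.preemptableContainers) {
        if (covered >= neededMb) {
          return picked;
        }
        if (c.isAM) {
          picked.add(c);
          covered += c.memoryMb;
        }
      }
    }
    return covered >= neededMb ? picked : null;
  }

  /**
   * For a node- or rack-specific request: try the requested nodes without
   * touching AMs; if that fails, widen the search to all nodes (still sparing
   * AMs); only if that also fails, allow AM containers to be preempted.
   */
  static List<Container> identifyContainersToPreempt(
      List<Node> requestedNodes, List<Node> allNodes, int neededMb) {
    List<Container> local = pickContainers(requestedNodes, neededMb, false);
    if (local != null) {
      return local;                         // locality preserved, no AMs preempted
    }
    List<Container> clusterWide = pickContainers(allNodes, neededMb, false);
    if (clusterWide != null) {
      return clusterWide;                   // locality lost, but AMs spared
    }
    return pickContainers(allNodes, neededMb, true);  // last resort: AMs allowed
  }
}
{code}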


