[ https://issues.apache.org/jira/browse/YARN-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223996#comment-16223996 ]
Steven Rand commented on YARN-7391:
-----------------------------------
[~templedf] and [~yufeigu], thanks for commenting. Apologies for not including
my use case in the original description. We run multiple long-running Spark
applications, each of which uses Spark's dynamic allocation feature, and
therefore has a demand which fluctuates over time. At any point, the demand of
any given app can be quite low (e.g., only an AM container), or quite high
(e.g., hundreds of executors).
Historically, we've run each app in its own leaf queue, since the Fair
Scheduler has not always supported preemption inside a leaf queue. We've found
that since the fair share of a parent queue is split evenly among all of its
active leaf queues, the fair share of each app is the same, regardless of its
demand. This causes our apps with higher demand to have fair shares that are
too low for them to preempt enough resources to even get close to meeting their
demand. If fair share were based on demand, then our apps with lower demand
would be unaffected, but our apps with higher demand could have high enough
weights to preempt a reasonable amount of resources away from apps that are
over their fair shares.
This problem led us to consider running more apps inside the same leaf queue,
which is no longer an issue now that the Fair Scheduler supports preemption
inside a leaf queue. We'd hoped to use the size-based weight feature to achieve
the goal of the more demanding apps having high enough fair shares to preempt
sufficient resources away from other apps. However, when we experimented with
this feature, the results were somewhat underwhelming. Yes, the more demanding apps
now have higher fair shares, but not by enough to significantly impact
allocation.
Consider, for example, the rather extreme case of 10 apps running in a leaf
queue, where 9 of them are requesting 20GB each, and 1 of them is requesting
1024GB. The weight of each of the 9 less demanding apps is about 14.3, and the
weight of the highly demanding app is about 20.0. So the highly demanding app
winds up with about 13.5% (20/148) of the queue's fair share, despite having a
demand that's more than 5x that of the other 9 put together, as opposed to the
10% it would have with size-based weight turned off. I know the example is a
bit silly, but I wanted to show that even with huge differences in demand, the
current behavior of size-based weight doesn't produce major differences in
weights.
Does that make sense? Happy to provide more info if helpful.
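To make the numbers above concrete, here is a standalone sketch (the class and method names are mine, not FairScheduler's) that reproduces the weight math for the 10-app example, comparing the current log-based weight with the proposed square root. Demand is in MB, which is what getDemand().getMemorySize() returns in the scheduler:

```java
// Standalone sketch (class and method names are mine, not FairScheduler's):
// reproduces the weight math from the example above. Demand is in MB.
public class SizeBasedWeightExample {

    // Current size-based weight: log1p(demand) / log(2)
    static double logWeight(long demandMb) {
        return Math.log1p(demandMb) / Math.log(2);
    }

    // Proposed alternative: square root of demand
    static double sqrtWeight(long demandMb) {
        return Math.sqrt(demandMb);
    }

    public static void main(String[] args) {
        long smallMb = 20L * 1024;    // nine apps demanding 20 GB each
        long largeMb = 1024L * 1024;  // one app demanding 1024 GB

        double logShare = logWeight(largeMb)
            / (9 * logWeight(smallMb) + logWeight(largeMb));
        double sqrtShare = sqrtWeight(largeMb)
            / (9 * sqrtWeight(smallMb) + sqrtWeight(largeMb));

        // With log weights the big app's weight is ~20 vs ~14.3 for each
        // small app, so it gets roughly 13-14% of the queue's fair share;
        // with sqrt weights (1024 vs ~143) it gets about 44%.
        System.out.printf("log share = %.3f, sqrt share = %.3f%n",
            logShare, sqrtShare);
    }
}
```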
> Consider square root instead of natural log for size-based weight
> -----------------------------------------------------------------
>
> Key: YARN-7391
> URL: https://issues.apache.org/jira/browse/YARN-7391
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Affects Versions: 3.0.0-beta1
> Reporter: Steven Rand
>
> Currently for size-based weight, we compute the weight of an app using this
> code from
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L377:
> {code}
> if (sizeBasedWeight) {
>   // Set weight based on current memory demand
>   weight = Math.log1p(app.getDemand().getMemorySize()) / Math.log(2);
> }
> {code}
> Because the natural log function grows slowly, the weights of two apps with
> hugely different memory demands can be quite similar. For example, {{weight}}
> evaluates to 14.3 for an app with a demand of 20 GB, and evaluates to 19.9
> for an app with a demand of 1000 GB. The app with the much larger demand will
> still have a higher weight, but not by a large amount relative to the sum of
> those weights.
> I think it's worth considering a switch to a square root function, which will
> grow more quickly. In the above example, the app with a demand of 20 GB now
> has a weight of 143, while the app with a demand of 1000 GB now has a weight
> of 1012. These weights seem more reasonable relative to each other given the
> difference in demand between the two apps.
> The above example is admittedly a bit extreme, but I believe that a square
> root function would also produce reasonable results in general.
> The code I have in mind would look something like:
> {code}
> if (sizeBasedWeight) {
>   // Set weight based on current memory demand
>   weight = Math.sqrt(app.getDemand().getMemorySize());
> }
> {code}
> Would people be comfortable with this change?
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]