[
https://issues.apache.org/jira/browse/YARN-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137152#comment-16137152
]
Daniel Templeton commented on YARN-6960:
----------------------------------------
I'll take a look when I get a chance.
> definition of active queue allows idle long-running apps to distort fair
> shares
> -------------------------------------------------------------------------------
>
> Key: YARN-6960
> URL: https://issues.apache.org/jira/browse/YARN-6960
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.8.1, 3.0.0-alpha4
> Reporter: Steven Rand
> Assignee: Steven Rand
> Attachments: YARN-6960.001.patch, YARN-6960.002.patch
>
>
> YARN-2026 introduced the notion of only considering active queues when
> computing the fair share of each queue. The definition of an active queue is
> a queue with at least one runnable app:
> {code}
> public boolean isActive() {
> return getNumRunnableApps() > 0;
> }
> {code}
> One case that this definition of activity doesn't account for is that of
> long-running applications that scale dynamically. Such an application might
> request many containers when jobs are running, but scale down to very few
> containers, or only the AM container, when no jobs are running.
> Even when such an application has scaled down to a negligible amount of
> demand and utilization, the queue that it's in is still considered to be
> active, which defeats the purpose of YARN-2026. For example, consider this
> scenario:
> 1. We have queues {{root.a}}, {{root.b}}, {{root.c}}, and {{root.d}}, all of
> which have the same weight.
> 2. Queues {{root.a}} and {{root.b}} contain long-running applications that
> currently have only one container each (the AM).
> 3. An application in queue {{root.c}} starts, and uses the whole cluster
> except for the small amount in use by {{root.a}} and {{root.b}}. An
> application in {{root.d}} starts, and has a high enough demand to be able to
> use half of the cluster. Because all four queues are active, the app in
> {{root.d}} can only preempt the app in {{root.c}} up to roughly 25% of the
> cluster's resources, while the app in {{root.c}} keeps about 75%.
> Ideally in this example, the app in {{root.d}} would be able to preempt the
> app in {{root.c}} up to 50% of the cluster, which would be possible if the
> idle apps in {{root.a}} and {{root.b}} didn't cause those queues to be
> considered active.
> One way to address this is to update the definition of an active queue to be
> a queue containing 1 or more non-AM containers. This way if all apps in a
> queue scale down to only the AM, other queues' fair shares aren't affected.
> The benefit of this approach is that it's quite simple. The downside is that
> it doesn't account for apps that are idle and using almost no resources, but
> still have at least one non-AM container.
> There are a couple of other options that seem plausible to me, but they're
> much more complicated, and it seems to me that this proposal makes good
> progress while adding minimal extra complexity.
> Does this seem like a reasonable change? I'm certainly open to better ideas
> as well.
> Thanks,
> Steve
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]