[ https://issues.apache.org/jira/browse/YARN-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rand updated YARN-6960:
------------------------------
    Attachment: YARN-6960.001.patch

> definition of active queue allows idle long-running apps to distort fair 
> shares
> -------------------------------------------------------------------------------
>
>                 Key: YARN-6960
>                 URL: https://issues.apache.org/jira/browse/YARN-6960
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.8.1, 3.0.0-alpha4
>            Reporter: Steven Rand
>            Assignee: Steven Rand
>         Attachments: YARN-6960.001.patch
>
>
> YARN-2026 introduced the notion of only considering active queues when 
> computing the fair share of each queue. The definition of an active queue is 
> a queue with at least one runnable app:
> {code}
>   public boolean isActive() {
>     return getNumRunnableApps() > 0;
>   }
> {code}
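Here is a minimal Java sketch (Java 16+ for records) of what this definition means for fair shares. It shows the cluster being split by weight among active queues only, so a queue with no runnable apps gets a fair share of zero. It is not the FairScheduler computation: min/max shares, preemption thresholds and other details are ignored. `ActiveQueueFairShare`, `QueueInfo` and `fairShares` are hypothetical names, not YARN APIs.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch, not FairScheduler code: fair shares are split by
 * weight among active queues only; inactive queues get zero.
 */
public class ActiveQueueFairShare {

    /** Hypothetical stand-in for a queue: name, weight, runnable-app count. */
    record QueueInfo(String name, double weight, int runnableApps) {
        /** Mirrors the quoted isActive(): active iff at least one runnable app. */
        boolean isActive() {
            return runnableApps > 0;
        }
    }

    /** Splits clusterResource among active queues in proportion to weight. */
    static double[] fairShares(List<QueueInfo> queues, double clusterResource) {
        double activeWeight = 0;
        for (QueueInfo q : queues) {
            if (q.isActive()) {
                activeWeight += q.weight();
            }
        }
        double[] shares = new double[queues.size()];
        for (int i = 0; i < queues.size(); i++) {
            QueueInfo q = queues.get(i);
            // Inactive queues (or a cluster with no active queues) get nothing.
            shares[i] = (q.isActive() && activeWeight > 0)
                    ? clusterResource * q.weight() / activeWeight
                    : 0.0;
        }
        return shares;
    }

    public static void main(String[] args) {
        List<QueueInfo> queues = List.of(
                new QueueInfo("root.a", 1.0, 1),
                new QueueInfo("root.b", 1.0, 1),
                new QueueInfo("root.c", 1.0, 1),
                new QueueInfo("root.d", 1.0, 0));
        double[] shares = fairShares(queues, 100.0);
        for (int i = 0; i < queues.size(); i++) {
            System.out.printf(Locale.ROOT, "%s -> %.1f%n", queues.get(i).name(), shares[i]);
        }
    }
}
```

Running `main` prints 33.3 for each of the three queues that have a runnable app. It prints 0.0 for root.d, which has none. The share of every active queue shrinks as more queues count as active, however little those queues actually use.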
> One case that this definition of activity doesn't account for is that of 
> long-running applications that scale dynamically. Such an application might 
> request many containers when jobs are running, but scale down to very few 
> containers, or only the AM container, when no jobs are running.
> Even when such an application has scaled down to a negligible amount of 
> demand and utilization, the queue that it's in is still considered to be 
> active, which defeats the purpose of YARN-2026. For example, consider this 
> scenario:
> 1. We have queues {{root.a}}, {{root.b}}, {{root.c}}, and {{root.d}}, all of 
> which have the same weight.
> 2. Queues {{root.a}} and {{root.b}} contain long-running applications that 
> currently have only one container each (the AM).
> 3. An application in {{root.c}} starts, and uses the whole cluster 
> except for the small amount in use by {{root.a}} and {{root.b}}.
> 4. An application in {{root.d}} starts, and has a high enough demand to be 
> able to use half of the cluster. Because all four queues are active, the app 
> in {{root.d}} can only preempt the app in {{root.c}} up to roughly 25% of the 
> cluster's resources, while the app in {{root.c}} keeps about 75%.
> Ideally in this example, the app in {{root.d}} would be able to preempt the 
> app in {{root.c}} up to 50% of the cluster, which would be possible if the 
> idle apps in {{root.a}} and {{root.b}} didn't cause those queues to be 
> considered active.
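As a worked check, the 25% and 50% figures come from splitting the cluster by weight among active queues. Assume equal weights w, cluster resources C, A the set of active queues, and AM containers small enough to ignore. Min/max shares are also ignored.

```latex
\begin{aligned}
F_q &= \frac{w_q}{\sum_{i \in A} w_i}\, C \quad (q \in A), \qquad F_q = 0 \quad (q \notin A) \\
A = \{a, b, c, d\} &\implies F_d = \frac{w}{4w}\, C = \frac{C}{4} \\
A = \{c, d\} &\implies F_d = \frac{w}{2w}\, C = \frac{C}{2}
\end{aligned}
```

The app in root.d can preempt only up to its fair share. In the first case, the app in root.c therefore keeps roughly C - C/4 = 3C/4, less the two AM containers.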
> One way to address this is to update the definition of an active queue to be 
> a queue containing one or more non-AM containers. This way, if all apps in a 
> queue scale down to only their AM containers, other queues' fair shares aren't 
> affected.
> The benefit of this approach is that it's quite simple. The downside is that 
> it doesn't account for apps that are idle and using almost no resources, but 
> still have at least one non-AM container.
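Below is a minimal Java sketch (Java 16+) contrasting the current rule with the proposed one. It is not the attached patch. `NonAmActiveQueue` and `AppInfo` are hypothetical names. The sketch assumes each app's AM occupies exactly one of its live containers, so unmanaged AMs are not modeled. It also shows the downside noted above: a single small non-AM container still makes the queue active.

```java
import java.util.List;

/**
 * Hypothetical sketch of the proposed rule: a queue is active only if some
 * runnable app holds a container other than its AM container.
 */
public class NonAmActiveQueue {

    /** Hypothetical per-app view: live containers and whether one is the AM. */
    record AppInfo(int liveContainers, boolean amRunning) {
        /** Containers held beyond the AM (assumes the AM uses one live container). */
        int nonAmContainers() {
            return Math.max(0, liveContainers - (amRunning ? 1 : 0));
        }
    }

    /** Current rule (YARN-2026): any runnable app makes the queue active. */
    static boolean isActiveCurrent(List<AppInfo> runnableApps) {
        return !runnableApps.isEmpty();
    }

    /** Proposed rule: active only if at least one app has a non-AM container. */
    static boolean isActiveProposed(List<AppInfo> runnableApps) {
        return runnableApps.stream().anyMatch(app -> app.nonAmContainers() > 0);
    }

    public static void main(String[] args) {
        // Long-running app scaled down to only its AM, as in root.a and root.b.
        List<AppInfo> idleQueue = List.of(new AppInfo(1, true));
        // App still holding one non-AM container: the downside of the proposal.
        List<AppInfo> nearlyIdleQueue = List.of(new AppInfo(2, true));

        System.out.println(isActiveCurrent(idleQueue) + " " + isActiveProposed(idleQueue));
        System.out.println(isActiveCurrent(nearlyIdleQueue) + " " + isActiveProposed(nearlyIdleQueue));
    }
}
```

Running `main` prints `true false` for the AM-only queue and `true true` for the queue with one extra container. Under the proposed rule, only the AM-only queue stops counting toward the fair-share split.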
> There are a couple of other options that seem plausible to me, but they're 
> much more complicated, and I think this proposal makes good progress while 
> adding minimal extra complexity.
> Does this seem like a reasonable change? I'm certainly open to better ideas 
> as well.
> Thanks,
> Steve



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
