[
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799628#comment-15799628
]
Wangda Tan commented on YARN-5864:
----------------------------------
Thanks [~eepayne] for reviewing the design doc.
bq. My understanding is that containers on under-utilized queues won't be
preempted unless a higher priority queue is asking.
That is true, but it is not all that phase I/II means.
Here's an example of Phase I/II:
Queues A/B/C/D/E have priorities A > B = C > D > E.
Assume A is under-utilized and has a pending ask, B/C are over-utilized, and D/E
are under-utilized without a pending ask.
To satisfy request of A:
- For phase I, we will first try to preempt from B/C since they're
over-utilized (even though they have higher priority than D/E).
- For phase II, if the resources allocated to B/C are not enough, or locality
doesn't match, we will continue by preempting resources from D/E.
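The two phases above can be sketched roughly as follows. This is only an
illustrative sketch, not the code in the patch: the Queue class, the
select_preemption function, the resource numbers, and the "lowest priority
first within a phase" ordering are all assumptions made for the example, and
locality is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    priority: int    # assumption: larger value means higher priority
    guaranteed: int  # guaranteed resource, arbitrary units
    used: int
    pending: int = 0

    @property
    def over_utilized(self) -> bool:
        return self.used > self.guaranteed


def select_preemption(queues, asker, demand):
    """Return a list of (phase, queue name, amount) to preempt for `asker`."""
    plan = []
    remaining = demand

    # Phase I: over-utilized queues; only the amount above guarantee is taken.
    phase1 = [q for q in queues if q is not asker and q.over_utilized]
    # Phase II: lower-priority queues that are not over-utilized.
    phase2 = [q for q in queues
              if q is not asker and not q.over_utilized
              and q.priority < asker.priority]

    for phase, candidates in ((1, phase1), (2, phase2)):
        # Assumption: within a phase, lower-priority queues are tried first.
        for q in sorted(candidates, key=lambda c: c.priority):
            if remaining <= 0:
                return plan
            available = q.used - q.guaranteed if phase == 1 else q.used
            take = min(available, remaining)
            if take > 0:
                plan.append((phase, q.name, take))
                remaining -= take
    return plan


# The example from this comment, with made-up numbers: A > B = C > D > E.
a = Queue("A", priority=3, guaranteed=40, used=10, pending=30)
queues = [
    a,
    Queue("B", priority=2, guaranteed=15, used=25),
    Queue("C", priority=2, guaranteed=15, used=20),
    Queue("D", priority=1, guaranteed=15, used=10),
    Queue("E", priority=0, guaranteed=15, used=10),
]
plan = select_preemption(queues, a, demand=a.pending)
print(plan)  # [(1, 'B', 10), (1, 'C', 5), (2, 'E', 10), (2, 'D', 5)]
```

With these numbers, B/C can only give up 15 units above their guarantees, so
the remaining 15 units come from E and D in phase II.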
Hope this answers your question.
> YARN Capacity Scheduler - Queue Priorities
> ------------------------------------------
>
> Key: YARN-5864
> URL: https://issues.apache.org/jira/browse/YARN-5864
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-5864.poc-0.patch,
> YARN-CapacityScheduler-Queue-Priorities-design-v1.pdf
>
>
> Currently, Capacity Scheduler at every parent-queue level uses relative
> used-capacities of the child-queues to decide which queue can get the next
> available resource first.
> For example,
> - Q1 & Q2 are child queues under queueA
> - Q1 has 20% of configured capacity, 5% of used-capacity and
> - Q2 has 80% of configured capacity, 8% of used-capacity.
> In this situation, the relative used-capacities are calculated as below:
> - Relative used-capacity of Q1 is 5/20 = 0.25
> - Relative used-capacity of Q2 is 8/80 = 0.10
> In the above example, per today’s Capacity Scheduler’s algorithm, Q2 is
> selected by the scheduler first to receive the next available resource.
> Simply ordering queues according to relative used-capacities sometimes causes
> problems, because scarce resources could be assigned to less-important
> apps first.
> # Latency sensitivity: This can be a problem with latency-sensitive
> applications where waiting till the ‘other’ queue gets full is not going to
> cut it. The delay in scheduling directly reflects in the response times of
> these applications.
> # Resource fragmentation for large-container apps: Today’s algorithm also
> causes issues with applications that need very large containers. It is
> possible that existing queues are all within their resource guarantees but
> their current allocation distribution on each node may be such that an
> application which needs a large container simply cannot fit on those nodes.
> Services:
> # The above problem (2) gets worse with long running applications. With short
> running apps, previous containers may eventually finish and make enough space
> for the apps with large containers. But with long running services in the
> cluster, the large containers’ application may never get resources on any
> nodes even if its demands are not yet met.
> # Long running services are sometimes more picky w.r.t placement than normal
> batch apps. For example, for a long running service in a separate queue (say
> queue=service), during peak hours it may want to launch instances on 50% of
> the cluster nodes. On each node, it may want to launch a large container, say
> 200G memory per container.
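
For the Q1/Q2 example in the issue description above, the ordering by relative
used-capacity can be sketched as follows. This is a minimal illustration of
the arithmetic only; the function and variable names are hypothetical and are
not taken from the Capacity Scheduler code.

```python
def relative_used_capacity(used_pct: float, configured_pct: float) -> float:
    """Used capacity divided by configured capacity, both in percent."""
    return used_pct / configured_pct


# name -> (configured capacity %, used capacity %), as in the description
child_queues = {"Q1": (20.0, 5.0), "Q2": (80.0, 8.0)}

# The queue with the smallest relative used-capacity is served first.
ordered = sorted(
    child_queues,
    key=lambda name: relative_used_capacity(child_queues[name][1],
                                            child_queues[name][0]),
)
print(ordered)  # ['Q2', 'Q1']
```

Q2 comes first because 8/80 = 0.10 is smaller than 5/20 = 0.25, regardless of
how important the applications in each queue are.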