[
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wangda Tan updated YARN-5864:
-----------------------------
Attachment: YARN-5864.004.patch
[~sunilg], thanks for reviewing, for your comments:
For 1), yes, underutilized queues always go before overutilized queues.
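Roughly, the ordering could look like the following sketch (illustrative only, with made-up types; treating priority as the tie-breaker within each group is an assumption for illustration, not necessarily what the patch does):
{code:java}
// Hedged sketch of the ordering rule in 1): underutilized queues are always
// considered before overutilized ones. QueueSnapshot is a made-up type, not a
// CapacityScheduler class.
import java.util.Comparator;

class QueueSnapshot {
  final float used;        // absolute used capacity
  final float guaranteed;  // absolute configured (guaranteed) capacity
  final int priority;      // higher value = higher priority (assumption)

  QueueSnapshot(float used, float guaranteed, int priority) {
    this.used = used;
    this.guaranteed = guaranteed;
    this.priority = priority;
  }

  boolean isUnderutilized() {
    return used < guaranteed;
  }

  // Underutilized queues sort first; within each group, higher priority first.
  static final Comparator<QueueSnapshot> UNDERUTILIZED_FIRST = (a, b) -> {
    if (a.isUnderutilized() != b.isUnderutilized()) {
      return a.isUnderutilized() ? -1 : 1;
    }
    return Integer.compare(b.priority, a.priority);
  };
}
{code}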
For 2), I have thought about this. I intentionally made it two policies
because:
- All related configurations will be grouped together, for example
preemption-related configuration.
- Priority can be interpreted in different ways; for example, priority could be
used as "weights" in a different policy implementation.
- It avoids cramming too many enable/disable switches into one option.
- The internal implementation is not tied to how admins use the feature.
For 3), added a comment to make sure ParentQueue uses the read lock correctly
(it is fine now).
For 4), it should be fine; it is already part of the Maven dependencies.
For 5), as noted in the comment, I agree that we can optimize this. The time
complexity of this algorithm is O(N^2 * Max_queue_depth), where N is the number
of leaf queues. Since we have a limited number of leaf queues and
Max_queue_depth is a small constant, we're fine for now.
For 6), similar to the above, we're fine for now, and 5)/6) can be done separately.
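To illustrate where the bound in 5) comes from (a rough sketch only, with made-up types; not the patch's actual code): comparing every pair of leaf queues while walking up the queue hierarchy for each pair gives O(N^2 * Max_queue_depth).
{code:java}
// Rough illustration of the O(N^2 * Max_queue_depth) bound in 5).
// QueueNode is a made-up type; the real patch code is different.
import java.util.List;

class QueueNode {
  QueueNode parent;  // null for the root queue
}

class ComplexitySketch {
  // Walking from a leaf to the root costs O(Max_queue_depth).
  static int depthOf(QueueNode q) {
    int depth = 0;
    for (QueueNode cur = q; cur != null; cur = cur.parent) {
      depth++;
    }
    return depth;
  }

  // Every pair of leaf queues is compared (O(N^2) pairs), and each comparison
  // walks up the hierarchy (O(Max_queue_depth)), so the total work is
  // O(N^2 * Max_queue_depth). With, say, 100 leaf queues and depth 5, that is
  // on the order of 100 * 100 * 5 = 50,000 steps, small for a scheduler cycle.
  static long pairwiseWork(List<QueueNode> leafQueues) {
    long steps = 0;
    for (QueueNode a : leafQueues) {
      for (QueueNode b : leafQueues) {
        steps += depthOf(a) + depthOf(b);
      }
    }
    return steps;
  }
}
{code}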
For 7), updated.
For 8), updated, and added a new test.
For 9), updated according to the changes for 8).
For 10), I think we should make sure queue properties like
used/pending/reserved are not updated, while ideal-assigned/preemptable could
change across different selectors. Please comment if you find any such changes
from IntraQueueSelector.
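A hedged sketch of the kind of check I mean for 10) (the helper names and types here are hypothetical, not the actual test code): snapshot used/pending/reserved before running the selectors and assert they are unchanged afterwards, while ideal-assigned/preemptable may legitimately differ per selector.
{code:java}
// Illustration only: snapshotOf and runSelectors are hypothetical helpers,
// not real test utilities.
import static org.junit.Assert.assertEquals;

class QueueResourceSnapshot {
  long used, pending, reserved;
}

class SelectorInvariantSketch {
  static void checkSelectorsDoNotMutateQueueState(Object queue) {
    QueueResourceSnapshot before = snapshotOf(queue);
    runSelectors(queue);   // hypothetical: run all preemption candidate selectors
    QueueResourceSnapshot after = snapshotOf(queue);

    // used/pending/reserved must stay untouched by the selectors...
    assertEquals(before.used, after.used);
    assertEquals(before.pending, after.pending);
    assertEquals(before.reserved, after.reserved);
    // ...while ideal-assigned/preemptable are allowed to differ per selector.
  }

  static QueueResourceSnapshot snapshotOf(Object queue) {
    return new QueueResourceSnapshot();  // hypothetical
  }

  static void runSelectors(Object queue) {
    // hypothetical
  }
}
{code}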
For 11), updated.
For 12), I considered this, but I cannot think of a relatively easy approach to
do it. The time complexity will be O(#containers * #reserved-nodes), and since
we have a "touchedNode" set to avoid checking nodes twice, it should not be a
big problem even on a large cluster. I will do some SLS performance tests to
make sure it works well.
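A small sketch of the touched-node pattern from 12) (hypothetical types and field names, not the patch's code): iterate over reserved containers, but skip any node that has already been examined, so each node is visited at most once per round.
{code:java}
// Illustration of the "touchedNode" idea in 12): each reserved node is examined
// at most once per round, so the work stays bounded by roughly
// O(#containers * #reserved-nodes) even on a large cluster. The types and
// field names here are made-up stand-ins, not the real scheduler classes.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReservedContainer {
  String nodeId;                 // node the reservation sits on
  List<String> containersOnNode; // running containers on that node
}

class TouchedNodeSketch {
  static void examineReservations(List<ReservedContainer> reservations) {
    Set<String> touchedNodes = new HashSet<>();
    for (ReservedContainer rc : reservations) {
      if (!touchedNodes.add(rc.nodeId)) {
        continue;  // this node was already checked in this round, skip it
      }
      for (String container : rc.containersOnNode) {
        // ... check whether preempting this container would free enough
        //     space to satisfy the reservation on this node ...
      }
    }
  }
}
{code}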
Attached the ver.4 patch. This patch is on top of YARN-6081; I will update the
Patch Available state once YARN-6081 gets committed.
> YARN Capacity Scheduler - Queue Priorities
> ------------------------------------------
>
> Key: YARN-5864
> URL: https://issues.apache.org/jira/browse/YARN-5864
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-5864.001.patch, YARN-5864.002.patch,
> YARN-5864.003.patch, YARN-5864.004.patch, YARN-5864.poc-0.patch,
> YARN-CapacityScheduler-Queue-Priorities-design-v1.pdf
>
>
> Currently, Capacity Scheduler at every parent-queue level uses relative
> used-capacities of the child queues to decide which queue can get the next
> available resource first.
> For example,
> - Q1 & Q2 are child queues under queueA
> - Q1 has 20% of configured capacity, 5% of used-capacity and
> - Q2 has 80% of configured capacity, 8% of used-capacity.
> In this situation, the relative used-capacities are calculated as below:
> - Relative used-capacity of Q1 is 5/20 = 0.25
> - Relative used-capacity of Q2 is 8/80 = 0.10
> In the above example, per today’s Capacity Scheduler algorithm, Q2 is
> selected by the scheduler first to receive the next available resource.
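> As a minimal sketch of this ordering using the Q1/Q2 numbers above
> (illustrative only; not actual CapacityScheduler code):
> {code:java}
> // The child queue with the smaller relative used-capacity
> // (used / configured) receives the next available resource first.
> class RelativeUsedCapacityExample {
>   public static void main(String[] args) {
>     float q1 = 5f / 20f;   // 0.25
>     float q2 = 8f / 80f;   // 0.10
>     System.out.println(q2 < q1 ? "Q2 goes first" : "Q1 goes first");
>   }
> }
> {code}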
> Simply ordering queues according to relative used-capacities sometimes causes
> trouble because scarce resources could be assigned to less-important apps
> first.
> # Latency sensitivity: This can be a problem with latency-sensitive
> applications, where waiting till the ‘other’ queue gets full is not going to
> cut it. The delay in scheduling is directly reflected in the response times of
> these applications.
> # Resource fragmentation for large-container apps: Today’s algorithm also
> causes issues with applications that need very large containers. It is
> possible that existing queues are all within their resource guarantees, but
> their current allocation distribution on each node may be such that an
> application which needs a large container simply cannot fit on those nodes.
> Services:
> # The above problem (2) gets worse with long-running applications. With
> short-running apps, previous containers may eventually finish and make enough
> space for the apps with large containers. But with long-running services in
> the cluster, the application with large containers may never get resources on
> any node even though its demands are not yet met.
> # Long-running services are sometimes pickier w.r.t. placement than normal
> batch apps. For example, for a long-running service in a separate queue (say
> queue=service), during peak hours it may want to launch instances on 50% of
> the cluster nodes. On each node, it may want to launch a large container, say
> 200G of memory per container.