[
https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428904#comment-15428904
]
Eric Payne commented on YARN-4945:
----------------------------------
[~sunilg], thank you so much for providing this design doc and POC. I have not
yet looked at the patch, but I have a few comments on the design doc.
-----
{quote}
Additional Requirement specs
...
- Over subscribed queue ...
-- Selected containers will completely serve resource need from starving apps.
...
-- Selected containers only partially serves the need
...
By scanning through each partition and its associated queues
(TempQueuePerPartition), we can understand how much resources are offered from
each queue for preemption and also the selected container list. This can be
used as a reference to avoid double calculations in intraqueue preemption
round.
{quote}
I'm pretty sure that the containers already in the {{selectedCandidates}} list
will _not_ be re-assigned to anything in the current queue. The containers are
in that list because some other queue is asking for them. Even if containers
that are already in the inter-queue preemption list would also help resolve an
intra-queue preemption problem, those containers will go to the more
underserved queue before coming back to the current queue. My assertion is that
regardless of what containers are already in the {{selectedCandidates}} list,
the intra-queue preemption policy would always need to select more.
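
To make the assertion concrete, here is a minimal sketch, not taken from the patch. It assumes containers are identified by plain string ids with sizes in arbitrary units, and the class and method names ({{IntraQueueSelection}}, {{selectMore}}) are hypothetical. Containers already in {{selectedCandidates}} are skipped without being credited against the intra-queue need, because they are expected to leave the queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the YARN implementation.
final class IntraQueueSelection {

    // Picks additional containers until intraQueueNeed is covered.
    // Containers already selected for inter-queue preemption are skipped
    // and do NOT reduce the remaining intra-queue need.
    static List<String> selectMore(Map<String, Long> candidateSizes,
                                   Set<String> selectedCandidates,
                                   long intraQueueNeed) {
        List<String> picked = new ArrayList<>();
        long remaining = intraQueueNeed;
        for (Map.Entry<String, Long> e : candidateSizes.entrySet()) {
            if (remaining <= 0) {
                break;
            }
            if (selectedCandidates.contains(e.getKey())) {
                continue; // goes to the other queue, so no credit here
            }
            picked.add(e.getKey());
            remaining -= e.getValue();
        }
        return picked;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("c1", 2L);
        sizes.put("c2", 2L);
        sizes.put("c3", 2L);
        Set<String> selected = new HashSet<>(Collections.singleton("c1"));
        // c1 is already taken by inter-queue preemption, so two more are needed.
        System.out.println(selectMore(sizes, selected, 4L)); // prints [c2, c3]
    }
}
```

If c1 were counted toward the intra-queue need, only c2 would be picked and the starving app would end up short by one container.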
-----
{quote}
Configurations and considerations
- Provide a configuration to turn on/off intraqueue preemption along with the
type of policy it is going to handle (priority, fairness, userlimit etc)
{quote}
Additionally, we may want to consider intra-queue preemption configs for dead
zone, natural completion, etc. This may even need to be per queue.
-----
{quote}
Select ideal candidates for intraqueue preemption per priority.
...
3. ‘pending’ resource per partition will be calculated for all the apps and
together store in a consolidated map (resourceToObtain) of pending resource to
be collected per partition in one queue.
{quote}
The use of the word "pending" in conjunction with the reference to
{{resourceToObtain}} is confusing to me. It sounds like "pending" is talking
about "preemptable resources," but "pending" means "resources requested but not
yet allocated." (See
{{LeafQueue#getTotalPendingResourcesConsideringUserLimit}}).
For instance, the {{resToObtainByPartition}} variable in
{{FifoCandidatesSelector}} is used for holding the amount of extra (and
therefore preemptable) resources being used by a queue. Is this step
calculating the total of preemptable resources for apps in this queue, per
partition?
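
The distinction I am drawing can be sketched as follows. This is an illustration under stated assumptions: plain {{long}} values stand in for YARN's {{Resource}} type, and the class and method names are hypothetical rather than anything in the scheduler.

```java
// Hypothetical illustration of the two terms; not YARN code.
final class PendingVsPreemptable {

    // "Pending": resources requested but not yet allocated.
    static long pending(long requested, long allocated) {
        return Math.max(0L, requested - allocated);
    }

    // "Preemptable": resources in use beyond the guaranteed share,
    // which is the sense in which resToObtainByPartition is described above.
    static long preemptable(long used, long guaranteed) {
        return Math.max(0L, used - guaranteed);
    }

    public static void main(String[] args) {
        System.out.println(pending(10L, 4L));     // prints 6
        System.out.println(preemptable(12L, 8L)); // prints 4
        System.out.println(preemptable(5L, 8L));  // prints 0
    }
}
```

The two quantities answer different questions: the first is what an app still wants, the second is what could be taken away from it.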
-----
{quote}
4. While doing this, we will ensure that certains apps will be skipped if it is
already equal or more that its userlimit quota. This map will be the entry
point to select candidates from lower priority apps in next step.
{quote}
Is this saying that, when marking containers for preemption, if an app is under
its user limit percent, its containers will not be marked? Or, is it saying
that if an app is asking for more containers and it is already over its user
limit percent, other apps' containers won't be preempted on its behalf?
Not only do we need to avoid preempting resources _for_ users that are over
their user limit percent, we need to avoid preempting containers _from_ users
that are under their user limit percent. Even today in the capacity scheduler,
if I have a queue with a 50% user limit percent, and app1 from user1 is
priority1 and app2 from user2 is priority2, and they are both asking for more
resources, user2 will not get more containers until user1 has reached 50% of
the queue. In other words, user limit percent trumps application priority.
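
The two guards described above can be sketched as a single check. This is a hypothetical helper, not scheduler code: usage and limit are expressed as percentages of the queue, and treating a user exactly at the limit as neither a valid requester nor a valid victim is my assumption. Application priority is deliberately not an input, since the user limit is meant to take precedence over it.

```java
// Hypothetical illustration of the two user-limit guards; not YARN code.
final class UserLimitGuard {

    // Preemption is allowed only when the requesting user is still under
    // the user limit AND the victim user is over it.
    static boolean mayPreempt(double requesterUsedPct,
                              double victimUsedPct,
                              double userLimitPct) {
        boolean requesterUnderLimit = requesterUsedPct < userLimitPct;
        boolean victimOverLimit = victimUsedPct > userLimitPct;
        return requesterUnderLimit && victimOverLimit;
    }

    public static void main(String[] args) {
        // 50% user limit, as in the example above.
        System.out.println(mayPreempt(30.0, 70.0, 50.0)); // prints true
        System.out.println(mayPreempt(60.0, 70.0, 50.0)); // prints false: requester over limit
        System.out.println(mayPreempt(30.0, 40.0, 50.0)); // prints false: victim under limit
    }
}
```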
-----
I am concerned that priority-based intra-queue preemption has a different set
of goals than user limit percent-based intra-queue preemption. For instance,
- Requirements for user limit percent-based preemption are calculated at the
user level, while priority-based preemption requirements go down to the app
level.
- User limit percent-based preemption only makes sense if multiple users are in
a queue, and priority-based preemption only makes sense if a priority inversion
can happen between apps of the same user in a queue.
Perhaps these should be totally separate policies. Anyway, for us, user limit
percent-based preemption is much more important.
> [Umbrella] Capacity Scheduler Preemption Within a queue
> -------------------------------------------------------
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wangda Tan
> Attachments: IntraQueuepreemption-CapacityScheduler (Design).pdf,
> YARN-2009-wip.patch
>
>
> This is umbrella ticket to track efforts of preemption within a queue to
> support features like:
> YARN-2009. YARN-2113. YARN-4781.