[
https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15494825#comment-15494825
]
Wangda Tan commented on YARN-4945:
----------------------------------
1) YarnConfiguration:
- Instead of having a separate SELECT_CANDIDATES_FOR_INTRAQUEUE_PREEMPTION,
should we only have a "queue.intra-queue-preemption-enabled"? I cannot clearly
see what it means semantically. For example, after we have user-limit
preemption support, what happens if we only enable the user-limit preemption
(without priority preemption enabled)?
2) PCPP:
- Unused imports / methods
- getPartitionResource: can we avoid cloning resources? As written, we clone
the resource twice for every app. If you are concerned about consistency, you
can clone it once before starting the preemption calculation
- It seems to me, partitionToUnderServedQueues can be kept in
AbstractPreemptableResourceCalculator.
In addition, Map<String, LinkedHashSet<String>> could be Map<String,
List<String>>. (LinkedHashSet is not necessarily needed, because we won't have
two TempQueuePerPartition with the same queueName and same partition)
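To illustrate the getPartitionResource point above, here is a minimal sketch of the "clone once" idea. It is not the patch's code: {{Res}} and {{PartitionResourceSnapshot}} are hypothetical stand-ins (the real code would use YARN's Resource), and the only point is that the copy happens once, up front, rather than on every per-app lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: snapshot partition resources once per preemption round.
final class PartitionResourceSnapshot {
  // Hypothetical mutable resource, standing in for YARN's Resource.
  static final class Res {
    long memory;
    int vcores;

    Res(long memory, int vcores) {
      this.memory = memory;
      this.vcores = vcores;
    }

    Res copy() {
      return new Res(memory, vcores);
    }
  }

  private final Map<String, Res> byPartition = new HashMap<>();

  // Clone each partition's resource exactly once, before the calculation starts.
  PartitionResourceSnapshot(Map<String, Res> live) {
    for (Map.Entry<String, Res> e : live.entrySet()) {
      byPartition.put(e.getKey(), e.getValue().copy());
    }
  }

  // Per-app lookups return the shared snapshot; no further cloning.
  Res getPartitionResource(String partition) {
    return byPartition.get(partition);
  }
}
```

Callers must treat the returned object as read-only; that is the trade-off for not cloning on every call.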
3) CapacitySchedulerPreemptionUtils:
- deductPreemptableResourcePerApp: is the following comment valid?
bq. // High priority app is coming first
- Remove unnecessary params in methods and unnecessary generic types in new
expressions (like new HashMap(...)); better to move to IntelliJ? :p
- {{getResToObtainByPartitionForApps}} can be removed; we can directly use
policy.getResourceDemandFromAppsPerQueue
4) FiCaSchedulerApp:
Move getTotalPendingRequestsPerPartition to ResourceUsage? I can see that we
could need getUsedResourceByPartition, getReservedResourceByPartition, etc.
in the future
5) PreemptionCandidatesSelector:
- All non-abstract methods can be static, correct?
- All TODOs in comments are done, correct?
6) IntraQueuePreemptionPolicy and PriorityIntraQueuePreemptionPolicy:
- Overall: do you think the name -Policy is too big? What it essentially does
is compute how much resource to preempt from each app. How about calling it
something like IntraQueuePreemptionComputePlugin? I would like to hear thoughts
from you and Eric on this as well.
- Rename the PriorityIntraQueuePreemptionPolicy to
FifoIntraQueuePreemptionPolicy if you agree with [my
comment|https://issues.apache.org/jira/browse/YARN-4945?focusedCommentId=15494454&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15494454]
- PriorityIntraQueuePreemptionPolicy#getResourceDemandFromAppsPerQueue:
a. resToObtainByPartition can be removed from parameter
b. IIUC, it gets resourceToObtain for each app rather than resourceDemand for
each app; rename it accordingly?
c. This logic is not correct:
{code}
// If demand is from same priority level, skip the same.
if (!tq.isPendingDemandForHigherPriorityApps(a1.getPriority())) {
  continue;
}
{code}
It only prevents the highest-priority applications in a queue from preempting
each other; it does not prevent the 2nd-highest-priority applications from
preempting each other. The performance can be improved as well: I believe in
some settings maxAppPriority can be as large as MAX_INT. Please see the
comments/pseudo code below for details.
- computeAppsIdealAllocation:
a. Calling getUserLimitHeadRoomPerApp is too expensive. Instead we can add a
method in LeafQueue to get UserLimit by userName. Keeping a Map of username to
headroom inside the method lets us compute the user limit at most once per
user. This logic can also be reused to compute user-limit preemption
b. {{tq.addPendinResourcePerPriority(tmpApp.getPriority(), tmpApp.pending);}}
could be changed if you agree with c. above
c. I think we should move the {{skip the same priority demand}} logic into this
method. One approach in my mind is:
{code}
// General idea:
// Use two pointers, one starting from the most prioritized app and one from
// the least prioritized app.
// Each app has two quotas: one is how much resource it requires (ideal - used),
// the other is how much resource can be preempted from it.
// Move the two pointers and update the two quotas to answer:
// for application X, is there any app with higher priority that needs the
// resource?
p1 = most-prioritized-app.iterator
p2 = least-prioritized-app.iterator
// For each app, we have:
// - "toBePreemptFromOther", initialized to (ideal - (used - selected)).
// - "toBePreempt", how much resource can be preempted from this app.
// - "actuallyToBePreempted", initialized to 0
while (p1.getPriority() > p2.getPriority() && p1 != p2) {
  Resource rest = p2.toBePreempt - p2.actuallyToBePreempted;
  if (rest > 0) {
    if (p1.toBePreemptFromOther > 0) {
      Resource toPreempt = min(p1.toBePreemptFromOther, rest);
      p1.toBePreemptFromOther -= toPreempt
      p2.actuallyToBePreempted += toPreempt
    }
  }
  if (p2.toBePreempt - p2.actuallyToBePreempted == 0) {
    // Nothing more can be preempted from p2, move to the next app
    p2--;
  }
  if (p1.toBePreemptFromOther == 0) {
    // p1 is satisfied, move to the next app
    p1++;
  }
}
{code}
d. After the change in c., getResourceDemandFromAppsPerQueue will simply
return actuallyToBePreempted for apps in a queue, grouped by partition
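For reference, the two-pointer idea from c. can be written as a small runnable sketch. This is only an illustration under simplifying assumptions, not the patch: {{AppQuota}} and {{TwoPointerPreemption}} are hypothetical names, resources are plain longs instead of Resource, all quotas are assumed non-negative, and the list is assumed to be already sorted by priority, highest first.

```java
import java.util.List;

// Hypothetical per-app snapshot; a single long stands in for Resource.
final class AppQuota {
  final String name;
  final int priority;
  long toObtainFromOthers;        // unmet need: ideal - (used - selected)
  final long preemptable;         // upper bound on what may be taken from this app
  long actuallyToBePreempted = 0; // result of the computation

  AppQuota(String name, int priority, long toObtainFromOthers, long preemptable) {
    this.name = name;
    this.priority = priority;
    this.toObtainFromOthers = toObtainFromOthers;
    this.preemptable = preemptable;
  }
}

final class TwoPointerPreemption {
  // apps must be sorted by priority, highest first.
  static void compute(List<AppQuota> apps) {
    int hi = 0;               // most prioritized app still needing resource
    int lo = apps.size() - 1; // least prioritized app still preemptable
    // Stop when the pointers meet or reach the same priority level, so apps
    // of equal priority never preempt from each other.
    while (hi < lo && apps.get(hi).priority > apps.get(lo).priority) {
      AppQuota p1 = apps.get(hi);
      AppQuota p2 = apps.get(lo);
      long rest = p2.preemptable - p2.actuallyToBePreempted;
      if (rest > 0 && p1.toObtainFromOthers > 0) {
        long take = Math.min(p1.toObtainFromOthers, rest);
        p1.toObtainFromOthers -= take;
        p2.actuallyToBePreempted += take;
      }
      if (p2.preemptable - p2.actuallyToBePreempted == 0) {
        lo--; // nothing more can be preempted from p2
      }
      if (p1.toObtainFromOthers == 0) {
        hi++; // p1 is satisfied
      }
    }
  }
}
```

Each iteration advances at least one pointer, so the loop is linear in the number of apps and does not depend on the value of maxAppPriority.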
7) TempAppPerQueue:
- It should be TempAppPerPartition
- Is it possible to add a common class for app/queue to avoid dup logic?
- Is it better to rename Temp- to something like AppPartitionSnapshot or
QueuePartitionSnapshot? This can be done in a separate patch for better review
I haven't looked at the detailed logic of IntraQueueCandidatesSelector and
IntraQueueCalculator / Policy, etc. yet. I will do that in the next iteration.
> [Umbrella] Capacity Scheduler Preemption Within a queue
> -------------------------------------------------------
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wangda Tan
> Attachments: Intra-Queue Preemption Use Cases.pdf,
> IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch,
> YARN-2009-wip.patch, YARN-2009-wip.v3.patch, YARN-2009.v0.patch,
> YARN-2009.v1.patch, YARN-2009.v2.patch
>
>
> This is an umbrella ticket to track efforts on preemption within a queue to
> support features like:
> YARN-2009. YARN-2113. YARN-4781.