[
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256541#comment-15256541
]
Sunil G commented on YARN-4390:
-------------------------------
HI [~leftnoteasy]
Thanks for sharing patch here, Few comments/doubts from my side.
1. In
{{CapacitySchedulerPreemptionUtils#deductPreemptableResourcesBasedSelectedCandidates}},
partition is retrieved from {{SchedulerNode}}. I think we can get this from
{{RMContainer#getNodeLabelExpression}} itself considering we will update
RMContainer labelExpression (via a set call) while we invoke
{{replaceLabelsOnNode}}.
2. {{context.getQueueByPartition}} has a code path flow to return NULL
eventhough its not possible to happen. With this assumption, null check is not
handled in caller side. So is it better to raise an exception and skip that
round of preemption to add more clarify?
3. From {{FifoCandidatesSelector}},
{code}
CapacitySchedulerPreemptionUtils
72 .deductPreemptableResourcesBasedSelectedCandidates(
{code}
Ideally this code snippet helps to deduct already selected containers resource
from its corresponding {{TempQueuePartition}}. On that note, I think this code
need not have to be inside one such selector. Rather it may be better suited
from the caller end and ensure that after each round of selection, possible
deduction is done.
4. One doubt about the newly changed {{PreemptableResourceCalculator}} ctor.
Could we get the information about
{{considersReservedResourceWhenCalculateIdeal}} from {{preemptionContext}}
itself. We could add this as a getter api so that code may be more clean.
5. {{getNodesForPreemption}} in {{ReservedContainerCandidatesSelector}} has to
scan all nodes in cluster to get whether any node has a reserved container or
not. For recent UI work, I thought having a metric to know the count of
reserved nodes in cluster. Is it costlier to keep a reserved nodes list in
scheduler? If so, could that be used here. Its just a thought, I havent really
checked the cost of keeping such list. So its fine to scrap this idea also :)
6. In {{ReservedContainerCandidatesSelector}}, containers are sorted against
launch time. Is it possible that we may select a high priority latest container
against some low priority slightly old one.
7. As I see {{RMContainer}} is read-only interface. I feel {{setQueueName}}
can be moved out from this interface and place in {{RMContainerImpl}}. I think
we could downcast and use the same.
Minor nits:
1. {{ReservedContainerCandidatesSelector#getPreemptionCandidatesOnNode}} has
some commented code. It can be removed.
2. in {{TempQueuePerPartition}}, could we add {{reserved}} to {{toString}} for
logging.
I think test class looks very clean and readable after new framework change. It
is very good and really useful :).
I thought of trying this framework sometime tomorrow,and will share if I have
comments.
> Do surgical preemption based on reserved container in CapacityScheduler
> -----------------------------------------------------------------------
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Affects Versions: 3.0.0, 2.8.0, 2.7.3
> Reporter: Eric Payne
> Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf,
> YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch,
> YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt
> containers. One is that an app could be requesting a large container (say
> 8-GB), and the preemption monitor could conceivably preempt multiple
> containers (say 8, 1-GB containers) in order to fill the large container
> request. These smaller containers would then be rejected by the requesting AM
> and potentially given right back to the preempted app.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)