[
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102135#comment-15102135
]
Sunil G commented on YARN-4108:
-------------------------------
Thanks [~leftnoteasy] for the clarifications and thanks [~eepayne] for the
inputs. I skimmed through the patch and it looks fine.
bq. It would be interesting to know what your thoughts are on making further
modifications to PCPP to make more informed choices about which containers to
kill
As Wangda mentioned, I think we can discuss this point in a new JIRA. Off the
top of my mind, a few use cases come up.
Today we select containers based on priority/submission time. But a few
containers might be *costly* for the AM, and the AM may want to save those
containers as far as possible (they will still be preempted if the AM cannot
spare any other container for the demand from PCPP). *Time remaining / % of
completion* are also good options. All of these cannot be plugged in together,
so based on the use case we could choose the set of policies needed there. It's
a very rough/raw idea for now; we need to discuss and refine it more (possible
proto changes may be needed for AM-RM communication).
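To make this a bit more concrete, here is a very rough sketch of how a
pluggable selection policy could look. The interface/class names and the
percentCompleted field are purely illustrative assumptions; nothing like this
exists in PCPP or the current patch:
{code}
// Hypothetical sketch only: names below are illustrative, not part of PCPP.
import java.util.List;
import org.apache.hadoop.yarn.api.records.Resource;

interface CandidateContainerSelectionPolicy {
  // Select containers to preempt for a given demand; concrete policies could
  // rank by priority, submission time, AM-reported "cost", remaining run
  // time or % of completion.
  List<CandidateContainer> selectCandidates(
      List<CandidateContainer> candidates, Resource demand);
}

class CandidateContainer {
  long submissionTime;
  int priority;
  float percentCompleted; // hypothetically reported by the AM
  Resource allocated;
}
{code}
Different policies (or an ordered chain of them) could then be configured per
use case.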
bq.I understand this could lead to unnecessary add-container-to-preempt-list
event send to AM, but I think it's better than excessive killing containers
I think we can take this relaxation; looks fine.
bq.From PreemptionManager,
{code}private Set<ContainerId> killableContainers = new HashSet<>();{code}
I think we can avoid this set; rather, we can verify from PreemptionEntity
itself.
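Something along these lines, where PreemptionEntity's getKillableContainers()
and the surrounding names are my assumptions about the patch, not the actual
API:
{code}
// Illustrative only: PreemptionEntity and its getKillableContainers() are
// assumed from the patch; the map and method names are hypothetical.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;

class PreemptionManagerSketch {
  private final Map<ApplicationAttemptId, PreemptionEntity> entities =
      new ConcurrentHashMap<>();

  boolean isKillable(ApplicationAttemptId attemptId, ContainerId containerId) {
    PreemptionEntity entity = entities.get(attemptId);
    // Look the container up in the per-application entity rather than
    // maintaining a separate Set<ContainerId> killableContainers.
    return entity != null
        && entity.getKillableContainers().contains(containerId);
  }
}
{code}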
bq.A queue which is using more than max-capacity and has killable containers:
we will try to kill containers for such queues to make sure it doesn't violate
max-capacity
Now, along with PCPP, preemption will also happen in the above case. I think
we can add some more detailed diagnostics here to give the reason for
preemption (max-capacity is violated, etc.); it will be helpful while
debugging.
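For example, a rough sketch of the kind of diagnostic text I mean (the helper
and wording are illustrative only, not what the current patch does):
{code}
// Sketch: attach a reason to the kill so diagnostics/logs show why the
// container was selected (queue above max-capacity). Names are illustrative.
import org.apache.hadoop.yarn.api.records.Resource;

class PreemptionDiagnostics {
  static String overMaxCapacityReason(String queueName, Resource used,
      Resource max) {
    return "Container preempted because queue " + queueName
        + " is over its max-capacity (used=" + used + ", max=" + max + ")";
  }
}
{code}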
Not in the scope of this JIRA, but it would be useful if we could add a log
summary after each PCPP round, such as:
>Demand is 8024 for queueA
>> Unreserved 2GB resource of application <appId> from node <nodeId>
>> Killed 3 containers of 2GB each of application <appId> from queueB
>> Killed 2 containers of 3GB each of application <appId> from queueB
A log like this will make each round much easier to follow. If it sounds fine,
we can add this separately after this ticket.
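A very rough sketch of how such a per-round summary could be collected and
logged; all names and units here are hypothetical, nothing of this exists in
the patch:
{code}
// Hypothetical per-round summary for PCPP logging; names/units illustrative.
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class PreemptionRoundSummary {
  private static final Log LOG =
      LogFactory.getLog(PreemptionRoundSummary.class);
  private final List<String> lines = new ArrayList<>();

  void recordDemand(String queue, long demandMB) {
    lines.add("Demand is " + demandMB + "MB for " + queue);
  }

  void recordUnreserve(long mb, String appId, String nodeId) {
    lines.add("  Unreserved " + mb + "MB of application " + appId
        + " from node " + nodeId);
  }

  void recordKill(int count, long containerMB, String appId, String queue) {
    lines.add("  Killed " + count + " containers of " + containerMB
        + "MB each of application " + appId + " from " + queue);
  }

  void logRound() {
    for (String line : lines) {
      LOG.info(line);
    }
    lines.clear();
  }
}
{code}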
> CapacityScheduler: Improve preemption to preempt only those containers that
> would satisfy the incoming request
> --------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf,
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf,
> YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch
>
>
> This is a sibling JIRA of YARN-2154. We should make sure container preemption
> is more effective.
> *Requirements:*
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113),
> cross-application preemption (such as priority-based (YARN-1963) /
> fairness-based (YARN-3319)).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)