[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632571#comment-13632571 ]
Arun C Murthy commented on YARN-45:
-----------------------------------
Sorry, I've been away for a couple of weeks due to family reasons and I'm just
catching up.
The bare-minimum requirements seem to be:
# The RM should notify the AM that a certain amount of resources will need to be
reclaimed (à la SIGTERM).
# Thus, the AM gets an opportunity to *pick* which containers it will sacrifice
to satisfy the RM's requirements.
# If, and only if, the AM doesn't act, the RM will go ahead and terminate some
containers (probably the most-recently allocated ones), à la SIGKILL.
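A minimal sketch of the escalation described in the list above, in Java. All names here (ReclaimNotice, onAmReleased, mbToForceKillAt) are hypothetical and are not YARN APIs; resources are simplified to megabytes of memory, and the grace period is assumed to be an absolute deadline in milliseconds:

```java
/**
 * Hypothetical sketch of the two-phase reclaim protocol: a SIGTERM-like
 * notice, then, after a grace period, a SIGKILL-like forced reclaim of
 * whatever the AM has not released. Not a YARN API; resources are in MB only.
 */
final class ReclaimNotice {
    private final long requestedMb;     // amount the RM wants back
    private final long deadlineMillis;  // end of the AM's grace period
    private long releasedByAmMb;        // amount the AM has given back so far

    ReclaimNotice(long requestedMb, long deadlineMillis) {
        if (requestedMb < 0) {
            throw new IllegalArgumentException("requestedMb < 0");
        }
        this.requestedMb = requestedMb;
        this.deadlineMillis = deadlineMillis;
    }

    /** Phase 1 (SIGTERM-like): the AM picks and releases containers itself. */
    void onAmReleased(long mb) {
        if (mb < 0) {
            throw new IllegalArgumentException("mb < 0");
        }
        releasedByAmMb += mb;
    }

    /** Amount still owed to the RM; never negative. */
    long outstandingMb() {
        return Math.max(0, requestedMb - releasedByAmMb);
    }

    /**
     * Phase 2 (SIGKILL-like): once the deadline has passed, the RM forcibly
     * reclaims whatever is still outstanding. Before the deadline it returns 0.
     */
    long mbToForceKillAt(long nowMillis) {
        return nowMillis < deadlineMillis ? 0 : outstandingMb();
    }
}
```

In this sketch only the shortfall is force-killed, so an AM that releases part of the request in time still decides which of its containers absorb that part.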
Given the above, I feel that this is a set of changes we need to be
conservative about, particularly since the really simple form of preemption,
i.e. SIGKILL alone on the RM side, is trivial (from an API perspective).
Thus, I'm concerned about jumping into a complex preemption API
(ResourceRequest etc.) in the very first iteration, before we have sufficient
experience with preemption.
I like [~tucu00]'s initial suggestion of:
# Resource resourcesToReclaim
# Optionally, a Set<ContainerId> that the RM will preempt (i.e. SIGKILL)
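The two fields in this suggestion can be pictured as a small message type. A hypothetical sketch in Java (16+, for records): SimpleResource stands in for YARN's Resource, container ids are plain Strings rather than ContainerId, and ReclaimRequest and amountOnly are illustrative names only:

```java
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, simplified stand-in for YARN's Resource (memory + vcores). */
record SimpleResource(long memoryMb, int vcores) {
    SimpleResource {
        if (memoryMb < 0 || vcores < 0) {
            throw new IllegalArgumentException("negative resource");
        }
    }
}

/**
 * Hypothetical RM-to-AM message: the amount of resources to give back, plus
 * an optional set of container ids that the RM will preempt (SIGKILL) anyway.
 */
record ReclaimRequest(SimpleResource resourcesToReclaim,
                      Optional<Set<String>> containersToPreempt) {
    ReclaimRequest {
        Objects.requireNonNull(resourcesToReclaim, "resourcesToReclaim");
        Objects.requireNonNull(containersToPreempt, "containersToPreempt");
        // Defensive, immutable copy so the sender cannot change the set later.
        containersToPreempt = containersToPreempt.map(Set::copyOf);
    }

    /** First-iteration form: an amount only; the RM applies its own victim rule. */
    static ReclaimRequest amountOnly(SimpleResource resourcesToReclaim) {
        return new ReclaimRequest(resourcesToReclaim, Optional.empty());
    }
}
```

Modelling the set as an Optional keeps "no explicit victims" distinct from "an empty list of victims", which matters if the set is dropped in the first iteration.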
In fact, for the first iteration, we can omit the Set<ContainerId> if the
semantics are clear, i.e. the RM will preempt the most-recently allocated
containers.
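One way to turn "preempt the most-recently allocated containers" into a concrete rule, sketched in Java under the assumption that the RM knows each container's allocation time, memory and vcores. AllocatedContainer and MostRecentFirstVictimSelector are hypothetical names, not YARN classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical view of a running container as the RM might track it. */
record AllocatedContainer(String id, long allocatedAtMillis, long memoryMb, int vcores) {}

final class MostRecentFirstVictimSelector {
    private MostRecentFirstVictimSelector() {}

    /**
     * Picks containers newest-first until both the memory and the vcore
     * amounts to reclaim are covered. If all containers together are not
     * enough, returns all of them; the caller must handle the shortfall.
     */
    static List<AllocatedContainer> select(List<AllocatedContainer> running,
                                           long memoryMbToReclaim,
                                           int vcoresToReclaim) {
        List<AllocatedContainer> newestFirst = new ArrayList<>(running);
        newestFirst.sort(
            Comparator.comparingLong(AllocatedContainer::allocatedAtMillis).reversed());

        List<AllocatedContainer> victims = new ArrayList<>();
        long memFreed = 0;
        int vcoresFreed = 0;
        for (AllocatedContainer c : newestFirst) {
            if (memFreed >= memoryMbToReclaim && vcoresFreed >= vcoresToReclaim) {
                break; // requirement already met; stop preempting
            }
            victims.add(c);
            memFreed += c.memoryMb();
            vcoresFreed += c.vcores();
        }
        return victims;
    }
}
```

For example, with three 1-vcore containers allocated at t=100 (2048 MB), t=200 (1024 MB) and t=300 (1024 MB), a request for 1500 MB and 1 vcore selects the t=300 and t=200 containers, and the oldest one keeps running.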
Once we have sufficient experience with this, we can dive deeper into further
enhancements to the API, adding features in a compatible manner (for 2.x or
3.x).
Thoughts?
> Scheduler feedback to AM to release containers
> ----------------------------------------------
>
> Key: YARN-45
> URL: https://issues.apache.org/jira/browse/YARN-45
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Chris Douglas
> Assignee: Carlo Curino
> Attachments: YARN-45.patch, YARN-45.patch
>
>
> The ResourceManager strikes a balance between cluster utilization and strict
> enforcement of resource invariants in the cluster. Individual allocations of
> containers must be reclaimed- or reserved- to restore the global invariants
> when cluster load shifts. In some cases, the ApplicationMaster can respond to
> fluctuations in resource availability without losing the work already
> completed by that task (MAPREDUCE-4584). Supplying it with this information
> would be helpful for overall cluster utilization [1]. To this end, we want to
> establish a protocol for the RM to ask the AM to release containers.
> [1] http://research.yahoo.com/files/yl-2012-003.pdf