[
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072279#comment-15072279
]
Bikas Saha commented on YARN-1011:
----------------------------------
In my prior experience, something like this is not practical without pro-active
cpu management (which has been delegated to future work in the document). It is
essential to run opportunistic tasks at lower OS cpu priority so that they
never obstruct progress of normal tasks. Typically we will find that the
machine is under-allocated the most in cpu usage since most processing has
bursty cpu. When a normal task has a cpu burst it should not have to contend
with an opportunistic task, since that contention would be detrimental to the
expected performance of the normal task. Without this, jobs will not run
predictably in the cluster. From what I have seen, users prefer predictability
over most other things, i.e. having a 1 min job run in 1 min all the time vs.
making that job run in 30s 85% of the time but in 2 mins 5% of the time,
because the latter makes it really hard to establish SLAs. In fact, this is the litmus test for
opportunistic scheduling. It should be able to raise the utilization of a
cluster from (say 50%) to (say 75%) without affecting the latency of the jobs
compared to when the cluster was running at 50%.
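One common way to get this kind of isolation (a minimal sketch, not anything from the design doc; the helper name and the choice of nice(1) are my own illustration) is to launch opportunistic containers at the lowest OS scheduling priority, so they only consume cycles that normal-priority tasks leave idle:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: wrap an opportunistic task's launch command with nice(1) at the
 * lowest priority so it never competes with normal-priority tasks for cpu.
 * (Hypothetical helper; real per-container isolation would more likely use
 * cgroup cpu shares.)
 */
public class OpportunisticLauncher {

    /** Prefix a command with "nice -n 19" (lowest Linux scheduling priority). */
    public static List<String> wrapWithLowPriority(List<String> command) {
        List<String> wrapped = new ArrayList<>();
        wrapped.add("nice");
        wrapped.add("-n");
        wrapped.add("19");
        wrapped.addAll(command);
        return wrapped;
    }

    public static void main(String[] args) {
        // On Linux this command line could be handed to ProcessBuilder.
        System.out.println(wrapWithLowPriority(List.of("sleep", "1")));
    }
}
```

With cgroups the same idea would be expressed as a small cpu.shares value for the opportunistic container's group instead of a nice level.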
For memory, in fact, it's OK to share and reach 100% capacity, but it's
important to check that the machine does not start thrashing. Most well-written
tasks will run within their memory limits and start spilling etc. Opportunistic tasks
are trying to occupy the memory that these tasks thought they could use but are
not using (or that these tasks are keeping in buffer on purpose). The crucial
thing to consider here is to look for stats that signify the onset of memory
paging activity (or overall memory over-subscription at the OS level). At that
point, even normal tasks that are within their limit will be adversely affected
because the OS will start paging memory to disk. So we need to start
proactively killing opportunistic tasks before such paging activity gets
triggered.
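On Linux, one signal for the onset of paging is growth in the major-fault counter exposed in /proc/vmstat. The sketch below (my own illustration; the pgmajfault field is a standard kernel counter, but the class, method names, and threshold are hypothetical) shows how a node-level monitor could watch that counter between samples:

```java
/**
 * Sketch: detect the onset of memory paging by watching the kernel's
 * major-fault counter (pgmajfault in /proc/vmstat) grow between samples.
 * A node monitor crossing the threshold would start killing opportunistic
 * tasks before normal tasks are hurt.
 */
public class PagingDetector {

    /** Extract the pgmajfault counter from /proc/vmstat-formatted text. */
    public static long majorFaults(String vmstatText) {
        for (String line : vmstatText.split("\n")) {
            if (line.startsWith("pgmajfault ")) {
                return Long.parseLong(line.split("\\s+")[1]);
            }
        }
        return -1; // counter not found
    }

    /** True if major faults grew faster than the (hypothetical) threshold. */
    public static boolean pagingOnset(long prevFaults, long currFaults,
                                      long faultsPerIntervalThreshold) {
        return (currFaults - prevFaults) > faultsPerIntervalThreshold;
    }
}
```

In a real node manager this would be sampled periodically, and the threshold would have to be tuned so killing starts before the OS begins paging normal tasks' memory.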
Handling opportunistic tasks raises questions on the involvement of the AMs.
Unless I missed something, this is not called out clearly in the doc. In that
sense it would be instructive to consider opportunistic scheduling in a similar
light as preemption: the app got a container that it should not have gotten at
that time if we had been strict, but got it because we decided to loosen the
strings (of queue capacity or machine capacity, respectively).
- will opportunistic containers be given only for containers that are beyond
queue capacity, such that we don't break any guarantees on their liveness?
i.e. an AM will not expect to lose any container that is within its queue
capacity, but opportunistic containers can be killed at any time.
- does the AM need to know that a newly allocated container was opportunistic?
E.g. so that it does not schedule the highest priority work on that container.
- will conversion of opportunistic containers to regular containers be
automatically done by the RM? Will the RM notify the AM about such conversions?
- when terminating opportunistic containers, will the RM ask the AM which
containers to kill? Given the perf-related scenarios above, this may not be a
viable option.
> [Umbrella] Schedule containers based on utilization of currently allocated
> containers
> -------------------------------------------------------------------------------------
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated
> containers and, if appropriate, allocate more (speculative?) containers.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)