[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072279#comment-15072279
 ] 

Bikas Saha commented on YARN-1011:
----------------------------------

In my prior experience, something like this is not practical without proactive 
cpu management (which has been delegated to future work in the document). It is 
essential to run opportunistic tasks at lower OS cpu priority so that they 
never obstruct progress of normal tasks. Typically we will find that the gap 
between allocation and actual usage is largest for cpu, since most processing 
has bursty cpu. When a normal task has a cpu burst it should not have to 
contend with an opportunistic task, since that contention would be detrimental 
to the expected performance of that task. Without this, jobs will not run 
predictably in the cluster. From what I have seen, users prefer predictability 
over most other things, i.e. having a 1 min job run in 1 min all the time vs 
making that job run in 30s 85% of the time but in 2 mins 5% of the time, 
because the latter makes it really hard to establish SLAs. In fact, this is 
the litmus test for opportunistic scheduling: it should be able to raise the 
utilization of a cluster from (say) 50% to (say) 75% without affecting the 
latency of the jobs compared to when the cluster was running at 50%.
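To make the CPU-priority point concrete, here is a minimal sketch (not YARN's actual NodeManager code; the class and method names are hypothetical) of wrapping an opportunistic container's launch command with "nice" so the kernel scheduler always favors normal tasks during their bursts. A real implementation would more likely use cgroup cpu.shares, but "nice" illustrates the idea:

```java
import java.util.ArrayList;
import java.util.List;

public class OpportunisticLauncher {
    // Hypothetical helper: prefix an opportunistic container's command with
    // "nice -n 19" (lowest CPU priority) so that when a normal-priority task
    // has a CPU burst, the kernel scheduler gives it the CPU and the
    // opportunistic task never obstructs it. Normal containers are launched
    // unchanged.
    static List<String> wrapCommand(List<String> containerCmd, boolean opportunistic) {
        if (!opportunistic) {
            return containerCmd;
        }
        List<String> wrapped = new ArrayList<>();
        wrapped.add("nice");
        wrapped.add("-n");
        wrapped.add("19"); // 19 = lowest scheduling priority on Linux
        wrapped.addAll(containerCmd);
        return wrapped;
    }

    public static void main(String[] args) {
        List<String> cmd = wrapCommand(List.of("bash", "-c", "run_task"), true);
        System.out.println(String.join(" ", cmd)); // nice -n 19 bash -c run_task
    }
}
```

With cgroups the same effect is achieved by giving the opportunistic group a tiny cpu.shares value instead of wrapping the command.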

For memory, in fact, it is OK to share and reach 100% capacity, but it is 
important to check that the machine does not start thrashing. Most well-written 
tasks will stay within their memory limits by spilling to disk etc. 
Opportunistic tasks are trying to occupy the memory that these tasks thought 
they could use but are not using (or that these tasks are keeping in buffer on 
purpose). The crucial thing to consider here is to look for stats that signify 
the onset of memory paging activity (or overall memory over-subscription at 
the OS level). At that point, even normal tasks that are within their limits 
will be adversely affected because the OS will start paging memory to disk. So 
we need to start proactively killing opportunistic tasks before such paging 
activity gets triggered.
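As a sketch of "look for stats that signify the onset of paging", the swap counters in Linux's /proc/vmstat (pswpin/pswpout) rise only when the OS actually starts paging. The class and threshold below are hypothetical, not part of YARN; a real NodeManager monitoring loop would sample these counters periodically and preempt opportunistic containers when the delta between samples exceeds a threshold:

```java
public class PagingMonitor {
    // Hypothetical helper: sum the cumulative swap-in/swap-out page counts
    // from /proc/vmstat text. These counters only increase while the OS is
    // actually paging, so a rising delta between samples signals thrashing.
    static long swapActivity(String vmstatText) {
        long total = 0;
        for (String line : vmstatText.split("\n")) {
            if (line.startsWith("pswpin ") || line.startsWith("pswpout ")) {
                total += Long.parseLong(line.trim().split("\\s+")[1]);
            }
        }
        return total;
    }

    // Decide whether paging has started between two samples. The threshold
    // (pages per sampling interval) is an assumed tunable.
    static boolean pagingOnset(long prevSample, long currSample, long threshold) {
        return currSample - prevSample > threshold;
    }

    public static void main(String[] args) {
        // Sample /proc/vmstat contents from two consecutive monitoring ticks.
        String t0 = "nr_free_pages 100000\npswpin 10\npswpout 5\n";
        String t1 = "nr_free_pages 2000\npswpin 900\npswpout 640\n";
        boolean kill = pagingOnset(swapActivity(t0), swapActivity(t1), 100);
        System.out.println("preempt opportunistic containers: " + kill);
    }
}
```

The point of sampling deltas rather than absolute values is that the counters are cumulative since boot; only their rate of change indicates the onset of thrashing.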

Handling opportunistic tasks raises questions on the involvement of the AMs. 
Unless I missed something, this is not called out clearly in the doc. In that 
sense it would be instructive to consider opportunistic scheduling in a similar 
light as preemption: in both cases the app got a container that it would not 
have gotten had we been strict, but got it because we decided to loosen the 
strings (of queue capacity or machine capacity respectively).
- will opportunistic containers be given only for allocations that are beyond 
queue capacity, so that we don't break any guarantees on their liveness? i.e. 
an AM will not expect to lose any container that is within its queue capacity, 
but opportunistic containers can be killed at any time.
- does the AM need to know that a newly allocated container was opportunistic, 
e.g. so that it does not schedule the highest priority work on that container?
- will conversion of opportunistic containers to regular containers be 
automatically done by the RM? Will the RM notify the AM about such conversions?
- when terminating opportunistic containers, will the RM ask the AM which 
containers to kill? Given the perf-related scenarios above, this may not be a 
viable option.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)