Karthik Kambatla commented on YARN-1011:

Thanks for chiming in, [~bikassaha]. 

bq. It is essential to run opportunistic tasks at lower OS cpu priority so that 
they never obstruct progress of normal tasks.
bq. In fact, this is the litmus test for opportunistic scheduling.
Good point. Guaranteed containers should get priority for resources; 
Opportunistic containers should only use left-over resources. We should do this 
for CPU, disk, and network. I am not aware of the latest on disk and network 
isolation, but we should create sub-tasks for those too. /cc [~vvasudev] 

bq. Handling opportunistic tasks raises questions on the involvement of the AMs.
bq. In that sense it would be instructive to consider opportunistic scheduling 
in a similar light as preemption.

I wasn't sure the AM needs to know a container's execution type:

As you mention, this is very similar to preemption. From an AM's standpoint, 
the container would be preempted if those resources are not available to that 
application any more. In the case of preemption, this can happen if other high 
priority queues have outstanding demand or the cluster lost a couple of nodes. 
Here, it can happen because Guaranteed containers actually need the resources. 
In that sense, the AM doesn't have to do anything different for Guaranteed vs 
Opportunistic containers.

Predictability: Allowing applications to specify only Guaranteed containers vs 
Guaranteed or Opportunistic containers should take care of this. However, 
between getting no resources and getting opportunistic resources, are there 
cases where the applications prefer the latter? The applications "should" get 
guaranteed containers at the same point in time irrespective of whether they 
use opportunistic resources in the interim. Note that allowing applications to 
specify whether they are okay with getting opportunistic containers complicates 
the scheduling: the scheduler needs to look past the higher priority apps 
that don't allow opportunistic containers before getting to those that do. 
And, when resources become available on that node, the RM will need to schedule 
containers for the higher priority apps first, prolonging the duration for which 
opportunistic containers stay opportunistic. 
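
The ordering problem above can be sketched roughly as follows. This is only an illustration under stated assumptions: the App class, its allowsOpportunistic flag, and both selection methods are hypothetical names, not part of the YARN scheduler API, and the list is assumed to be pre-sorted by scheduler priority.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model for illustration only; none of these names exist in YARN.
class OpportunisticSelectionSketch {
    static final class App {
        final String name;
        final boolean allowsOpportunistic; // assumed per-app opt-in flag
        final int pendingContainers;

        App(String name, boolean allowsOpportunistic, int pendingContainers) {
            this.name = name;
            this.allowsOpportunistic = allowsOpportunistic;
            this.pendingContainers = pendingContainers;
        }
    }

    // Guaranteed allocation: the first app with outstanding demand wins.
    static Optional<App> nextForGuaranteed(List<App> appsByPriority) {
        for (App app : appsByPriority) {
            if (app.pendingContainers > 0) {
                return Optional.of(app);
            }
        }
        return Optional.empty();
    }

    // Opportunistic allocation: apps that did not opt in must be skipped,
    // even when they have higher priority and outstanding demand.
    static Optional<App> nextForOpportunistic(List<App> appsByPriority) {
        for (App app : appsByPriority) {
            if (app.pendingContainers > 0 && app.allowsOpportunistic) {
                return Optional.of(app);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed to be sorted by priority, highest first.
        List<App> apps = Arrays.asList(
            new App("high-prio-guaranteed-only", false, 2),
            new App("low-prio-accepts-opportunistic", true, 1));

        System.out.println(nextForOpportunistic(apps).get().name);
        System.out.println(nextForGuaranteed(apps).get().name);
    }
}
```

In this sketch the two selections pick different apps. When guaranteed resources free up on the node, they go to the higher priority app first, so the lower priority app's opportunistic container is not promoted at that point.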

Given this complication, I would prefer we do not involve AMs in the 
decision-making process. Based on the need and use cases, we could revisit this 
at a later time. Note that YARN-4335 adds this to ResourceRequest for 
distributed scheduling, and even there they are not entirely sure whether it 
needs to be a part of it. 

bq. does the AM need to know that a newly allocated container was 
opportunistic. E.g. so that it does not schedule the highest priority work on 
that container.
Valid concern. Maybe we should inform the AM whether a container is 
opportunistic, and again later when it gets promoted to guaranteed. That said, 
I am not sure if this is essential to oversubscription being useful. Thoughts on 
punting it to Phase-2? 

bq. will opportunistic containers be given only when for containers that are 
beyond queue capacity such that we dont break any guarantees on their 
liveliness. ie. an AM will not expect to lose any container that is within its 
queue capacity but opportunistic containers can be killed at any time.
Yes. This probably needs to be clear in the doc. Will update it. 

bq. will conversion of opportunistic containers to regular containers be 
automatically done by the RM? 
By some combination of RM/NM, definitely yes. Initially, I thought the RM could 
be the only one doing this. The RM could keep track of opportunistic containers 
in SchedulerNode. Today, we already track launchedContainers. The scheduler 
could go through this list and promote containers before allocating new ones. 

Does this add an unnecessary delay in the promotion though? If the scheduler 
allocated opportunistic containers based on the same prioritization it uses for 
guaranteed containers, can the NM just promote the oldest opportunistic 
container running on that node and update the RM accordingly? 
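
A minimal sketch of the NM-side idea, assuming a simplified model: the Container class, ExecType enum, and promoteOldest method below are hypothetical and not the NodeManager API, and only memory is considered. It promotes opportunistic containers oldest-first while they fit into the freed guaranteed capacity, and returns the ids the NM would then report to the RM.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model for illustration only; none of these names exist in YARN.
class NodePromotionSketch {
    enum ExecType { GUARANTEED, OPPORTUNISTIC }

    static final class Container {
        final String id;
        final long startTimeMs;
        final int memoryMb;
        ExecType type;

        Container(String id, long startTimeMs, int memoryMb, ExecType type) {
            this.id = id;
            this.startTimeMs = startTimeMs;
            this.memoryMb = memoryMb;
            this.type = type;
        }
    }

    /**
     * Promotes opportunistic containers on this node, oldest first, while
     * each one fits into the remaining free guaranteed memory. Stops at the
     * first container that does not fit, so the age order is preserved.
     * Returns the ids of the promoted containers.
     */
    static List<String> promoteOldest(List<Container> running, int freeGuaranteedMb) {
        List<Container> candidates = new ArrayList<>();
        for (Container c : running) {
            if (c.type == ExecType.OPPORTUNISTIC) {
                candidates.add(c);
            }
        }
        candidates.sort(Comparator.comparingLong((Container c) -> c.startTimeMs));

        List<String> promoted = new ArrayList<>();
        int remainingMb = freeGuaranteedMb;
        for (Container c : candidates) {
            if (c.memoryMb > remainingMb) {
                break;
            }
            c.type = ExecType.GUARANTEED;
            remainingMb -= c.memoryMb;
            promoted.add(c.id);
        }
        return promoted;
    }

    public static void main(String[] args) {
        List<Container> running = new ArrayList<>();
        running.add(new Container("c1", 100L, 1024, ExecType.GUARANTEED));
        running.add(new Container("c3", 300L, 1024, ExecType.OPPORTUNISTIC));
        running.add(new Container("c2", 200L, 2048, ExecType.OPPORTUNISTIC));

        // 2048 MB of guaranteed capacity has freed up on the node.
        System.out.println(promoteOldest(running, 2048));
    }
}
```

Stopping at the first container that does not fit is one possible policy; skipping it and trying younger containers would use capacity more fully but would no longer follow the scheduler's original ordering.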

Another thing to consider: the promotion process here should work with 
that in YARN-2877. [~subru], [~kkaranasos], [~asuresh] - is it okay for the NM 
to automatically promote some opportunistic containers? Maybe we could add a 
flag to the launch context to differentiate between those opportunistic 
containers that can be automatically promoted and those that cannot. 

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -------------------------------------------------------------------------------------
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.
