Karthik Kambatla commented on YARN-1011:
Thanks for chiming in, [~bikassaha].
bq. It is essential to run opportunistic tasks at lower OS cpu priority so that
they never obstruct progress of normal tasks.
bq. In fact, this is the litmus test for opportunistic scheduling.
Good point. Guaranteed containers should get priority for resources;
opportunistic containers should only use left-over resources. We should do this
for CPU, disk and network. I am not aware of the latest on disk and network
isolation, but we should create sub-tasks for those too. /cc [~vvasudev]
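For CPU, one way to picture "lower OS cpu priority" on Linux is cgroup CPU weights. The sketch below is illustrative only: the class, enum, and method names are made up for this comment and are not existing YARN APIs. It assumes cgroup v1 cpu.shares semantics, where 1024 is the default weight and 2 is the minimum.

```java
// Hypothetical sketch: none of these names are existing YARN classes.
final class CpuWeightSketch {

    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    // cgroup v1 cpu.shares: 1024 is the default weight, 2 is the minimum.
    static final int SHARES_PER_VCORE = 1024;
    static final int MIN_SHARES = 2;

    /**
     * Returns the cpu.shares weight a container's cgroup could be given.
     * Guaranteed containers scale with their vcores; opportunistic ones
     * always get the minimum weight, whatever they asked for.
     */
    static int cpuShares(ExecutionType type, int vcores) {
        if (vcores < 1) {
            throw new IllegalArgumentException("vcores must be >= 1");
        }
        return type == ExecutionType.OPPORTUNISTIC
                ? MIN_SHARES
                : SHARES_PER_VCORE * vcores;
    }
}
```

Shares are relative weights that only matter under contention, so this approximates "never obstruct" rather than enforcing it strictly.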
bq. Handling opportunistic tasks raises questions on the involvement of the AMs.
bq. In that sense it would be instructive to consider opportunistic scheduling
in a similar light as preemption.
I wasn't sure the AM needs to know a container's execution type:
As you mention, this is very similar to preemption. From an AM's standpoint,
the container would be preempted if those resources are not available to that
application any more. In case of preemption, this can happen if other high
priority queues have outstanding demand or the cluster lost a couple of nodes.
Here, it is possible Guaranteed containers actually need the resources. In
that sense, the AM doesn't have to do anything different for Guaranteed vs
Opportunistic containers.
Predictability: Allowing applications to specify only Guaranteed containers vs
Guaranteed or Opportunistic containers should take care of this. However,
between getting no resources and getting opportunistic resources, are there
cases where the applications prefer the latter? The applications "should" get
guaranteed containers at the same point in time irrespective of whether they
use opportunistic resources in the interim. Note that allowing applications to
specify whether they are okay with getting opportunistic containers complicates
the scheduling - the scheduler needs to look past the higher priority apps
that don't allow opportunistic containers before getting to those that do.
And, when resources become available on that node, the RM will need to schedule
containers for the higher priority apps first, prolonging the duration for which
opportunistic containers stay opportunistic.
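To make the scheduling concern concrete, here is a minimal sketch of what an opportunistic assignment pass could look like if apps were allowed to opt in or out. Everything here is hypothetical: the App type, the allowsOpportunistic flag, and the slot-counting model are simplifications for illustration, not the scheduler's actual data structures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in YARN.
final class OptInSchedulingSketch {

    /** Minimal stand-in for an application with pending demand. */
    static final class App {
        final String id;
        final boolean allowsOpportunistic; // hypothetical per-app opt-in
        int pendingContainers;

        App(String id, boolean allowsOpportunistic, int pendingContainers) {
            this.id = id;
            this.allowsOpportunistic = allowsOpportunistic;
            this.pendingContainers = pendingContainers;
        }
    }

    /**
     * Hands out opportunistic slots on one node. The list is expected to be
     * sorted with the highest priority app first. Returns the ids of the apps
     * that received a slot, one entry per slot handed out.
     */
    static List<String> assignOpportunistic(List<App> appsByPriority, int freeSlots) {
        List<String> assigned = new ArrayList<>();
        for (App app : appsByPriority) {
            if (!app.allowsOpportunistic) {
                continue; // skipped even though it has higher priority
            }
            while (freeSlots > 0 && app.pendingContainers > 0) {
                assigned.add(app.id);
                app.pendingContainers--;
                freeSlots--;
            }
            if (freeSlots == 0) {
                break;
            }
        }
        return assigned;
    }
}
```

With apps A (opted out), B and C (opted in) in priority order, the slots go to B and C while A keeps waiting. When guaranteed capacity later frees up on that node, it would go to A first, so B's and C's containers stay opportunistic for longer.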
Given this complication, I would prefer we do not involve AMs in the
decision-making process. Based on the need and use cases, we could revisit this
at a later time. Note that YARN-4335 adds this to ResourceRequest for
distributed scheduling, and even there they are not entirely sure whether it
needs to be a part of the request.
bq. does the AM need to know that a newly allocated container was
opportunistic. E.g. so that it does not schedule the highest priority work on
Valid concern. Maybe we should inform the AM whether a container is
opportunistic, and again later when it gets promoted to guaranteed. That said,
I am not sure this is essential to oversubscription being useful. Thoughts on
punting it to Phase-2?
bq. will opportunistic containers be given only when for containers that are
beyond queue capacity such that we dont break any guarantees on their
liveliness. ie. an AM will not expect to lose any container that is within its
queue capacity but opportunistic containers can be killed at any time.
Yes. This probably needs to be clear in the doc. Will update it.
bq. will conversion of opportunistic containers to regular containers be
automatically done by the RM?
By some combination of RM/NM, definitely yes. Initially, I thought the RM could
be the only one doing this. The RM could keep track of opportunistic containers
in SchedulerNode. Today, we already track launchedContainers. The scheduler
could go through this list and promote containers before allocating new
containers.
Does this add an unnecessary delay in the promotion though? If the scheduler
allocated opportunistic containers based on the same prioritization it uses for
guaranteed containers, can the NM just promote the oldest opportunistic
container running on that node and update the RM accordingly?
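A rough sketch of the NM-side, oldest-first promotion idea follows. The Container and Node types are stand-ins invented for illustration (they are not SchedulerNode or any NM class), and memory is the only resource modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: these types are illustrative, not YARN classes.
final class PromotionSketch {

    static final class Container {
        final String id;
        final int memoryMb;
        boolean guaranteed;

        Container(String id, int memoryMb) {
            this.id = id;
            this.memoryMb = memoryMb;
        }
    }

    static final class Node {
        // Opportunistic containers in launch order, oldest first.
        private final Deque<Container> opportunistic = new ArrayDeque<>();

        void launchOpportunistic(Container c) {
            c.guaranteed = false;
            opportunistic.addLast(c);
        }

        /**
         * Promotes containers oldest-first while the freed guaranteed memory
         * covers the next one in line. Stops at the first container that does
         * not fit, so a newer container never overtakes an older one.
         * Returns the promoted containers, which the NM would report to the RM.
         */
        List<Container> promote(int freedMemoryMb) {
            List<Container> promoted = new ArrayList<>();
            while (!opportunistic.isEmpty()
                    && opportunistic.peekFirst().memoryMb <= freedMemoryMb) {
                Container c = opportunistic.pollFirst();
                c.guaranteed = true;
                freedMemoryMb -= c.memoryMb;
                promoted.add(c);
            }
            return promoted;
        }
    }
}
```

Strict launch order is only equivalent to the RM's choice if opportunistic containers were allocated using the same prioritization as guaranteed ones, which is the assumption in the question above.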
Another thing to consider: the promotion process here should work with
that in YARN-2877. [~subru], [~kkaranasos], [~asuresh] - is it okay for the NM
to automatically promote some opportunistic containers? Maybe we could add a
flag to the launch context to differentiate between those opportunistic
containers that can be automatically promoted and those that cannot.
> [Umbrella] Schedule containers based on utilization of currently allocated
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf
> Currently RM allocates containers and assumes resources allocated are
> RM can, and should, get to a point where it measures utilization of allocated
> containers and, if appropriate, allocate more (speculative?) containers.