[
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083957#comment-15083957
]
Bikas Saha commented on YARN-1011:
----------------------------------
I agree with relying on natural container churn instead of preemption to avoid
lost work, though the issue of clearly defining the scheduler policy still remains.
bq. If we were oversubscribing 10X then I'd probably want it for sure, but if
it's at most 2X capacity then worst case is a container only gets 50% of the
resource it had requested. Obviously for something like memory this has to be
closely controlled because going over the physical capabilities of the machine
has very significant consequences. But for CPU, I'd definitely be inclined to
live with the occasional 50% worst case for all containers, in order to avoid
the 1/1024th worst case for OPPORTUNISTIC containers on a busy node.
I did not understand this. Does this mean it's ok for normal containers to run
50% slower in the presence of opportunistic containers? If yes, then there are
scenarios where this may not be a valid choice, e.g. when a cluster is running
a mix of SLA and non-SLA jobs. Non-SLA jobs are ok with their containers being
slowed down to increase cluster utilization by running opportunistic containers,
because we are getting higher overall throughput. But SLA jobs are not ok with
missing deadlines because their tasks ran 50% slower.
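To make the quoted worst-case arithmetic concrete, here is a small sketch. The proportional-share model and the share values are illustrative assumptions modeled on Linux cgroup cpu.shares semantics; they are not YARN's actual accounting:

```python
# Sketch: worst-case CPU under proportional-share scheduling.
# Assumption: each runnable group gets CPU in proportion to its shares
# (as with Linux cgroup cpu.shares). Numbers are illustrative only.

def worst_case_fraction(my_shares, total_shares, capacity=1.0):
    """Fraction of node CPU a group gets when every group is busy."""
    return capacity * my_shares / total_shares

NODE_SHARES = 1024  # shares representing the node's full CPU capacity

# 2X oversubscription: guaranteed containers collectively hold 2X the
# node's shares, so each gets at least half of what it requested.
guaranteed = worst_case_fraction(NODE_SHARES, 2 * NODE_SHARES)   # 0.5

# An OPPORTUNISTIC container demoted to the minimum cpu.shares value (2)
# on a busy node gets a vanishingly small fraction.
opportunistic = worst_case_fraction(2, 2 * NODE_SHARES + 2)      # ~1/1025
```

This reproduces the two extremes in the quote: a uniform 50% worst case for all containers versus a ~1/1024th worst case for OPPORTUNISTIC containers on a busy node.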
IMO, the litmus test for a feature like this would be to take an existing
cluster (with low utilization because tasks are asking for more resources than
they actually need 100% of the time), turn this feature on, and get better
cluster utilization and throughput without affecting the existing workload,
regardless of the internal implementation details. Agree?
bq. 50% of maximum-under-utilized resource of past 30 min for each NM can be
used to allocate opportunistic containers.
These are heuristics and may all be valid under different circumstances. We
should step back and ask what the source of this optimization is.
Observation : Cluster is under-utilized despite being fully allocated
Possible reasons :
1) Tasks are incorrectly over-allocated. They will never use the resources they
ask for, and hence we can safely run additional opportunistic containers. So
this feature is used to compensate for poorly configured applications. Probably
a valid scenario, but is it common?
2) Tasks are correctly allocated but don't use their capacity to the limit all
the time. E.g. Terasort will use high CPU only during the sort phase, not
during the entire length of the job, but its containers will ask for enough CPU
to run the sort in the desired time. This is typical application behavior,
where resource usage varies over time. So this feature is used to soak up the
fallow resources in the cluster while tasks are not using their quoted capacity.
The arguments and assumptions we make need to be considered in the light of
which of 1 or 2 is the common case and where this feature will be useful.
While it's useful to have configuration knobs, for a complex dynamic feature
like this, which is basically reacting to runtime observations, it may be quite
hard to configure statically by hand. While some limits, such as a maximum
over-allocation limit, are easy and probably required to configure, we should
look at making this feature work by itself instead of relying exclusively on
configuration (hell :P) for users to make it usable.
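As one illustration of how a per-NM heuristic like the quoted one could be computed, here is a sketch. The class name, the sampling interface, and the slack definition are all hypothetical; only the "fraction of the windowed under-utilization" idea comes from the quote:

```python
from collections import deque

# Hypothetical sketch of the quoted NM-side heuristic: permit opportunistic
# allocation up to 50% of the maximum under-utilization (allocated minus
# utilized) observed over a trailing window of samples. The window length,
# sample rate, and the 0.5 factor are exactly the knobs being debated above.

class OverAllocationEstimator:
    def __init__(self, window_samples, fraction=0.5):
        # deque(maxlen=...) drops the oldest sample automatically,
        # giving a fixed trailing window (e.g. 30 min of samples).
        self.samples = deque(maxlen=window_samples)
        self.fraction = fraction

    def sample(self, allocated, utilized):
        """Record one observation of slack (never negative)."""
        self.samples.append(max(0.0, allocated - utilized))

    def opportunistic_headroom(self):
        """Resource that may be handed to opportunistic containers."""
        return self.fraction * max(self.samples, default=0.0)
```

Even this tiny sketch exposes three static knobs (window, rate, fraction), which is the point above: a self-tuning policy driven by runtime observations would be preferable to asking users to set these by hand.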
> [Umbrella] Schedule containers based on utilization of currently allocated
> containers
> -------------------------------------------------------------------------------------
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated
> containers and, if appropriate, allocate more (speculative?) containers.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)