Nathan Roberts commented on YARN-1011:

bq. This is one of the reasons I was proposing the notion of a max threshold 
which is less than 1. If the utilization goes to 100%, we clearly know there is 
contention. Since we measure resource utilization in resource-seconds (if not, 
we should update it), bursty spikes alone wouldn't take utilization over 100%. 
So, we shouldn't see a utilization greater than 100%.

Just to make sure I understand: when you say max threshold < 1, are you saying 
an NM could not advertise 48 vcores if there are only 24 vcores physically 
available? I think we have to support going above 1.0. We already go above 1.0 
on our clusters, even without this feature. What I'm hoping this feature will 
allow us to do is go significantly above 1.0, especially for resources like 
memory, where we have to be much more careful about not hitting 100%.
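For context, advertising more vcores than physically exist is already just a 
matter of NM configuration via yarn.nodemanager.resource.cpu-vcores; e.g. on a 
24-core box, 2x CPU overcommit would look something like:

```xml
<!-- yarn-site.xml: advertise 48 vcores on a machine with 24 physical cores -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>48</value>
</property>
```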

One use case that I'm really hoping this feature can support is a batch cluster 
(loose SLAs) with very high utilization. For this use case, I'd like the 
following to be true:
- nodes can be at 100% CPU, 100% network, or 100% disk for long periods of time 
(several minutes). Memory could get to something like 80% before corrective 
action would be required. During these periods, no containers get shot to shed 
load. Nodemanagers might reduce the resources they advertise to the RM, but 
nothing would need to be killed.
- Both GUARANTEED and OPPORTUNISTIC containers get their fair share of 
resources. They're both drawing from the same capacity and user-limit from the 
RM's point of view, so I feel like they should be given their fair share of 
resources on the nodes they execute on. The real point of being designated 
OPPORTUNISTIC in this use case is that the NM knows which containers to kill 
when it needs to shed load.
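To make the batch-cluster behavior I'm describing concrete, here's a minimal 
sketch (not actual YARN code; the class, thresholds, and action names are all 
hypothetical) of the node-level policy: CPU can sit at 100% indefinitely, a 
soft memory threshold only shrinks what the NM advertises, and only a hard 
memory limit triggers killing, OPPORTUNISTIC containers first.

```java
// Hypothetical sketch of the per-node overcommit policy described above.
// Thresholds are utilization fractions of physical capacity.
public class OvercommitPolicy {
    public enum Action { NONE, REDUCE_ADVERTISED, KILL_OPPORTUNISTIC }

    private final double memSoftThreshold; // e.g. 0.80: stop taking new work
    private final double memHardLimit;     // e.g. 0.95: must shed load now

    public OvercommitPolicy(double memSoftThreshold, double memHardLimit) {
        this.memSoftThreshold = memSoftThreshold;
        this.memHardLimit = memHardLimit;
    }

    public Action decide(double cpuUtil, double memUtil) {
        if (memUtil >= memHardLimit) {
            // Memory is nearly exhausted: shed load, killing
            // OPPORTUNISTIC containers before any GUARANTEED ones.
            return Action.KILL_OPPORTUNISTIC;
        }
        if (memUtil >= memSoftThreshold) {
            // Soft threshold crossed: advertise less resource to the RM
            // so no new containers land here, but kill nothing.
            return Action.REDUCE_ADVERTISED;
        }
        // CPU (or network/disk) pegged at 100% is fine in this use case.
        return Action.NONE;
    }
}
```

The point of the sketch is that "corrective action" has two distinct levels, 
and only the second one is destructive.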

Another use case is where you have a mixture of jobs, some with tight SLAs, 
some with looser SLAs. This one is mentioned in previous comments and is also 
very important. It requires a different set of thresholds and a different level 
of fairness controls. 

So, I just think things have to be configurable enough to handle both types of 
use cases.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -------------------------------------------------------------------------------------
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.
