[jira] [Comment Edited] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2018-09-20 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622450#comment-16622450
 ] 

Arun Suresh edited comment on YARN-1011 at 9/20/18 6:11 PM:


Planning on spending more cycles on this now.
Looking at the SubTasks, it looks like some of them are already committed to 
trunk - mostly the ones pertaining to ResourceUtilization plumbing and the NM 
CGroups-based improvements. Wondering if it is ok to move those into another 
umbrella JIRA?



> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun C Murthy
> Assignee: Karthik Kambatla
> Priority: Major
> Attachments: patch-for-yarn-1011.patch, yarn-1011-design-v0.pdf, 
> yarn-1011-design-v1.pdf, yarn-1011-design-v2.pdf, yarn-1011-design-v3.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085738#comment-15085738
 ] 

Karthik Kambatla edited comment on YARN-1011 at 1/6/16 4:12 PM:


bq. Just to make sure I understand. When you say max threshold < 1 are you 
saying an NM could not advertise 48 vcores if there are only 24 vcores 
physically available?

You can continue to advertise more vcores. 

Consider a cluster with nodes of 1 physical core. Let us say each node 
advertises 10 *vcores*. Today, let us say your CPU utilization under these 
settings is 50% running 10 containers. All these containers in this context 
would be GUARANTEED containers. I am proposing we set a max threshold for the 
RM over-allocating containers to 95%. This essentially means the RM allocates 
OPPORTUNISTIC containers on this node (that has previously been fully 
allocated) until we hit the utilization threshold of 95% - say, running 19 
containers. At this point, if one container's usage goes higher taking us beyond 
95%, we kill enough OPPORTUNISTIC containers to bring this under 95%. Maybe 
the max allowed threshold could be higher - 99%. I am wary of setting it to 
100% unless we have some other way of differentiating "running comfortably at 
100%" vs "contention at 100%", because both look the same. Also, I am assuming 
people would be very happy with 95% utilization if we achieve that :)
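To make the kill-back-under-threshold step concrete, here is a minimal sketch. The class, method, and the per-container utilization bookkeeping are all hypothetical illustrations, not actual YARN code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch (not YARN code): when node utilization crosses the
// max over-allocation threshold, preempt OPPORTUNISTIC containers until
// utilization is back under the threshold.
public class OverAllocationSketch {
    static final double MAX_UTILIZATION = 0.95;

    // Each deque entry is the utilization share attributed to one
    // OPPORTUNISTIC container; the most recently launched one is on top.
    static int containersToKill(double nodeUtilization, Deque<Double> opportunistic) {
        int killed = 0;
        while (nodeUtilization > MAX_UTILIZATION && !opportunistic.isEmpty()) {
            nodeUtilization -= opportunistic.pop(); // preempt one container
            killed++;
        }
        return killed;
    }

    public static void main(String[] args) {
        Deque<Double> opp = new ArrayDeque<>();
        opp.push(0.05);
        opp.push(0.05);
        // Utilization spiked to 102%: two preemptions bring it to ~92%, under 95%.
        System.out.println(containersToKill(1.02, opp)); // prints 2
    }
}
```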

bq. nodes can be at 100% CPU, 100% Network, or 100% Disk for long periods of 
time (several minutes). Memory could get to something like 80% before 
corrective action would be required. 
I am beginning to see the need for different thresholds for different 
resources. While I wouldn't necessarily shoot for 100%, I can see someone 
configuring it to 95% CPU, 85% network (as this could spike significantly with 
shuffle etc.), 90% disk, and 80% memory. And, we would stop over-allocating the 
moment we hit *any one* of these thresholds. 

Should we keep it simple to begin with and have one config, and add other 
configs in the future? Or, do you think the config-per-resource should be there 
from the get-go? 
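The config-per-resource variant could be sketched like this. The threshold values mirror the example percentages above; the class, keys, and method are hypothetical, not real YARN configuration:

```java
import java.util.Map;

// Hypothetical sketch: stop over-allocating the moment utilization of ANY
// resource crosses its configured threshold. Threshold values come from the
// example in the comment above; they are not real YARN config keys.
public class PerResourceThresholds {
    static final Map<String, Double> THRESHOLDS = Map.of(
            "cpu", 0.95, "network", 0.85, "disk", 0.90, "memory", 0.80);

    static boolean canOverAllocate(Map<String, Double> utilization) {
        return THRESHOLDS.entrySet().stream().allMatch(
                e -> utilization.getOrDefault(e.getKey(), 0.0) < e.getValue());
    }

    public static void main(String[] args) {
        // Network at 86% exceeds its 85% threshold, so over-allocation stops
        // even though every other resource is below its limit.
        System.out.println(canOverAllocate(Map.of("cpu", 0.60, "network", 0.86))); // false
        System.out.println(canOverAllocate(Map.of("cpu", 0.60, "network", 0.50))); // true
    }
}
```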





[jira] [Comment Edited] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2016-01-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083223#comment-15083223
 ] 

Karthik Kambatla edited comment on YARN-1011 at 1/5/16 3:28 PM:


We would run an opportunistic container on a node only if the actual 
utilization is less than the allocation by a margin bigger than the allocation 
of said opportunistic container. We reactively preempt the opportunistic 
container if the actual utilization goes over a threshold. To address spikes in 
usage where our reactive measures are too slow to kick in, we run the 
opportunistic containers at a strictly lower priority. 
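The scheduling condition in the first sentence above can be sketched as follows (names and the single-resource simplification are hypothetical, not the RM's actual placement logic):

```java
// Hypothetical sketch of the condition above: place an OPPORTUNISTIC
// container on a node only if the gap between allocated and
// actually-utilized resources exceeds the container's own ask.
public class OpportunisticFit {
    static boolean fits(double allocated, double utilized, double containerAsk) {
        return (allocated - utilized) > containerAsk;
    }

    public static void main(String[] args) {
        // Node has 8 GB allocated but only 3 GB utilized: a 4 GB ask fits,
        // a 6 GB ask does not.
        System.out.println(fits(8, 3, 4)); // true
        System.out.println(fits(8, 3, 6)); // false
    }
}
```

A real implementation would evaluate this per resource type and leave the reactive preemption path to handle usage spikes, as the paragraph above describes.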

bq. the app got opportunistic containers and their perf wasn't the same as 
normal containers - so it ran slower. 
As soon as we realize the perf is slower because the node has higher usage than 
we had anticipated, we preempt the container and retry allocation (guaranteed 
or opportunistic depending on the new cluster state). So, it shouldn't run 
slower for longer than our monitoring interval. Is this assumption okay? 

bq. However, things get complicated because a node with an opportunistic 
container may continue to run its normal containers while space frees up for 
guaranteed capacity on other nodes.
The opportunistic container will continue to run on this node so long as it is 
getting the resources it needs. If there is any sort of resource contention, it 
is preempted and is up for allocation on one of the free nodes. 

bq. This would require that the system upgrade opportunistic containers in the 
same order as it would allocate containers.
bq. IMO, the NM cannot make a local choice about upgrading its opportunistic 
containers because this is effectively a resource allocation decision and only 
the RM has the info to do that.
The RM schedules the next highest priority "task" for which it couldn't find a 
guaranteed container as an opportunistic container. This task continues to run 
as long as there is no resource contention. If guaranteed resources free up on 
the node it is running on, isn't it fair to promote the container to 
GUARANTEED? After all, if the unused resources were not hidden behind other 
containers' allocations and had actually been available as guaranteed capacity 
on that node initially, the RM would just have scheduled a guaranteed container 
in the first place.
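That promotion check could be sketched as below. The class and method names are hypothetical and sidestep the actual RM/NM protocol; the point is only that promotion is the same fit test the RM would have applied at allocation time:

```java
// Hypothetical sketch: promote an OPPORTUNISTIC container to GUARANTEED
// once enough guaranteed capacity frees up on its node that the container
// would have been schedulable as GUARANTEED in the first place.
public class PromotionSketch {
    static boolean shouldPromote(double nodeCapacity, double guaranteedAllocated,
                                 double containerAsk) {
        // Promote only if the container now fits within unallocated
        // guaranteed capacity on the node.
        return (nodeCapacity - guaranteedAllocated) >= containerAsk;
    }

    public static void main(String[] args) {
        // 16 GB node, 12 GB guaranteed-allocated: a 4 GB container can be promoted.
        System.out.println(shouldPromote(16, 12, 4)); // true
        System.out.println(shouldPromote(16, 14, 4)); // false
    }
}
```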

I should probably clarify that the proposal here targets those cases where 
users' estimates are significantly off reality and there are enough free 
resources per node to run additional task(s) without causing any resource 
contention. Even though this is the norm, we want to guard against spikes in 
usage to avoid perf regressions. In practice, I expect admins to come up with a 
reasonable threshold for over-subscription: e.g. 0.8, meaning we only 
oversubscribe up to 80% of the capacity advertised through 
{{yarn.nodemanager.resource.*}}. Thinking more about this, the threshold should 
probably have an upper limit - 0.95? 
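The proposed clamp can be sketched as below, using the 0.8 example and the suggested 0.95 upper limit from the paragraph above (the class and helper are hypothetical, not a YARN API):

```java
// Hypothetical sketch: clamp the admin-configured over-subscription
// threshold to an upper limit, then derive the node's utilization ceiling
// from the capacity advertised via yarn.nodemanager.resource.*.
public class OversubscriptionCap {
    static final double UPPER_LIMIT = 0.95; // suggested cap from the comment

    static double utilizationCeiling(double advertisedCapacity, double configuredThreshold) {
        return advertisedCapacity * Math.min(configuredThreshold, UPPER_LIMIT);
    }

    public static void main(String[] args) {
        System.out.println(utilizationCeiling(100.0, 0.8));  // 0.8 is below the cap
        System.out.println(utilizationCeiling(100.0, 0.99)); // clamped to 0.95
    }
}
```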




[jira] [Comment Edited] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2015-12-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071335#comment-15071335
 ] 

Karthik Kambatla edited comment on YARN-1011 at 12/25/15 2:45 AM:
--

Just put my thoughts in a design doc here. Appreciate any feedback and 
suggestions on the same.


