[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086325#comment-15086325
 ] 

Bikas Saha commented on YARN-1011:
----------------------------------

bq. At this point what happens to the opportunistic container. It is clearly 
running at lower priority on the node and as such we are not giving the job its 
guaranteed capacity.
Yes, the difference is that the opportunistic container may not be convertible 
into a normal container because that node is still over-allocated. So at that 
point, what should be done? Should this container be terminated and re-run 
elsewhere as a normal container (since capacity is now available)? Should some 
other container on this node be preempted to make this container normal? Or 
should the RM allocate a normal container and give it to the app in addition to 
the running opportunistic container, in case the app can do the transfer 
internally?
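The three options above could be sketched roughly as follows. This is purely 
illustrative; the enum and policy names are hypothetical and not part of YARN's 
API:

```java
// Hypothetical sketch of the promotion choices discussed above.
// None of these names exist in YARN; they only label the three options.
enum PromotionAction {
    KILL_AND_REALLOCATE,   // terminate, then run as a normal container elsewhere
    PREEMPT_ON_NODE,       // preempt another container on this node to make room
    ALLOCATE_EXTRA         // give the app an additional normal container
}

class PromotionPolicy {
    /**
     * Pick an action for an opportunistic container whose app now has
     * guaranteed capacity available, but whose node is still over-allocated.
     */
    static PromotionAction choose(boolean appSupportsTransfer,
                                  boolean preemptableWorkOnNode) {
        if (appSupportsTransfer) {
            return PromotionAction.ALLOCATE_EXTRA;     // app moves work itself
        }
        if (preemptableWorkOnNode) {
            return PromotionAction.PREEMPT_ON_NODE;    // free room in place
        }
        return PromotionAction.KILL_AND_REALLOCATE;    // restart elsewhere
    }
}
```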

Also, with this feature in place, should we run all containers beyond 
guaranteed capacity as opportunistic containers? That would ensure that any 
excess containers we give to a job do not affect the performance of other jobs' 
guaranteed containers. It would also make scheduling and allocation more 
consistent, in that guaranteed containers always run at normal priority and 
extra containers always run at lower priority. The extra container could be 
extra over capacity (but without over-subscription) or extra via 
over-subscription. Because of this, I feel that running tasks at lower priority 
could be an independent (but related) work item.
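The rule proposed above could be expressed as a small decision function: a 
container that fits within the app's guaranteed share runs at normal priority, 
and anything beyond it runs as opportunistic, regardless of whether the excess 
comes from spare capacity or from over-subscription. A minimal sketch, with 
hypothetical names that are not YARN API:

```java
// Illustrative only: classify a new container by whether it fits under
// the app's guaranteed capacity. Values are in MB for concreteness.
enum ExecType { GUARANTEED, OPPORTUNISTIC }

class ExecTypeAssigner {
    static ExecType assign(long usedByAppMb, long requestedMb,
                           long guaranteedForAppMb) {
        // Within the guarantee: normal priority. Beyond it: lower priority,
        // whether the excess is spare capacity or over-subscription.
        return (usedByAppMb + requestedMb <= guaranteedForAppMb)
            ? ExecType.GUARANTEED
            : ExecType.OPPORTUNISTIC;
    }
}
```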

Staying on this topic, and adding configuration to it: it may make sense to 
add some way for an application to say "don't over-subscribe the nodes my 
containers run on." Putting cgroups or Docker in this context, would these 
mechanisms support over-allocating resources like CPU or memory?
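On the cgroups question: the CPU controller supports over-subscription by 
weighting rather than hard partitioning, e.g. giving opportunistic containers a 
very small cpu.shares value so that guaranteed containers win the CPU whenever 
the node is saturated, while opportunistic ones still use idle cycles. The 
sketch below is illustrative only; the paths and values are hypothetical and 
not what YARN's cgroups handling actually writes:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: weight a container's CPU via cgroup cpu.shares.
// Real paths would live under /sys/fs/cgroup/cpu; these are hypothetical.
class CpuShares {
    static void setShares(Path containerCgroupDir, boolean guaranteed)
            throws IOException {
        // 1024 is the cgroup default weight; 2 is the kernel minimum,
        // so opportunistic containers only get CPU the node has spare.
        long shares = guaranteed ? 1024 : 2;
        Files.createDirectories(containerCgroupDir);
        Files.write(containerCgroupDir.resolve("cpu.shares"),
                    Long.toString(shares).getBytes());
    }
}
```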

bq. When space frees up on nodes, NMs send candidate containers for promotion 
on the heartbeat.
That shouldn't be necessary, since the RM will learn about the free capacity 
and run its scheduling cycle for that node, at which point it will be able to 
take action such as allocating a new container or promoting an existing one. 
There isn't anything the NM can tell the RM (that the RM does not already know) 
except for the current utilization of the node.
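In other words, the NM reports only measured utilization, and the RM computes 
headroom and makes the promotion decision on that node's heartbeat. A minimal 
sketch of that RM-side check, with hypothetical names:

```java
// Illustrative only: the RM's view of a node given the utilization the
// NM reports on heartbeat. Nothing else from the NM is required.
class NodeHeadroom {
    /** Headroom = node capacity minus currently measured utilization (MB). */
    static long headroomMb(long capacityMb, long utilizedMb) {
        return Math.max(0, capacityMb - utilizedMb);
    }

    /** Can an opportunistic container of this size be promoted in place? */
    static boolean canPromote(long capacityMb, long utilizedMb,
                              long containerMb) {
        return headroomMb(capacityMb, utilizedMb) >= containerMb;
    }
}
```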

Some of what I am saying comes from prior experience with a different 
Hadoop-like system. You can read more about it here: 
http://research.microsoft.com/pubs/232978/osdi14-paper-boutin.pdf


> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: yarn-1011-design-v0.pdf, yarn-1011-design-v1.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
