Sriram Rao commented on YARN-2877:

[~leftnoteasy] By definition, the allocation decisions made by the central RM 
win out.  That is, whenever there is a conflict, *guaranteed-start* (or 
CONSERVATIVE) containers will be executed prior to *queueable* (or OPTIMISTIC) 
containers.  This could also mean that the NM may be forced to preempt running 
*queueable* containers to make room.  Lastly, to allow some level of 
predictability in terms of execution time for *queueable* containers, we could 
use leases: a *queueable* container is allowed to execute for at most N secs 
even when there is a conflict, and if the container hasn't exited, the NM will 
preempt it once that time interval elapses (i.e., the lease expires).  This 
mechanism can help minimize preemption of *queueable* containers.
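
The precedence and lease rules above can be sketched roughly as follows. This is only an illustration in Python, not YARN code: the {{Container}} class, the {{must_preempt}} function and the {{lease_secs}} parameter (the "N secs" above) are hypothetical names, and the sketch assumes the lease is measured from the container's start time, which the proposal does not pin down.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical stand-in for a container running on an NM."""
    container_id: str
    queueable: bool     # True: queueable (OPTIMISTIC); False: guaranteed-start (CONSERVATIVE)
    start_time: float   # seconds, on the same clock as `now`


def must_preempt(container, conflict, now, lease_secs):
    """Return True if the NM should preempt `container` at time `now`.

    `conflict` is True when a guaranteed-start container needs the
    resources that `container` currently holds.
    """
    if not container.queueable:
        # Decisions of the central RM win out: guaranteed-start
        # containers are never preempted by this rule.
        return False
    if not conflict:
        # Nothing is waiting for the resources, so the container keeps running.
        return False
    # Assumption: the lease runs from the container's start time.
    return now - container.start_time >= lease_secs
```

With a 10-second lease, a *queueable* container that started at time 0 keeps running through a conflict at time 5 and is preempted once the clock reaches 10; a *guaranteed-start* container is never selected.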

Re: your other questions:
# Capacity is enforced for *guaranteed-start* containers.  For *queueable* 
containers, policies could be pushed down from the central RM.
# It is not necessary that the *queueable* containers factor into central RM's 
allocation choices.  That said, having that information at the central-RM can 
help minimize preemption.
# For enabling load balancing of queues at the NMs 
([YARN-2888|https://issues.apache.org/jira/browse/YARN-2888]) and for allowing AMs 
to choose where to submit *queueable* containers 
([YARN-2887|https://issues.apache.org/jira/browse/YARN-2887]), exposing queue 
information to the local RMs is desirable.

> Extend YARN to support distributed scheduling
> ---------------------------------------------
>                 Key: YARN-2877
>                 URL: https://issues.apache.org/jira/browse/YARN-2877
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on 
> otherwise idle resources on individual machines.
> 2. Reduce allocation latency.  This matters for tasks where the scheduling 
> time dominates (i.e., task execution time is much shorter than the time 
> required to obtain a container from the RM).
