[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221990#comment-14221990
 ] 

Konstantinos Karanasos commented on YARN-2877:
----------------------------------------------

[~Sujeet Varakhedi], also the Apollo paper (OSDI 2014) has interesting ideas 
about distributed scheduling.

[~wangda], glad you like the idea and thanks for the interesting points. To 
answer your questions:
1. Apart from the limit that the LocalRM can impose in the number of queueable 
containers that each AM can receive (for which the central RM does not need to 
be involved), in the heartbeat response from the RM to the NM, information 
about the status of the other queues of the system will be passed as well. This 
way we will be able to impose global policies (such as capacity) in a 
distributed fashion. BTW this information is also used by the LocalRMs to 
decide in which NMs to queue requests.
2. If no policies need to be imposed, the central RM does not need to know 
anything about the queueable containers that each AM uses. Limits in the number 
of queueable containers per AM can be imposed directly by the LocalRM. However, 
in case fine-grained policies need to be imposed (as mentioned in point (1) 
above, such as number of queueable containers per queue in the capacity 
scheduler), the central RM can receive information about the number of 
queueable containers used by each AM, so that it imposes limits per queue. 
Clearly, the more information you pass to the central RM, the more powerful 
policies you can impose, but also the bigger the load you push to the central 
RM. So, there is a sweet-spot there based on the needs of each cluster.
3. This is a good point as well. Such information can be piggybacked in the 
heartbeats to the central RM (again, with the tradeoffs discussed above).


> Extend YARN to support distributed scheduling
> ---------------------------------------------
>
>                 Key: YARN-2877
>                 URL: https://issues.apache.org/jira/browse/YARN-2877
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks otherwise 
> idle resources on individual machines.
> 2. Reduce allocation latency.  Tasks where the scheduling time dominates 
> (i.e., task execution time is much less compared to the time required for 
> obtaining a container from the RM).
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to