[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219871#comment-14219871
 ] 

Carlo Curino commented on YARN-2877:
------------------------------------

Karthik, you are correct... 

Karthik, glad you like the idea, and you ask good questions...  

This could help lower the load on the central RM (and hence help with 
scale), in particular if we have a vast number of short-lived tasks (heavy 
scheduling cost for little work). 
(However, we have other ongoing work towards that, which we will post soon, 
hence the focus on utilization here.)

What takes care of the "fast adaption" to node conditions is having a local 
queue (from which to pick more work if I am idle), and the notion of different 
container types (i.e., I can kick out the optimistic containers if I am 
overbooked).
With this in mind, the RM could be the one making scheduling decisions for 
queueable/optimistic containers as well, as you pointed out.
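
To make the node-side behaviour concrete, here is a rough sketch of a local queue with two container types. Everything in it is hypothetical (the class and method names, the slot-based capacity, and the choice to re-queue an evicted optimistic container rather than kill it); it is not YARN code, only an illustration of the idea.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: none of these names are existing YARN APIs.
public class NodeQueueSketch {
    enum ContainerType { GUARANTEED, OPTIMISTIC }

    static final class Container {
        final String id;
        final ContainerType type;
        final int slots; // assumed abstract resource unit
        Container(String id, ContainerType type, int slots) {
            this.id = id; this.type = type; this.slots = slots;
        }
    }

    private final int capacity;
    private int used = 0;
    private final List<Container> running = new ArrayList<>();
    private final Deque<Container> localQueue = new ArrayDeque<>();

    NodeQueueSketch(int capacity) { this.capacity = capacity; }

    // Guaranteed containers start right away, kicking out optimistic ones if
    // the node is overbooked. Optimistic containers start only if they fit;
    // otherwise they wait in the local queue.
    void submit(Container c) {
        if (c.type == ContainerType.GUARANTEED) {
            evictOptimisticUntilFits(c.slots);
            start(c); // assumes guaranteed work is never over-allocated centrally
        } else if (used + c.slots <= capacity) {
            start(c);
        } else {
            localQueue.addLast(c);
        }
    }

    // When a container finishes, pick more work from the local queue.
    void complete(String id) {
        Iterator<Container> it = running.iterator();
        while (it.hasNext()) {
            Container c = it.next();
            if (c.id.equals(id)) { it.remove(); used -= c.slots; break; }
        }
        while (!localQueue.isEmpty()
                && used + localQueue.peekFirst().slots <= capacity) {
            start(localQueue.pollFirst());
        }
    }

    private void evictOptimisticUntilFits(int needed) {
        Iterator<Container> it = running.iterator();
        while (used + needed > capacity && it.hasNext()) {
            Container c = it.next();
            if (c.type == ContainerType.OPTIMISTIC) {
                it.remove();
                used -= c.slots;
                localQueue.addFirst(c); // assumption: re-queue instead of kill
            }
        }
    }

    private void start(Container c) { running.add(c); used += c.slots; }

    List<String> runningIds() {
        List<String> ids = new ArrayList<>();
        for (Container c : running) ids.add(c.id);
        return ids;
    }

    int queuedCount() { return localQueue.size(); }

    public static void main(String[] args) {
        NodeQueueSketch node = new NodeQueueSketch(4);
        node.submit(new Container("o1", ContainerType.OPTIMISTIC, 2));
        node.submit(new Container("o2", ContainerType.OPTIMISTIC, 2));
        node.submit(new Container("o3", ContainerType.OPTIMISTIC, 1)); // queued
        node.submit(new Container("g1", ContainerType.GUARANTEED, 2)); // evicts o1
        System.out.println(node.runningIds() + " queued=" + node.queuedCount());
        node.complete("o2"); // frees room, o1 restarts from the local queue
        System.out.println(node.runningIds() + " queued=" + node.queuedCount());
    }
}
```

Note that nothing here depends on who placed the optimistic containers on the node: the same queue works whether the decision came from the RM or from a distributed scheduler.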

What is constant (whether you make the scheduling decisions centrally or in a 
distributed fashion) is the notion of different container types (see YARN-2882). 
This should be exposed to the AM, as it comes with very different levels of 
guarantees on container start/completion. 
Thus the AM needs to know which type of container to use for different tasks 
(e.g., short-lived or non-critical-path containers can be optimistic).  
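
As a purely illustrative sketch of what that choice might look like on the AM side: the class name, the method, and the 10-second threshold below are all assumptions made up for this example, not part of any existing or proposed API.

```java
import java.time.Duration;

// Hypothetical AM-side helper: names and threshold are assumptions.
public class ContainerTypeChooser {
    enum ContainerType { GUARANTEED, OPTIMISTIC }

    // Assumed cutoff below which a task counts as "short-lived".
    static final Duration SHORT_TASK = Duration.ofSeconds(10);

    // Short-lived or non-critical-path tasks can tolerate the weaker
    // start/completion guarantees of an optimistic container.
    static ContainerType choose(Duration expectedRuntime, boolean onCriticalPath) {
        boolean shortLived = expectedRuntime.compareTo(SHORT_TASK) <= 0;
        return (shortLived || !onCriticalPath)
                ? ContainerType.OPTIMISTIC
                : ContainerType.GUARANTEED;
    }

    public static void main(String[] args) {
        System.out.println(choose(Duration.ofSeconds(2), true));
        System.out.println(choose(Duration.ofMinutes(5), true));
    }
}
```

In practice the AM would probably need better signals than a fixed threshold, but the point is that the decision sits with the AM and therefore the container type has to be visible to it.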


> Extend YARN to support distributed scheduling
> ---------------------------------------------
>
>                 Key: YARN-2877
>                 URL: https://issues.apache.org/jira/browse/YARN-2877
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on 
> otherwise idle resources on individual machines.
> 2. Reduce allocation latency, in particular for tasks where the scheduling 
> time dominates (i.e., task execution time is much less than the time 
> required for obtaining a container from the RM).
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
