Carlo Curino commented on YARN-2877:

I am going to echo [~kkaranasos] regarding "malicious" AMs. 

The key architectural change we propose is to introduce a proxy layer 
(YARN-2884). This gives us a "place" that is both distributed and part of 
the infrastructure (thus inherently trusted) in which to enact policies. 
This is where we host the LocalRM functionality of YARN-2885. With this in 
place we do not have to depend on trusting the AM regarding distributed 
decisions (the AM only exposes its need for containers of different types). 
Instead, we can enable a broad spectrum of infrastructure-level 
policies that can leverage explicit or implicit information to impose caps, or 
to balance (or skew) where the queuable containers should be allocated, etc.
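
To make the idea concrete, here is a minimal sketch of a policy enacted in a 
trusted proxy rather than in the AM. It is only an illustration, not the 
YARN-2884/YARN-2885 design: the names ContainerAsk, QueuablePolicy, PerAppCap 
and LocalRMProxy are hypothetical, and the per-application cap is just one 
assumed example of an infrastructure-level policy.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class ContainerAsk:
    """What the AM exposes: its need for queuable containers.

    Hypothetical type; other container types are left out of this sketch.
    """
    app_id: str
    queuable: int


class QueuablePolicy(Protocol):
    """Hypothetical policy hook, evaluated inside the trusted proxy."""

    def admit(self, ask: ContainerAsk, already_granted: int) -> int:
        """Return how many queuable containers may be granted for this ask."""
        ...


class PerAppCap:
    """Assumed example policy: cap queuable containers per application."""

    def __init__(self, cap: int) -> None:
        self.cap = cap

    def admit(self, ask: ContainerAsk, already_granted: int) -> int:
        remaining = max(0, self.cap - already_granted)
        return min(ask.queuable, remaining)


class LocalRMProxy:
    """Sits between the AM and the RM and applies the configured policy."""

    def __init__(self, policy: QueuablePolicy) -> None:
        self.policy = policy
        self.granted: Dict[str, int] = {}  # queuable containers granted per app

    def handle(self, ask: ContainerAsk) -> ContainerAsk:
        already = self.granted.get(ask.app_id, 0)
        allowed = self.policy.admit(ask, already)
        self.granted[ask.app_id] = already + allowed
        # The AM's ask is rewritten here, so the cap holds even if the AM
        # over-asks; the AM itself is never trusted to enforce it.
        return ContainerAsk(ask.app_id, allowed)


proxy = LocalRMProxy(PerAppCap(cap=10))
first = proxy.handle(ContainerAsk("app_1", 8))   # 8 granted, 2 left under the cap
second = proxy.handle(ContainerAsk("app_1", 8))  # trimmed to the remaining 2
third = proxy.handle(ContainerAsk("app_1", 3))   # cap exhausted, trimmed to 0
```

Swapping PerAppCap for another object with the same admit() method changes the 
policy without touching the proxy, which is the mechanism/policy split in 
miniature.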

As we have done in the past, we are working towards providing rather 
*general-purpose mechanisms*, and proposing a *first set of policies* (AM, 
LocalRM, NM start/stop of containers). Policies can be evolved/overridden 
easily depending on use cases, while mechanisms are a little harder to change. 
To this end, carefully discussing "other" use cases, such as the conversation 
around using queuable containers for Impala, is very important, 
as we might have missed "hooks" in the mechanisms that are necessary 
to support those scenarios.

> Extend YARN to support distributed scheduling
> ---------------------------------------------
>                 Key: YARN-2877
>                 URL: https://issues.apache.org/jira/browse/YARN-2877
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on 
> otherwise idle resources on individual machines.
> 2. Reduce allocation latency.  This matters for tasks where the scheduling 
> time dominates (i.e., task execution time is much shorter than the time 
> required to obtain a container from the RM).

