Wangda Tan commented on YARN-2877:

Thanks very much for the explanation, [~kkaranasos] and [~sriramsrao]. I reply 

Now I understand the use case better. Yes, requests for queueable containers 
do not necessarily need to be sent to the central RM, unless we want to add 
other features such as queue balancing.

One more question: how can an AM know which NM is more idle than the others? 
Simply querying the status of every NM is not efficient enough.

I'm also thinking that distributed scheduling could be integrated into an 
existing scheduler, such as the Capacity Scheduler. Some other features could 
be added with 
# For now, we trust that each AM makes correct opportunistic container launch 
requests. But consider a large cluster in which only a few applications use 
opportunistic launch and the others are conservative apps. An AM could "steal" 
a lot of resources from the cluster by sending opportunistic launch requests 
to all NMs. A centralized RM could combine the resource usage of queueable and 
conservative containers to enforce fairness, and put malicious AMs on a 
blacklist.
# As the title (distributed scheduling) suggests, we may be able to do more 
than opportunistic launch. For example, we could launch an opportunistic 
container on an NM, and it could become a conservative container after a 
heartbeat to the RM if the resource fits the capacity settings of its queue.
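
To make the two points above concrete, here is a rough sketch of the RM-side 
bookkeeping I have in mind. It is only an illustration: the class, the 
function, the 10% per-application cap and the memory-only accounting are all 
assumptions of mine, not existing YARN APIs or settings.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class OpportunisticUsageTracker:
    """Hypothetical RM-side tracker; not part of any YARN API."""

    cluster_memory_mb: int
    # Assumed cap: fraction of cluster memory one app may hold opportunistically.
    max_share: float = 0.10
    usage_mb: dict = field(default_factory=lambda: defaultdict(int))
    blacklist: set = field(default_factory=set)

    def report(self, app_id: str, node_id: str, memory_mb: int) -> None:
        """Record the opportunistic memory an NM reports for an app."""
        self.usage_mb[(app_id, node_id)] = memory_mb
        if self.total_for(app_id) > self.max_share * self.cluster_memory_mb:
            self.blacklist.add(app_id)

    def total_for(self, app_id: str) -> int:
        """Sum the app's opportunistic memory across all reporting NMs."""
        return sum(mb for (app, _), mb in self.usage_mb.items() if app == app_id)

    def may_launch(self, app_id: str) -> bool:
        """An app may keep launching opportunistically unless blacklisted."""
        return app_id not in self.blacklist


def can_promote(queue_used_mb: int, queue_capacity_mb: int, container_mb: int) -> bool:
    """True if an opportunistic container fits within its queue's capacity."""
    return queue_used_mb + container_mb <= queue_capacity_mb
```

In this sketch the RM would call report() for each NM heartbeat, NMs would 
consult may_launch() before accepting an opportunistic request, and 
can_promote() would decide whether a running opportunistic container can 
become a conservative one.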

Thanks in advance!

> Extend YARN to support distributed scheduling
> ---------------------------------------------
>                 Key: YARN-2877
>                 URL: https://issues.apache.org/jira/browse/YARN-2877
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on 
> otherwise idle resources on individual machines.
> 2. Reduce allocation latency for tasks where the scheduling time dominates 
> (i.e., task execution time is much shorter than the time required to obtain 
> a container from the RM).
