Arun Suresh commented on YARN-2885:

Thanks for the review [~leftnoteasy]..

I understand your concerns with regard to the DistributedSchedulerProtocol and 
the AMProtocol. The reason we decided to club the two is to reduce network 
traffic. Our initial thought was to club the top-K nodes and the policy 
information with the node heartbeat response, but given that we expect the 
number of running applications to be smaller than the number of nodes, the 
AMProtocol's register and allocate methods seemed the better choice.

bq. DistSchedRegisterResponse could be considered as regular heartbeat response 
between LocalRM and centralRM, but now it only gets invoked when an application 
master registers to NM. Same as DistSchedAllocateResponse.
An out-of-band heartbeat between components is a good idea, but like I 
mentioned, we wanted to reduce network traffic, and if you think about it, the 
information in the DistSchedRegister/AllocateResponse is required only when a 
scheduling decision is to be made, which corresponds exactly to the register 
and allocate calls. Another consideration is that a separate heartbeat would 
require another RPC server running on the RM (and thus we would need to have 
yet another address configured).
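To make the piggybacking idea concrete, here is a minimal sketch of wrapping the regular AM allocate response with the extra distributed-scheduling state (top-K nodes) that the LocalRM needs. The class and field names below are illustrative stand-ins, not the actual YARN-2885 API:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of piggybacking distributed-scheduling state on the AM allocate
 * response instead of running a separate LocalRM <-> RM heartbeat.
 * All names here are hypothetical stand-ins for the real protocol records.
 */
public class DistSchedSketch {

  /** Stand-in for the regular AM-RM allocate response. */
  static class AllocateResponse {
    final List<String> allocatedContainers;
    AllocateResponse(List<String> allocatedContainers) {
      this.allocatedContainers = allocatedContainers;
    }
  }

  /**
   * Wrapper returned to the LocalRM (the AMRMProxy interceptor): the normal
   * response plus the top-K candidate nodes for queueable placement. Because
   * it rides on the existing allocate call, no extra RPC server or address
   * is needed on the RM side.
   */
  static class DistSchedAllocateResponse {
    final AllocateResponse delegate;
    final List<String> nodesForScheduling; // top-K candidate nodes

    DistSchedAllocateResponse(AllocateResponse delegate,
                              List<String> nodesForScheduling) {
      this.delegate = delegate;
      this.nodesForScheduling = nodesForScheduling;
    }
  }

  public static void main(String[] args) {
    AllocateResponse base =
        new AllocateResponse(Arrays.asList("container_001"));
    DistSchedAllocateResponse resp = new DistSchedAllocateResponse(
        base, Arrays.asList("node1:8041", "node2:8041"));
    // The interceptor keeps the scheduling fields for itself and hands the
    // plain AllocateResponse back to the application master.
    System.out.println(resp.delegate.allocatedContainers);
    System.out.println(resp.nodesForScheduling);
  }
}
```

The point of the wrapper shape is that the AM-facing contract is untouched: the interceptor unwraps `delegate` for the AM and consumes `nodesForScheduling` locally.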

bq. ..DistributedSchedulingProtocol is in yarn.api, it is still visible from 
user's perspective.
Agreed.. will move it to hadoop-yarn-server-common/proto

Thoughts ?

> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --------------------------------------------------------------------------------------------------
>                 Key: YARN-2885
>                 URL: https://issues.apache.org/jira/browse/YARN-2885
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Konstantinos Karanasos
>            Assignee: Arun Suresh
>         Attachments: YARN-2885-yarn-2877.001.patch, 
> YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, 
> YARN-2885-yarn-2877.full.patch, YARN-2885_api_changes.patch
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.

This message was sent by Atlassian JIRA
