[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221990#comment-14221990 ]
Konstantinos Karanasos commented on YARN-2877:
----------------------------------------------

[~Sujeet Varakhedi], also the Apollo paper (OSDI 2014) has interesting ideas about distributed scheduling.

[~wangda], glad you like the idea and thanks for the interesting points. To answer your questions:

1. Apart from the limit that the LocalRM can impose on the number of queueable containers that each AM can receive (for which the central RM does not need to be involved), the heartbeat response from the RM to the NM will also carry information about the status of the other queues of the system. This way we will be able to impose global policies (such as capacity) in a distributed fashion. BTW, this information is also used by the LocalRMs to decide in which NMs to queue requests.

2. If no policies need to be imposed, the central RM does not need to know anything about the queueable containers that each AM uses. Limits on the number of queueable containers per AM can be imposed directly by the LocalRM. However, in case fine-grained policies need to be imposed (as mentioned in point (1) above, such as the number of queueable containers per queue in the capacity scheduler), the central RM can receive information about the number of queueable containers used by each AM, so that it can impose limits per queue. Clearly, the more information you pass to the central RM, the more powerful the policies you can impose, but also the bigger the load you push to the central RM. So there is a sweet spot there, based on the needs of each cluster.

3. This is a good point as well. Such information can be piggybacked on the heartbeats to the central RM (again, with the tradeoffs discussed above).

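
To make points (1) and (2) a bit more concrete, here is a rough Python sketch of how a LocalRM might combine a purely local per-AM limit with per-queue limits learned from the RM heartbeat response. This is only an illustration under assumptions: the class and all names in it (`LocalRM`, `per_am_limit`, `on_heartbeat_response`, `try_allocate`, `pick_node`) are hypothetical and are not part of YARN's API, and picking the NM with the shortest queue is just one possible placement rule.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalRM:
    """Hypothetical LocalRM; names and fields are illustrative only."""

    # Local policy: max queueable containers per AM (no central RM involved).
    per_am_limit: int
    # Global state, refreshed from the RM -> NM heartbeat response (assumed shape).
    queue_limits: Dict[str, int] = field(default_factory=dict)
    queue_usage: Dict[str, int] = field(default_factory=dict)
    # Queueable containers handed out to each AM by this LocalRM.
    am_usage: Dict[str, int] = field(default_factory=dict)

    def on_heartbeat_response(self, queue_limits: Dict[str, int],
                              queue_usage: Dict[str, int]) -> None:
        """Store the queue status piggybacked on the heartbeat response."""
        self.queue_limits = dict(queue_limits)
        self.queue_usage = dict(queue_usage)

    def try_allocate(self, am_id: str, queue: str) -> bool:
        """Grant one queueable container if both limits allow it."""
        # Per-AM limit, enforced locally.
        if self.am_usage.get(am_id, 0) >= self.per_am_limit:
            return False
        # Per-queue limit, enforced only if the RM told us about one.
        limit = self.queue_limits.get(queue)
        if limit is not None and self.queue_usage.get(queue, 0) >= limit:
            return False
        self.am_usage[am_id] = self.am_usage.get(am_id, 0) + 1
        self.queue_usage[queue] = self.queue_usage.get(queue, 0) + 1
        return True

    @staticmethod
    def pick_node(nm_queue_lengths: Dict[str, int]) -> str:
        """Assumed placement rule: queue the request at the least-loaded NM."""
        return min(nm_queue_lengths, key=nm_queue_lengths.get)
```

With no heartbeat information at all, only the per-AM check applies, which corresponds to the case in point (2) where the central RM does not need to know anything. The per-queue check only becomes active once queue limits arrive, at the cost of the extra information exchanged with the central RM.
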
> Extend YARN to support distributed scheduling
> ---------------------------------------------
>
>                 Key: YARN-2877
>                 URL: https://issues.apache.org/jira/browse/YARN-2877
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed
> scheduling. Briefly, some of the motivations for distributed scheduling are
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on
> otherwise idle resources on individual machines.
> 2. Reduce allocation latency for tasks where the scheduling time dominates
> (i.e., task execution time is much less than the time required for
> obtaining a container from the RM).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)