[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221012#comment-14221012 ]
Konstantinos Karanasos commented on YARN-2877: ---------------------------------------------- Adding some more details, now that we have added the first sub-tasks. In YARN-2882 we introduce two *types of containers*: guaranteed-start and queueable. The former are the ones existing in YARN today (are allocated from the central RM, and once allocated, are guaranteed to start). The latter make it possible to queue container requests in the NMs and will be used for distributed scheduling. The *queuing of (queueable) container requests* in the NMs is proposed in YARN-2883. Each NM will now also have a *LocalRM* (Local ResourceManager) that will receive all container requests from the AMs running on the same machine: - For the guaranteed-start container requests, the LocalRM acts as a proxy (YARN-2884), forwarding them to the central RM. - For the queueable container requests, the LocalRM is responsible for sending them directly to the NM queues (bypassing the central RM). Deciding the NMs where these requests are queued is based on the estimated waiting time in the NM queues, as discussed in YARN-2886. Based on some policy (YARN-2887), each AM will determine *what type of containers to ask*: only guaranteed-start, only queueable, or a mix thereof. For instance, an AM may request guaranteed-start containers for its tasks that are expected to be long-running, whereas it may ask for queueable containers for its short tasks (in which the back-and-forth with the central RM may be longer than the task execution time). This way we reduce the scheduling latency, while increasing the utilization of the cluster (if we had to go to the central RM for all these short tasks, some resources of the cluster might remain idle in the meanwhile). To ensure the NM queues remain balanced, we propose *corrective mechanisms for NM queue rebalancing* in YARN-2888. Moreover, to ensure no AM is abusing the system by asking too many queueable containers, we can impose a limit in the *number of queueable containers* that each AM can receive (YARN-2889). > Extend YARN to support distributed scheduling > --------------------------------------------- > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager > Reporter: Sriram Rao > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks otherwise > idle resources on individual machines. > 2. Reduce allocation latency. Tasks where the scheduling time dominates > (i.e., task execution time is much less compared to the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)