[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-1011:
-----------------------------------------
    Attachment: patch-for-yarn-1011.patch

Hi guys, as I mentioned to [~kasha] in an offline discussion (which also included 
[~asuresh]), I recently made some very similar changes to the RM and the 
CapacityScheduler that could be of use here.

We implemented a system, which we call Yaq (it will be presented at EuroSys in a 
couple of months, so I will soon be able to share the paper as well), that 
allows queuing of tasks at the NMs (using some bits from YARN-2883). If you 
consider queuing as a way of over-committing resources, this setting has many 
commonalities with the current JIRA.

The basic idea is that each NM advertises a number of queue slots (i.e., 
containers that are allowed to be queued at the NM), so that some tasks are 
ready to start immediately once resources become available. This way we can 
mask the allocation latency of waiting for the RM to assign and send new tasks 
to the NM.
Then the central scheduler (the CapacityScheduler in our case) performs 
placement in the following way:
# When there are still available resources (I hard-coded it to <95% 
utilization; Karthik mentioned 80% above), scheduling is heartbeat-driven, as 
it is today.
# Above 95% utilization, an additional asynchronous thread orders the nodes 
(in the current implementation, by the expected wait time of each node) and 
then starts filling up the queue slots. The reason for having this thread is 
that (1) we don't want a container to be given a queue slot if there is a run 
slot available, and (2) we want a global view of all nodes so that we favor 
nodes with less-loaded queues.
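To make the second mode concrete, here is a minimal sketch of what the asynchronous queue-filling pass could look like. The names ({{QueueableNode}}, {{expectedWaitMs}}, {{placeQueuedTasks}}) and the wait-time estimate are illustrative assumptions, not the actual Yaq/CapacityScheduler identifiers:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class QueueableNode {
    final String host;
    final long expectedWaitMs;   // estimated wait before a queued task would run
    int freeQueueSlots;          // advertised queue slots still unused

    QueueableNode(String host, long expectedWaitMs, int freeQueueSlots) {
        this.host = host;
        this.expectedWaitMs = expectedWaitMs;
        this.freeQueueSlots = freeQueueSlots;
    }
}

class QueuePlacementThread {
    // Hard-coded threshold, as in the comment above.
    static final double UTILIZATION_THRESHOLD = 0.95;

    /** Below the threshold, defer to normal heartbeat-driven scheduling. */
    static boolean useHeartbeatScheduling(double clusterUtilization) {
        return clusterUtilization < UTILIZATION_THRESHOLD;
    }

    /**
     * Above the threshold: take a global view of all nodes, order them by
     * expected wait time, and fill queue slots on less-loaded nodes first.
     */
    static List<String> placeQueuedTasks(List<QueueableNode> nodes, int tasksToQueue) {
        List<QueueableNode> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingLong(n -> n.expectedWaitMs));
        List<String> placements = new ArrayList<>();
        for (QueueableNode n : sorted) {
            while (tasksToQueue > 0 && n.freeQueueSlots > 0) {
                placements.add(n.host);
                n.freeQueueSlots--;
                tasksToQueue--;
            }
            if (tasksToQueue == 0) break;
        }
        return placements;
    }
}
```

The point of running this as a separate thread rather than in the heartbeat path is exactly points (1) and (2) above: queue slots are only consulted once run slots are exhausted, and the ordering is computed across all nodes at once.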

As you can see, the scheduling logic is very similar to what [~kasha] described 
above, so I think we can reuse those bits.

Also similar to the over-commitment described in this JIRA is that we extended 
the {{SchedulerNode}} so that it can account for two types of resources, 
namely run slots (which were already there) and queue slots.
Although in the current implementation we do not over-commit resources (queued 
tasks start only after allocated resources become available), this is just a 
matter of how the NM decides to treat the additional tasks it receives, and it 
can easily be changed to actually over-commit resources. As far as the 
RM/scheduler is concerned, the logic is the same: you have the allocated 
resources (run slots), plus some additional resources that are advertised by 
the nodes and can be used either for queuing containers (in Yaq's case) or for 
starting additional tasks based on actual resource utilization (in this JIRA's 
case).
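A rough sketch of this two-type accounting, assuming a hypothetical {{DualSlotNode}} in place of the real {{SchedulerNode}} changes (the names and the slot-based simplification are mine, not the patch's):

```java
class DualSlotNode {
    private final int totalRunSlots;
    private final int totalQueueSlots;  // additional capacity advertised by the NM
    private int usedRunSlots;
    private int usedQueueSlots;

    DualSlotNode(int runSlots, int queueSlots) {
        this.totalRunSlots = runSlots;
        this.totalQueueSlots = queueSlots;
    }

    /** Prefer a run slot; fall back to a queue slot only when none is free. */
    boolean allocate() {
        if (usedRunSlots < totalRunSlots) { usedRunSlots++; return true; }
        if (usedQueueSlots < totalQueueSlots) { usedQueueSlots++; return true; }
        return false;  // node is fully loaded, including its queue
    }

    /**
     * A finished container frees a run slot; a queued task, if any,
     * immediately takes its place (no over-commit in this variant).
     */
    void release() {
        if (usedQueueSlots > 0) { usedQueueSlots--; }   // queued task starts running
        else if (usedRunSlots > 0) { usedRunSlots--; }
    }

    int runningTasks() { return usedRunSlots; }
    int queuedTasks()  { return usedQueueSlots; }
}
```

Switching to actual over-commitment would only change what the NM does with the second slot type (start the task instead of queuing it); the RM-side bookkeeping above stays identical, which is the point made in the paragraph above.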

I am attaching an initial patch that contains the classes that are related to 
the RM. It is against branch-2.7.1. Since I left out many classes that were 
specific to YARN, the patch will not compile, but you should be able to see all 
the changes I made to the {{SchedulerNode}}, the {{CapacityScheduler}}, the 
{{*Queue}} classes, etc. I would be happy to share more code if needed.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1011
>                 URL: https://issues.apache.org/jira/browse/YARN-1011
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>         Attachments: patch-for-yarn-1011.patch, yarn-1011-design-v0.pdf, 
> yarn-1011-design-v1.pdf, yarn-1011-design-v2.pdf
>
>
> Currently RM allocates containers and assumes resources allocated are 
> utilized.
> RM can, and should, get to a point where it measures utilization of allocated 
> containers and, if appropriate, allocate more (speculative?) containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
