[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

Bikas Saha (JIRA) Fri, 12 Apr 2013 16:52:18 -0700

[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630769#comment-13630769 ]


Bikas Saha commented on YARN-45:
--------------------------------

I like the idea of the RM giving the AM information about actions that it 
might take which will affect the AM. However, I am wary of having the action 
taken in different places. E.g., the KILL for a container should come 
exclusively from either the RM or the AM, not from both. Otherwise we open 
ourselves up to race conditions, unnecessary kills and complex logic in the RM.

Preemption is something that, IMO, the RM needs to do at the very last moment, 
when there is no other way for resources to be freed up. If we decide to 
preempt at time T1 and then actually preempt at time T2, the cluster 
conditions may have changed between T1 and T2 in ways that invalidate the 
decisions taken at T1. For example, new resources may have freed up, reducing 
the number of containers that need to be killed. This sub-optimality is 
directly proportional to the length of time between T1 and T2, so ideally we 
want T1=T2. One can argue that things can also change after the preemption, 
making the preemption unnecessary in hindsight, and so the above argument for 
T1=T2 is fallacious. However, preemption policies are usually based on 
deadlines, such as "the allocation of queue1 must be met within X seconds". So 
the RM does not have the luxury of waiting for X+1 seconds. The best it can do 
is wait up to X seconds in the hope that things will work out, and at X 
redistribute resources to meet the deficit.

At the same time, I can see the argument that the AM knows best how to free up 
its resources. It is worth remembering that the AM has already told the RM the 
relative importance of all its containers when it made the requests at 
different priorities. So the RM knows the order of importance of the 
containers, and it also knows how long each container has been allocated. 
Assuming container runtime as a proxy for work done, the RM can use this data 
to preempt in a work-preserving manner without having to talk to the AM.
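The RM-only selection described above can be sketched as a sort followed by a 
greedy pick. This is an illustration, not YARN code: the Container record and 
select_victims are hypothetical, and the sketch assumes that a numerically 
larger priority value means a less important container and that memory is the 
only resource. Among containers of equal priority, the youngest (shortest 
runtime) goes first, since it has the least work to lose.

```python
from dataclasses import dataclass


@dataclass
class Container:
    container_id: str
    priority: int      # assumption: numerically larger = less important
    start_time: float  # when the container was allocated (seconds)
    memory_mb: int


def select_victims(containers, now, needed_mb):
    """Pick containers to preempt without consulting the AM.

    Least important first (by request priority); among equals, the
    youngest first, using runtime as a proxy for work that would be lost.
    """
    order = sorted(containers, key=lambda c: (-c.priority, now - c.start_time))
    victims, reclaimed = [], 0
    for c in order:
        if reclaimed >= needed_mb:
            break  # enough has been reclaimed; leave the rest running
        victims.append(c)
        reclaimed += c.memory_mb
    return victims
```

For example, with two priority-20 containers started at t=50 and t=10 and 
1500 MB needed at t=100, the sketch picks the t=50 container first and then 
the t=10 one, never touching the more important containers.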

Notifying the AM is useful because it allows the AM to take actions that 
preserve work, such as checkpointing. However, IMO, the AM should only 
checkpoint and should not kill the containers. Killing should still happen at 
the RM, as the very last option at the last moment. If the situation changes 
during the grace period and the containers no longer need to be killed, then 
there is no point in the AM killing them right away. This also lets us use a 
longer grace period, which matters because checkpointing and preserving work 
usually means persisting data to stable storage, which may be slow in practice.

To summarize, I would propose an API in which the RM tells the AM exactly 
which containers it might imminently preempt, with the contract being that the 
AM can take actions to preserve the work done in those containers. The AM can 
continue to run those containers until the RM actually preempts them, if that 
turns out to be needed. If we really think that the choice of containers needs 
to be made at the AM, then the AM needs to checkpoint those containers and 
inform the RM which containers it has chosen. But the final decision to kill, 
and the kill itself, must come from the RM.
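A rough sketch of the proposed warn-then-kill contract, assuming a single 
fixed grace period. PreemptionCoordinator, warn, enforce and am_on_warning are 
hypothetical names for illustration only, not an actual YARN protocol. The RM 
warns the AM about specific containers; the AM may checkpoint them but never 
kills them; when the grace period ends, the RM kills only the warned 
containers that are still needed at that moment.

```python
class PreemptionCoordinator:
    """Hypothetical RM-side bookkeeping for warn-then-kill preemption."""

    def __init__(self, grace_secs):
        self.grace_secs = grace_secs
        self.deadlines = {}  # container id -> time the RM may kill it
        self.killed = []     # containers the RM actually killed

    def warn(self, container_ids, now):
        """RM -> AM: these containers MAY be preempted at the returned times."""
        for cid in container_ids:
            # Re-warning does not extend an existing grace period.
            self.deadlines.setdefault(cid, now + self.grace_secs)
        return {cid: self.deadlines[cid] for cid in container_ids}

    def enforce(self, now, still_needed):
        """Only the RM kills, and only containers still needed at that moment."""
        for cid, kill_at in list(self.deadlines.items()):
            if now < kill_at:
                continue  # grace period not over; the container keeps running
            del self.deadlines[cid]
            if cid in still_needed:
                self.killed.append(cid)


def am_on_warning(warning, checkpoint):
    """AM side: preserve work, but never kill the warned containers."""
    for cid in warning:
        checkpoint(cid)
```

If two containers are warned at t=0 with a 30 second grace period and, by 
t=30, only one of them is still needed, the sketch kills just that one; the 
other keeps running with its checkpoint already taken.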
                
> Scheduler feedback to AM to release containers
> ----------------------------------------------
>
>                 Key: YARN-45
>                 URL: https://issues.apache.org/jira/browse/YARN-45
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Chris Douglas
>            Assignee: Carlo Curino
>         Attachments: YARN-45.patch, YARN-45.patch
>
>
> The ResourceManager strikes a balance between cluster utilization and strict 
> enforcement of resource invariants in the cluster. Individual allocations of 
> containers must be reclaimed (or reserved) to restore the global invariants 
> when cluster load shifts. In some cases, the ApplicationMaster can respond to 
> fluctuations in resource availability without losing the work already 
> completed by that task (MAPREDUCE-4584). Supplying it with this information 
> would be helpful for overall cluster utilization [1]. To this end, we want to 
> establish a protocol for the RM to ask the AM to release containers.
> [1] http://research.yahoo.com/files/yl-2012-003.pdf
