[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-1197:
----------------------------
    Attachment: YARN-1197_Design.pdf

We have come up with a new proposal to address the above problem, which we 
believe makes more sense. 

In essence, when the RM receives a valid resource decrease message for a 
container, it should honor it directly. If an increase message later arrives 
for the same container, and its target resource allocation differs from the 
current resource allocation known to the RM, the increase should be cancelled. 
To cancel it, the RM can simply use the NM-RM node heartbeat response to send 
the container's current resource allocation, as known to the RM, back to the 
NM. The NM can then use this value to monitor and enforce the container's 
resource usage. In fact, we propose that after the RM processes the resource 
change messages carried by each node update heartbeat from the NM, it should 
set the final resource allocation of every container that has just been 
resized in the heartbeat response, so that the NM and the RM have the same 
view of the resource allocation of those containers.
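
The rule above can be sketched as a small toy model. This is only an 
illustration in Python, not YARN code: the names ResourceChange, 
process_node_update and rm_alloc_gb are hypothetical, and treating a decrease 
as "valid" when its target is below the RM's current value is our assumption.

```python
from dataclasses import dataclass


@dataclass
class ResourceChange:
    """One resize message reported by the NM (hypothetical shape)."""
    container_id: str
    kind: str        # "increase" or "decrease"
    target_gb: int


def process_node_update(rm_alloc_gb, changes):
    """Apply NM-reported changes to the RM's view and build the response.

    rm_alloc_gb maps container id -> allocation (GB) known to the RM and is
    updated in place. The returned dict plays the role of the heartbeat
    response: the final allocation of every container that was just resized.
    """
    response = {}
    for change in changes:
        current = rm_alloc_gb[change.container_id]
        # Assumption: a decrease is valid when it is below the RM's value.
        if change.kind == "decrease" and change.target_gb < current:
            rm_alloc_gb[change.container_id] = change.target_gb
        # An increase never changes the RM's view at this point: the RM
        # already raised the allocation when it granted the request. If the
        # target no longer matches, the RM's value simply wins (cancel).
        response[change.container_id] = rm_alloc_gb[change.container_id]
    return response


# The RM had granted an increase to 8G; the NM then reports a decrease to 4G
# and, in a later heartbeat, the stale increase to 8G.
rm_alloc_gb = {"container_1": 8}
first = process_node_update(
    rm_alloc_gb, [ResourceChange("container_1", "decrease", 4)])
second = process_node_update(
    rm_alloc_gb, [ResourceChange("container_1", "increase", 8)])
```

In this sketch both responses carry 4G for the container, so the NM ends up 
enforcing the same value that the RM holds.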

Back to the original example:

1. A container is currently using 6G.
2. The AM asks the RM to increase it to 8G.
3. The RM grants the increase request, raises the container's allocation to 
8G, and issues a token to the AM. It starts a timer and remembers the original 
resource allocation before the increase (6G).
4. The AM, instead of initiating the resource increase on the NM, asks the NM 
to decrease the container to 4G.
5. The decrease succeeds; the RM gets the notification and updates the 
container resource to 4G.
6. Before the token expires, the AM requests the resource increase on the NM.
7. The RM receives the resource increase message (8G) from the node update. 
However, because the current resource allocation of this container is 4G, 
which differs from 8G, the RM will NOT consider this increase valid. It 
unregisters the increase request from the timer and sets the current resource 
allocation (4G) in the node heartbeat response, which the NM will pull in the 
next heartbeat cycle.
8. Once the NM receives the 4G resource allocation, it will monitor and 
enforce using the 4G value.
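
The eight steps can also be traced with a toy RM that tracks the pending 
increase. Again, this is a hedged Python sketch rather than the proposed 
implementation: ToyResourceManager and its method names are made up for 
illustration, and a plain dict stands in for the expiration timer.

```python
class ToyResourceManager:
    """Minimal stand-in for the RM bookkeeping described in the steps."""

    def __init__(self):
        self.allocation_gb = {}      # container id -> allocation known to RM
        self.pending_increase = {}   # container id -> allocation before grant

    def grant_increase(self, cid, target_gb):
        # Step 3: remember the original size, then raise the allocation.
        self.pending_increase[cid] = self.allocation_gb[cid]
        self.allocation_gb[cid] = target_gb

    def on_decrease_reported(self, cid, target_gb):
        # Step 5: a decrease reported by the NM is honored directly.
        self.allocation_gb[cid] = target_gb
        return self.allocation_gb[cid]

    def on_increase_reported(self, cid, target_gb):
        # Step 7: the increase is valid only if it matches the RM's view.
        valid = target_gb == self.allocation_gb[cid]
        # Either way the request leaves the (stand-in) timer.
        self.pending_increase.pop(cid, None)
        # The RM's current value goes into the heartbeat response.
        return valid, self.allocation_gb[cid]


rm = ToyResourceManager()
rm.allocation_gb["container_1"] = 6                        # step 1
rm.grant_increase("container_1", 8)                        # steps 2-3
rm.on_decrease_reported("container_1", 4)                  # steps 4-5
valid, nm_enforced_gb = rm.on_increase_reported("container_1", 8)  # 6-7
# Step 8: the NM monitors and enforces nm_enforced_gb.
```

Here the late increase is reported as not valid and the value sent back to 
the NM is 4G, matching steps 7 and 8.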

We have finished updating the design doc and have attached it to this thread 
(YARN-1197_Design.pdf). Many thanks to [~wangda] for the original design doc, 
which really helped us catch up with the whole discussion as quickly as we 
could. We hope to get your valuable feedback soon. 

We think most of the sub-tasks are still good as outlined in the updated design 
doc. Once we get approval of the design from the community, we will start the 
implementation. 

> Support changing resources of an allocated container
> ----------------------------------------------------
>
>                 Key: YARN-1197
>                 URL: https://issues.apache.org/jira/browse/YARN-1197
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: api, nodemanager, resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Wangda Tan
>         Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1, 
> tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, 
> yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes that the resources 
> allocated to a container are fixed for its lifetime. When users want to 
> change the resources of an allocated container, the only way is to release 
> it and allocate a new container of the expected size.
> Allowing the resources of an allocated container to be changed at run time 
> will give us better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
