MENG DING commented on YARN-4138:

Hi, [~sunilg], 

The following is a summary of my discussion with [~leftnoteasy] on this issue 
so far (from YARN-1651) to keep you up-to-date. Let me know if you have any 
questions or comments.

* We will create a new monitor (e.g., 
ContainerResourceIncreaseAllocationExpirer) to track token expiration for 
container resource increases. It must remember both the container ID and the 
*last confirmed* resource to revert to.
* Multiple resource increase requests for the same container can arrive in 
succession. A later increase request resets the expiration timeout of any 
earlier request for the same container in the monitor.
* When a container increase token expires, the RM must not only revert the 
resource allocation but also send a decrease container event to the NM, so 
that the allocation stays consistent between RM and NM. This is particularly 
important when the AM has acquired multiple container resource increase 
tokens but does not use the latest one to initiate the increase action on 
the NM.
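
To make the three points above concrete, here is a rough sketch of the bookkeeping such a monitor might do. It is not the actual patch: the class name IncreaseExpirerSketch, its methods, the use of plain strings for container IDs, memory in MB as the only resource, and the caller-supplied clock are all assumptions made for illustration. It also assumes that a repeated increase request keeps the originally recorded last confirmed resource and only resets the timeout.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names come from the YARN code base.
class IncreaseExpirerSketch {

    /** What the monitor remembers per container. */
    static final class Pending {
        final int lastConfirmedMb; // resource to revert to on expiry
        long deadlineMs;           // when the latest increase token expires

        Pending(int lastConfirmedMb, long deadlineMs) {
            this.lastConfirmedMb = lastConfirmedMb;
            this.deadlineMs = deadlineMs;
        }
    }

    private final long timeoutMs;
    private final Map<String, Pending> pending = new LinkedHashMap<>();

    IncreaseExpirerSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Called when an increase token is issued for a container. */
    void register(String containerId, int lastConfirmedMb, long nowMs) {
        Pending p = pending.get(containerId);
        if (p == null) {
            pending.put(containerId, new Pending(lastConfirmedMb, nowMs + timeoutMs));
        } else {
            // A later request for the same container only resets the timeout;
            // the last confirmed resource recorded earlier is kept.
            p.deadlineMs = nowMs + timeoutMs;
        }
    }

    /** Called once the increase has been confirmed; stop tracking it. */
    void confirm(String containerId) {
        pending.remove(containerId);
    }

    /** Returns a description of the rollback needed for each expired token. */
    List<String> expire(long nowMs) {
        List<String> actions = new ArrayList<>();
        Iterator<Map.Entry<String, Pending>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Pending> e = it.next();
            if (e.getValue().deadlineMs <= nowMs) {
                // RM reverts its own view and also notifies the NM.
                actions.add("revert " + e.getKey() + " to "
                        + e.getValue().lastConfirmedMb
                        + " MB and send decrease event to NM");
                it.remove();
            }
        }
        return actions;
    }
}
```

In this sketch, registering the same container twice before the first timeout moves the deadline forward, and expire() reports a single rollback to the originally confirmed size. How the real monitor should handle an AM that uses an older token is left open here.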

> Roll back container resource allocation after resource increase token expires
> -----------------------------------------------------------------------------
>                 Key: YARN-4138
>                 URL: https://issues.apache.org/jira/browse/YARN-4138
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, nodemanager, resourcemanager
>            Reporter: MENG DING
>            Assignee: Sunil G
> In YARN-1651, after a container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior so that when a container resource 
> increase token expires, the resource allocation of the container is 
> reverted to the value before the increase.

