[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2016-02-08 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4138:

Attachment: YARN-4138.7.patch

Fixed the failing unit test case 
{{testDecreaseAfterIncreaseWithAllocationExpiration}}.

Attaching the latest patch.

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, 
> YARN-4138.5.patch, YARN-4138.6.patch, YARN-4138.7.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2016-02-05 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4138:

Attachment: YARN-4138.6.patch

Attaching a new patch that updates {{lastConfirmedResource}} based on the 
increased containers reported by the NM.

Since the {{containerIncreasedOnNode}} function updates 
{{lastConfirmedResource}}, we guard its body with the queue lock but drop 
the cs lock (this is consistent with other functions such as 
{{rollbackContainerResource}}, {{updateIncreaseRequests}} and 
{{decreaseContainer}}).



[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2016-02-03 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4138:

Attachment: YARN-4138.5.patch

Hi, [~jianhe]

bq. After step 6, rmContainer.getLastConfirmedResource() will return 3G, when 
the expire event gets triggered, won't it reset it back to 3G?

No, it won't reset it back to 3G. After step 6, 
rmContainer.getLastConfirmedResource() does not return 3G; it is still 1G. We 
only confirm a resource when the NM-reported resource is the same as the 
RM-allocated resource. In this test case, the NM-reported resource is 3G but 
the RM-allocated resource is 6G, so 3G is NOT confirmed. This issue was 
discussed in this thread a while ago: 
https://issues.apache.org/jira/browse/YARN-4138?focusedCommentId=14737229&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737229
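
As a rough illustration of the confirmation rule described above, here is a 
minimal Java sketch. The class and method names ({{ConfirmRuleSketch}}, 
{{confirm}}) are hypothetical and sizes are plain GB integers; this is not the 
code in the patch.

```java
// Hypothetical illustration of the confirmation rule; not code from the patch.
final class ConfirmRuleSketch {
    /**
     * Returns the new lastConfirmed value (in GB) after an NM report.
     * The NM-reported size is confirmed only when it matches the size
     * the RM has allocated; otherwise the old value is kept.
     */
    static int confirm(int lastConfirmedGb, int rmAllocatedGb, int nmReportedGb) {
        if (nmReportedGb == rmAllocatedGb) {
            return nmReportedGb;
        }
        return lastConfirmedGb;
    }
}
```

With the values from the test case (lastConfirmed 1G, RM allocated 6G, NM 
reported 3G), {{confirm(1, 6, 3)}} keeps lastConfirmed at 1G.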

bq. I think RMContainerImpl will not receive EXPIRE event at RUNNING state 
after this patch ? if so, we can remove this.

You are right, we can remove this. Attaching the latest patch, which removes it.




[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2016-02-02 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4138:

Attachment: YARN-4138.4.patch

Attaching new patch now that YARN-4519 is completed.

In the {{rollbackContainerResource}} function, we grab the queue lock first, 
calculate the delta resource, and then call {{decreaseContainer}}. There is no 
need to grab the cs lock.
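
The sequence above could look roughly like the following Java sketch. It is a 
simplified model under stated assumptions: {{ContainerAlloc}}, 
{{QueueRollbackSketch}}, the MB-based integer fields and the use of a 
{{ReentrantLock}} for the queue lock are all hypothetical stand-ins, not the 
scheduler code in the patch.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified container state (sizes in MB).
final class ContainerAlloc {
    int allocatedMb;
    int lastConfirmedMb;

    ContainerAlloc(int allocatedMb, int lastConfirmedMb) {
        this.allocatedMb = allocatedMb;
        this.lastConfirmedMb = lastConfirmedMb;
    }
}

// Hypothetical stand-in for a queue; only tracks used memory.
final class QueueRollbackSketch {
    private final ReentrantLock queueLock = new ReentrantLock();
    private int usedMb;

    QueueRollbackSketch(int usedMb) {
        this.usedMb = usedMb;
    }

    int getUsedMb() {
        queueLock.lock();
        try {
            return usedMb;
        } finally {
            queueLock.unlock();
        }
    }

    /** Returns the amount released in MB, or 0 if there is nothing to roll back. */
    int rollbackContainerResource(ContainerAlloc c) {
        queueLock.lock(); // queue lock only; no scheduler-wide lock is taken
        try {
            int deltaMb = c.allocatedMb - c.lastConfirmedMb;
            if (deltaMb <= 0) {
                return 0;
            }
            decreaseContainer(c, deltaMb);
            return deltaMb;
        } finally {
            queueLock.unlock();
        }
    }

    // Must be called with queueLock held.
    private void decreaseContainer(ContainerAlloc c, int deltaMb) {
        c.allocatedMb -= deltaMb;
        usedMb -= deltaMb;
    }
}
```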



[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-18 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4138:

Attachment: YARN-4138.3.patch

Attaching the latest patch that addresses [~jianhe] and [~sandflee]'s comments.

I think the issue brought up by [~jianhe] is about race conditions between a 
normal resource decrease and a resource rollback. The proposed fix is to guard 
resource rollback with the same sequence of locks as the normal resource 
decrease, i.e., lock on the application first, then on the scheduler.

So with the proposed fix, we can walk through the original example:
1. AM asks to increase 2G -> 8G, and the request is approved by RM
2. AM does not increase the container; instead AM asks to decrease to 1G, and 
at the same time the increase expiration logic is triggered:
* If the normal decrease is processed first: RM decreases 8G -> 1G (allocated 
and lastConfirmed are now set to 1G), and then rollback is processed: RM 
rolls back 1G -> 1G (skipped)
* If rollback is processed first: RM rolls back 8G -> 2G (allocated and 
lastConfirmed are now set to 2G), and then normal decrease is processed: RM 
decreases 2G -> 1G
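
The two orderings in the walk-through can be replayed with a small Java 
sketch. This is only a model under assumptions: {{ContainerSketch}} and 
{{RollbackRaceSketch}} are hypothetical names, sizes are GB integers, and a 
single {{synchronized}} monitor stands in for the application-then-scheduler 
lock sequence, so both operations are simply serialized.

```java
// Hypothetical model of one container's allocated and lastConfirmed sizes (GB).
final class ContainerSketch {
    private int allocatedGb;
    private int lastConfirmedGb;

    ContainerSketch(int confirmedGb) {
        this.allocatedGb = confirmedGb;
        this.lastConfirmedGb = confirmedGb;
    }

    // RM approves an increase; lastConfirmed stays at the old value.
    synchronized void approveIncrease(int targetGb) {
        allocatedGb = targetGb;
    }

    // Normal decrease requested by the AM: sets both values.
    synchronized void decrease(int targetGb) {
        if (targetGb >= allocatedGb) {
            return; // not a decrease
        }
        allocatedGb = targetGb;
        lastConfirmedGb = targetGb;
    }

    // Rollback on increase-token expiry; returns false when skipped.
    synchronized boolean rollback() {
        if (allocatedGb <= lastConfirmedGb) {
            return false; // e.g. 1G -> 1G, nothing to do
        }
        allocatedGb = lastConfirmedGb;
        return true;
    }

    synchronized int getAllocatedGb() {
        return allocatedGb;
    }
}

final class RollbackRaceSketch {
    // Replays the example (2G -> 8G approved, then decrease to 1G races
    // with rollback) and returns the final allocated size in GB.
    static int run(boolean decreaseFirst) {
        ContainerSketch c = new ContainerSketch(2);
        c.approveIncrease(8);
        if (decreaseFirst) {
            c.decrease(1);
            c.rollback(); // skipped: allocated == lastConfirmed
        } else {
            c.rollback(); // 8G -> 2G
            c.decrease(1); // 2G -> 1G
        }
        return c.getAllocatedGb();
    }
}
```

In this model both orderings end with the container at 1G, which matches the 
walk-through.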




[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-09-18 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4138:

Attachment: YARN-4138-YARN-1197.2.patch

Submitting an updated patch that includes extensive test cases.



[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-09-15 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4138:

Attachment: YARN-4138-YARN-1197.1.patch

Thank you very much [~sunilg].

Attaching a WIP patch. Still need to add test cases.

Some considerations:
* I think there is no need to create a new 
{{ContainerResourceIncreaseAllocationExpirer}}. We can just reuse the existing 
{{ContainerAllocationExpirer}} and change its type parameter. See below.
* Propose a new type parameter {{AllocationExpirationInfo}}, which wraps the 
containerId and a boolean value indicating whether this is for increase 
expiration.
* Modify {{ContainerExpiredSchedulerEvent}} to add a boolean field indicating 
whether the event is for increase expiration.
* Add RMContainerImpl.lastConfirmedResource to track the resource to roll back 
to when the increase token expires.
* When the scheduler receives the CONTAINER_EXPIRED event for a container 
resource increase, it calls the existing {{decreaseContainer}} to roll back 
resources.
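
To make the proposed type parameter concrete, here is a minimal Java sketch of 
what such a wrapper might look like. Everything beyond the class name 
{{AllocationExpirationInfo}} is an assumption for illustration: the accessor 
names, the use of a String in place of the real container ID type, and the 
{{ExpiryDispatchSketch}} helper with its two outcome labels.

```java
import java.util.Objects;

// Sketch only: field and method names are assumptions, not the patch contents.
final class AllocationExpirationInfo {
    private final String containerId; // the real code would wrap a container ID object
    private final boolean increase;   // true if this entry tracks an increase token

    AllocationExpirationInfo(String containerId, boolean increase) {
        this.containerId = Objects.requireNonNull(containerId);
        this.increase = increase;
    }

    String getContainerId() {
        return containerId;
    }

    boolean isIncrease() {
        return increase;
    }

    // Value equality so entries can be registered and unregistered by content.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof AllocationExpirationInfo)) {
            return false;
        }
        AllocationExpirationInfo other = (AllocationExpirationInfo) o;
        return increase == other.increase && containerId.equals(other.containerId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(containerId, increase);
    }
}

// Hypothetical dispatcher showing how the boolean could select the action.
final class ExpiryDispatchSketch {
    static String onExpire(AllocationExpirationInfo info) {
        return info.isIncrease() ? "ROLLBACK_INCREASE" : "EXPIRE_ALLOCATION";
    }
}
```

Carrying the flag in the registered object lets one expirer serve both the 
original allocation expiry and the increase-token expiry.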



[jira] [Updated] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-09-15 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4138:
--
Assignee: MENG DING  (was: Sunil G)
