[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2017-05-08 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001708#comment-16001708
 ] 

Arun Suresh commented on YARN-1197:
---

[~mingma], even though YARN-6216 makes the feature scheduler-agnostic, most 
of the unit tests and other testing were done using the CapacityScheduler. It 
would be nice to have some basic FairScheduler test cases for it as well. 
Maybe we can add them as part of YARN-1655 before closing it.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, graceful, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197_Design.2015.06.24.pdf, 
> YARN-1197_Design.2015.07.07.pdf, YARN-1197_Design.2015.08.21.pdf, 
> YARN-1197_Design.pdf, YARN-1197 old-design-docs-patches-for-reference.zip
>
>
> The current YARN resource management logic assumes that the resources allocated to a 
> container are fixed for its lifetime. When users want to change the resources 
> of an allocated container, the only way is to release it and allocate a new 
> container of the expected size.
> Allowing the resources of an allocated container to be changed at run time will give 
> applications better control of their resource usage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2017-05-08 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001676#comment-16001676
 ] 

Ming Ma commented on YARN-1197:
---

Thanks for the info, [~tdbaker], [~jianhe], [~asuresh], [~kasha]! That means 
branch-2's fair scheduler supports this feature and YARN-1655 can be resolved. 


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2017-05-08 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001656#comment-16001656
 ] 

Karthik Kambatla commented on YARN-1197:


FairScheduler supports strict locality (relaxLocality == false)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2017-04-29 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989922#comment-15989922
 ] 

Arun Suresh commented on YARN-1197:
---

[~mingma], [~jianhe], after YARN-6216, the container resizing feature has been 
made mostly scheduler-agnostic. However, it does require that the scheduler 
support requests with relaxLocality = false (which, if I remember correctly, 
requires a minor tweak in the FairScheduler).

[~dan...@cloudera.com] / [~kasha], can you confirm that the FairScheduler 
supports requests with relaxLocality = false?
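
A minimal sketch of what relaxLocality = false means for placement, assuming (as the comment above implies) that a resize is expressed as a request pinned to the node already running the container. The `Request` class and `allowedLocations` helper are hypothetical models written for illustration, not YARN's `ResourceRequest` API; `"*"` stands for "any host", following YARN's convention for `ResourceRequest.ANY`.

```java
import java.util.ArrayList;
import java.util.List;

public class StrictLocalitySketch {

    // Hypothetical stand-in for a resource request; NOT YARN's ResourceRequest class.
    static final class Request {
        final String node;
        final String rack;
        final boolean relaxLocality;

        Request(String node, String rack, boolean relaxLocality) {
            this.node = node;
            this.rack = rack;
            this.relaxLocality = relaxLocality;
        }
    }

    // Returns where a scheduler may satisfy the request, most specific first.
    // With relaxLocality == false only the named node qualifies; otherwise the
    // scheduler may fall back to the rack and then to any host ("*").
    static List<String> allowedLocations(Request r) {
        List<String> locations = new ArrayList<>();
        locations.add(r.node);
        if (r.relaxLocality) {
            locations.add(r.rack);
            locations.add("*");
        }
        return locations;
    }

    public static void main(String[] args) {
        // Assumed resize case: the request is pinned to the container's current node.
        Request resize = new Request("host-17", "/rack-3", false);
        System.out.println(allowedLocations(resize)); // prints [host-17]

        // An ordinary request that may relax to the rack or anywhere.
        Request ordinary = new Request("host-17", "/rack-3", true);
        System.out.println(allowedLocations(ordinary)); // prints [host-17, /rack-3, *]
    }
}
```

Under this assumption, a scheduler that cannot honor strict locality could place the "extra" resources on a different node, which would be meaningless for resizing an existing container.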


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2017-04-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989772#comment-15989772
 ] 

Jian He commented on YARN-1197:
---

[~mingma], I don't think the fair scheduler supports this as of now.
It would be great if frameworks like REEF could start using it.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2017-04-28 Thread Tobin Baker (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989685#comment-15989685
 ] 

Tobin Baker commented on YARN-1197:
---

I believe [Apache REEF|http://reef.apache.org/] is interested in this feature.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2017-04-28 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989682#comment-15989682
 ] 

Ming Ma commented on YARN-1197:
---

Thanks for the feature! Is fair scheduler support available? I also wonder 
whether any framework plans to use the feature.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-16 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060559#comment-15060559
 ] 

Bikas Saha commented on YARN-1197:
--

The API supports it, but the backend implementation does not. So in the future, 
based on need, this could be supported in a compatible way. Do you have a scenario 
where this is essential?


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-16 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061577#comment-15061577
 ] 

sandflee commented on YARN-1197:


This seems complicated for the AM to do, especially as we add disk and network 
to container resources.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-16 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061393#comment-15061393
 ] 

MENG DING commented on YARN-1197:
-

[~sandflee], for now you can achieve the goal of increasing and decreasing 
different resource indices by sending separate resource change requests, with 
each request only changing one index.
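
The workaround can be sketched as follows, assuming a resource with only two dimensions (memory in MB and vcores). `Res` and `splitByDimension` are hypothetical names for illustration, not YARN classes; the sketch only computes the sequence of target sizes and does not model how an AM would submit each change or wait for it to take effect.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitResourceChangeSketch {

    // Hypothetical two-dimensional resource; not YARN's Resource record.
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        @Override
        public String toString() {
            return "<memory:" + memoryMb + ", vCores:" + vcores + ">";
        }
    }

    // Splits current -> target into intermediate targets that each change exactly
    // one dimension. Pass 0 applies the decreases, pass 1 applies the increases.
    static List<Res> splitByDimension(Res current, Res target) {
        List<Res> steps = new ArrayList<>();
        Res step = current;
        for (int pass = 0; pass < 2; pass++) {
            boolean wantIncrease = pass == 1;
            if (target.memoryMb != step.memoryMb
                    && (target.memoryMb > step.memoryMb) == wantIncrease) {
                step = new Res(target.memoryMb, step.vcores);
                steps.add(step);
            }
            if (target.vcores != step.vcores
                    && (target.vcores > step.vcores) == wantIncrease) {
                step = new Res(step.memoryMb, target.vcores);
                steps.add(step);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        // Grow memory while shrinking vcores: not allowed as a single request.
        Res current = new Res(4096, 4);
        Res target = new Res(8192, 2);
        for (Res step : splitByDimension(current, target)) {
            System.out.println("request change to " + step);
        }
        // prints:
        // request change to <memory:4096, vCores:2>
        // request change to <memory:8192, vCores:2>
    }
}
```

Ordering decreases before increases is a design choice of this sketch: it releases capacity on the node before asking for more, and every intermediate request is a pure decrease or a pure increase in a single dimension.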


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-16 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061347#comment-15061347
 ] 

sandflee commented on YARN-1197:


User applications (long-running) run on our YARN platform, and they can 
change container resources as they like. If we forbid increasing one resource 
while decreasing another, that seems puzzling, although increasing or 
decreasing both is the most common case.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059348#comment-15059348
 ] 

Wangda Tan commented on YARN-1197:
--

[~sandflee], it is not supported for now. Every change must be either a strict 
increase or a strict decrease across all resource types. Following suggestions from 
[~bikassaha], the AMRMClient API is "changeResource" instead of separate 
"increase"/"decrease" calls, so we may have the chance to support arbitrary 
resource changes in the future.
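
A minimal sketch of the rule, reading "strictly increase/decrease for all resource types" as "no resource type may move in the opposite direction"; under this reading, growing memory while leaving vcores unchanged still counts as an increase. The `Kind` enum and `classify` method are hypothetical, cover only memory and vcores, and are not part of AMRMClient.

```java
public class ResourceChangeKindSketch {

    // Hypothetical classification of a requested change; not a YARN type.
    enum Kind { NO_CHANGE, INCREASE, DECREASE, MIXED_UNSUPPORTED }

    // An increase may not shrink any dimension and a decrease may not grow any;
    // a change that grows one dimension while shrinking another is rejected.
    static Kind classify(long curMemMb, int curVcores, long newMemMb, int newVcores) {
        boolean grows = newMemMb > curMemMb || newVcores > curVcores;
        boolean shrinks = newMemMb < curMemMb || newVcores < curVcores;
        if (grows && shrinks) {
            return Kind.MIXED_UNSUPPORTED;
        }
        if (grows) {
            return Kind.INCREASE;
        }
        if (shrinks) {
            return Kind.DECREASE;
        }
        return Kind.NO_CHANGE;
    }

    public static void main(String[] args) {
        System.out.println(classify(4096, 4, 8192, 4)); // prints INCREASE
        System.out.println(classify(4096, 4, 2048, 2)); // prints DECREASE
        // sandflee's case: more memory, fewer cores.
        System.out.println(classify(4096, 4, 8192, 2)); // prints MIXED_UNSUPPORTED
    }
}
```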


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-15 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059332#comment-15059332
 ] 

sandflee commented on YARN-1197:


It seems that increasing memory while decreasing CPU cores at the same time is not supported?


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-15 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059362#comment-15059362
 ] 

sandflee commented on YARN-1197:


Got it, thanks, [~leftnoteasy]!


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045443#comment-15045443
 ] 

Wangda Tan commented on YARN-1197:
--

[~sershe],

This will be an alpha feature in Hadoop 2.8.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-12-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045442#comment-15045442
 ] 

Sergey Shelukhin commented on YARN-1197:


Will this feature be usable in YARN/Hadoop 2.8? I see most subtasks are 
resolved, but this JIRA is not resolved, nor is there a release note.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905355#comment-14905355
 ] 

Hudson commented on YARN-1197:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #436 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/436/])
YARN-4171. Fix findbugs warnings in YARN-1197 branch. Contributed by Wangda Tan 
(wangda: rev b3f6b641dccb0d59df78855e2951d2cae7dff8ad)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905321#comment-14905321
 ] 

Hudson commented on YARN-1197:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8505 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8505/])
YARN-4171. Fix findbugs warnings in YARN-1197 branch. Contributed by Wangda Tan 
(wangda: rev b3f6b641dccb0d59df78855e2951d2cae7dff8ad)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905709#comment-14905709
 ] 

Hudson commented on YARN-1197:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2375 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2375/])
YARN-4171. Fix findbugs warnings in YARN-1197 branch. Contributed by Wangda Tan 
(wangda: rev b3f6b641dccb0d59df78855e2951d2cae7dff8ad)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* hadoop-yarn-project/CHANGES.txt


> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905758#comment-14905758
 ] 

Hudson commented on YARN-1197:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2348 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2348/])
YARN-4171. Fix findbugs warnings in YARN-1197 branch. Contributed by Wangda Tan 
(wangda: rev b3f6b641dccb0d59df78855e2951d2cae7dff8ad)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt


> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905428#comment-14905428
 ] 

Hudson commented on YARN-1197:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #429 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/429/])
YARN-4171. Fix findbugs warnings in YARN-1197 branch. Contributed by Wangda Tan 
(wangda: rev b3f6b641dccb0d59df78855e2951d2cae7dff8ad)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java


> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905535#comment-14905535
 ] 

Hudson commented on YARN-1197:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1169 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1169/])
YARN-4171. Fix findbugs warnings in YARN-1197 branch. Contributed by Wangda Tan 
(wangda: rev b3f6b641dccb0d59df78855e2951d2cae7dff8ad)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java


> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905640#comment-14905640
 ] 

Hudson commented on YARN-1197:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #409 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/409/])
YARN-4171. Fix findbugs warnings in YARN-1197 branch. Contributed by Wangda Tan 
(wangda: rev b3f6b641dccb0d59df78855e2951d2cae7dff8ad)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStatusEvent.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt


> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-09-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744814#comment-14744814
 ] 

Wangda Tan commented on YARN-1197:
--

+1. Created YARN-4157 to run a Jenkins build on the diff before the merge.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-09-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744724#comment-14744724
 ] 

Jian He commented on YARN-1197:
---

Thanks [~mding] and [~leftnoteasy] for all the hard work!

Now that the majority of the patches are in, we plan to merge the YARN-1197 branch 
into trunk in the next few days.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
> Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
> YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
> YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-08-04 Thread dhruv (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653971#comment-14653971
 ] 

dhruv commented on YARN-1197:
-

Hi,
can I apply these patches to the current stable version, 2.7.1?

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, 
 YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-29 Thread Lei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605807#comment-14605807
 ] 

Lei Guo commented on YARN-1197:
---

[~mding], for the decrease flow, which notifies the NM via NodeHeartbeatResponseProto, 
how do we handle a network or NM failure?

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-29 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605995#comment-14605995
 ] 

MENG DING commented on YARN-1197:
-

One option to consider is to have the NM confirm back to the RM when it is done 
decreasing the container size. If the RM doesn't receive confirmation from the NM, 
it keeps resending the decrease message to the NM on every heartbeat. This is only 
for the purpose of resource enforcement. From the scheduling point of view, the 
decrease takes effect immediately, as soon as the RM approves the request.

I am not sure if this is worth the effort.
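A minimal sketch of the acknowledge-and-resend idea, assuming a hypothetical RM-side `PendingDecreases` tracker (not a YARN class; container IDs and sizes are illustrative). Scheduling already uses the new size once the decrease is approved; the tracker only keeps re-sending it in each NM heartbeat response until the NM acknowledges it.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class PendingDecreases:
    """Hypothetical RM-side bookkeeping for decreases the NM has not confirmed."""

    # container id -> target memory (MB) the NM should enforce
    pending: Dict[str, int] = field(default_factory=dict)

    def approve(self, container_id: str, target_mb: int) -> None:
        # Scheduling already uses the new size; only enforcement is tracked here.
        self.pending[container_id] = target_mb

    def heartbeat_response(self) -> Dict[str, int]:
        # Every NM heartbeat response re-sends all unacknowledged decreases,
        # so a lost response is repaired by the next heartbeat.
        return dict(self.pending)

    def on_ack(self, acked_ids: Iterable[str]) -> None:
        # The NM confirms enforced decreases; stop re-sending them.
        for cid in acked_ids:
            self.pending.pop(cid, None)


tracker = PendingDecreases()
tracker.approve("container_01", 1024)
lost = tracker.heartbeat_response()    # suppose this response never reaches the NM
retry = tracker.heartbeat_response()   # the same decrease is sent again
tracker.on_ack(["container_01"])
print(retry, tracker.heartbeat_response())  # prints {'container_01': 1024} {}
```

The cost of this design is one extra field in the NM heartbeat (the acknowledged IDs) and some RM state per in-flight decrease.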

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-29 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605932#comment-14605932
 ] 

MENG DING commented on YARN-1197:
-

[~grey], the NM will persist the resource decrease in LevelDB when it receives the 
decrease message, so if it fails and is restarted, it can recover the correct 
container size. In the case of a network failure, the decrease message will be 
lost, but the same is true for all other messages in the response (e.g., 
containers to clean up/remove). In practice, I don't think this is a serious 
problem, since we assume that by the time a user issues a resource decrease request 
for a container, that container has already given up that amount of 
resource.

Let me know if you have any thoughts or ideas.
Thanks.
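To illustrate the recovery argument, here is a sketch assuming a hypothetical `StateStore` (a plain dict standing in for the NM's LevelDB store) and a toy `NodeManager` class; neither is the real NM API. The decrease is persisted before it is applied, so a restarted NM recovers the decreased size rather than the launch-time size.

```python
from typing import Dict, Optional


class StateStore:
    """Stand-in for the NM's LevelDB-backed store (hypothetical; a dict here)."""

    def __init__(self) -> None:
        self._db: Dict[str, int] = {}

    def store_container_resource(self, container_id: str, mb: int) -> None:
        self._db[container_id] = mb

    def load_all(self) -> Dict[str, int]:
        return dict(self._db)


class NodeManager:
    """Toy NM: tracks enforced container sizes and recovers them on restart."""

    def __init__(self, store: StateStore,
                 launched: Optional[Dict[str, int]] = None) -> None:
        self.store = store
        # Start from launch-time sizes, then overlay any persisted changes.
        self.sizes: Dict[str, int] = dict(launched or {})
        self.sizes.update(store.load_all())

    def on_decrease(self, container_id: str, target_mb: int) -> None:
        # Persist first, so a crash after this call cannot lose the decrease.
        self.store.store_container_resource(container_id, target_mb)
        self.sizes[container_id] = target_mb


store = StateStore()
nm = NodeManager(store, launched={"container_01": 4096})
nm.on_decrease("container_01", 2048)
restarted = NodeManager(store, launched={"container_01": 4096})  # simulated restart
print(restarted.sizes["container_01"])  # prints 2048
```

As the comment notes, this covers NM restarts only; a decrease message lost on the network never reaches `on_decrease` at all.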

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-29 Thread Lei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605997#comment-14605997
 ] 

Lei Guo commented on YARN-1197:
---

Agreed, this is similar to the other cases you mentioned. In this case, we may 
need to recommend that AM implementations check/confirm the decrease status 
after the request. 
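A sketch of such a check on the AM side, assuming a hypothetical `get_container_mb` callable that returns the size the NM currently reports for the container (in a real AM this would come from a container status query, which is not shown here). It polls until the reported size matches the requested one or a timeout expires.

```python
import time
from typing import Callable


def wait_for_resize(get_container_mb: Callable[[], int], target_mb: int,
                    timeout_s: float = 10.0, interval_s: float = 0.5) -> bool:
    """Poll the reported container size until it equals target_mb.

    Returns True once the change is visible, False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_container_mb() == target_mb:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Simulated NM reports: the decrease becomes visible on the third poll.
reports = iter([4096, 4096, 2048])
confirmed = wait_for_resize(lambda: next(reports), 2048,
                            timeout_s=5.0, interval_s=0.01)
# A decrease whose NM message was lost never shows up, so the AM times out.
lost = wait_for_resize(lambda: 4096, 2048, timeout_s=0.05, interval_s=0.01)
print(confirmed, lost)  # prints True False
```

What the AM does on `False` (retry the request, or keep treating the container at its old size) is left open in this sketch.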

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606062#comment-14606062
 ] 

Wangda Tan commented on YARN-1197:
--

I think for now we can handle container decreases the same way AM container 
releases are handled today: an NM network failure during a decrease looks like a 
corner case to me, and we can add a response if it turns out to be necessary. We 
also need to check whether AM container releases need a similar acknowledgement 
from the NM.

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-29 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606459#comment-14606459
 ] 

MENG DING commented on YARN-1197:
-

Released containers in the RUNNING state are put in 
NodeHeartbeatResponse.containersToCleanup and sent to the NM in the heartbeat 
response. After the NM receives the list, it forcefully kills these containers. 
However, I don't see any logic in the code right now for the NM to acknowledge 
released containers back to the RM. In practice, I guess most containers released 
by the AM will be in the ACQUIRED state, not the RUNNING state.



 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606218#comment-14606218
 ] 

Bikas Saha commented on YARN-1197:
--

There has been a lot of discussion that looks like it's converging. It would be 
helpful for other interested (but not deeply involved) people if there were 
an updated design document with details about the agreed-upon design. Also, if 
this document could outline some of the intuition/logic behind the design 
choices (like going through the AM for low latency), it would be super useful. 
Thanks!

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-29 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606243#comment-14606243
 ] 

MENG DING commented on YARN-1197:
-

[~bikassaha], I will update the design doc with detailed intuition/rationale 
behind all the design choices based on the discussion in this thread.

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-17 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591055#comment-14591055
 ] 

Sandy Ryza commented on YARN-1197:
--

The latest proposal makes sense to me as well.  Thanks [~wangda] and [~mding]!

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588254#comment-14588254
 ] 

MENG DING commented on YARN-1197:
-

Thanks, guys, for all the comments! I think we all agreed that container decrease 
requests should go through the RM, and that the decrease action will be triggered 
by the RM-NM heartbeat.

For the increase request and action, option (a) theoretically has better 
performance, but it incurs extra complexity for both YARN and application 
writers. I was wondering if we could consider an option (c), which sort of 
meets (a) and (b) in the middle:

1) AM sends increase request to RM
2) RM allocates the resource and sends the increase token to NM.
3) RM sends response to AM right away, instead of waiting for NM to confirm 
that the increase action has been completed.
4) Upon receiving the response (which indicates that the increase has been 
triggered), AM should first poll the container status to make sure that the 
increase is done before taking action to allocate new tasks.
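The four steps of option (c) can be simulated with toy `FakeRM`/`FakeNM` classes (hypothetical, not YARN APIs; memory in MB only). The point it shows: the RM approves and answers the AM before the NM has applied the increase, so the AM has to poll the NM (step 4) before launching work that needs the extra resource.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeNM:
    sizes: Dict[str, int]                                    # enforced sizes (MB)
    tokens: List[Tuple[str, int]] = field(default_factory=list)

    def node_heartbeat(self) -> None:
        # Increase tokens delivered by the RM take effect on the NM heartbeat.
        for container_id, target_mb in self.tokens:
            self.sizes[container_id] = target_mb
        self.tokens.clear()

    def container_mb(self, container_id: str) -> int:
        return self.sizes[container_id]


@dataclass
class FakeRM:
    nm: FakeNM
    free_mb: int
    allocated: Dict[str, int] = field(default_factory=dict)  # RM's view (MB)

    def increase(self, container_id: str, target_mb: int) -> bool:
        delta = target_mb - self.allocated[container_id]
        if delta <= 0 or delta > self.free_mb:
            return False
        # Step 2: allocate and queue the increase token for the NM.
        self.free_mb -= delta
        self.allocated[container_id] = target_mb
        self.nm.tokens.append((container_id, target_mb))
        # Step 3: answer the AM immediately, without waiting for the NM.
        return True


nm = FakeNM(sizes={"container_01": 2048})
rm = FakeRM(nm, free_mb=4096, allocated={"container_01": 2048})
approved = rm.increase("container_01", 4096)   # Step 1: AM -> RM
before = nm.container_mb("container_01")       # Step 4: first poll, not applied yet
nm.node_heartbeat()
after = nm.container_mb("container_01")        # Step 4: poll again, increase visible
print(approved, before, after, rm.free_mb)     # prints True 2048 4096 2048
```

Rejecting `delta <= 0` in `increase` is only a simplification of this sketch, since decreases follow the separate RM-driven path agreed above.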

Option (c) will save one NM-RM heartbeat cycle, and since both option (a) and 
(c) need to poll container status, their performance will be very close.

We can have option (b) enabled by default, and use a configuration parameter to 
turn on option (c) for frameworks like Spark.

Do you think this is worth considering?

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588478#comment-14588478
 ] 

Sandy Ryza commented on YARN-1197:
--

bq. I think this assumes cluster is quite idle, I understand the low latency 
could be achieved, but it's not guaranteed since we don't support 
oversubscribing, etc.
If the cluster is fully contended, we certainly won't get this performance. But 
as long as there is a decent chunk of space, which is common in many settings, 
we can. The cluster doesn't need to be fully idle by any means.

More broadly, just because YARN is not good at hitting sub-second latencies 
doesn't mean that it isn't a design goal.  I strongly oppose any argument that 
uses the current slowness of YARN as a justification for why we should make 
architectural decisions that could compromise latencies.

That said, I still don't have a strong grasp of the kind of complexity we're 
introducing in the AM, so I would like to understand that before arguing 
against you further.

Is the main problem we're grappling with still the one Meng brought up here?
https://issues.apache.org/jira/browse/YARN-1197?focusedCommentId=14556803&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14556803
I.e., that an AM can receive an increase from the RM, then issue a decrease to 
the NM, and then use its increase to get resources it doesn't deserve?

Or is the idea that, even if we didn't have this JIRA, NMClient is too 
complicated, and we'd like to reduce that?

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.pdf




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588487#comment-14588487
 ] 

Wangda Tan commented on YARN-1197:
--

Thanks [~mding],

I think (c) sounds like a very good proposal. It has several advantages:
- Latency is better than (a) (if we assume network conditions between AM-RM 
and RM-NM are the same, since the RM sends the response to the NM in the same heartbeat).
- It doesn't expose the container token, etc. to the AM when an increase is approved, 
which is not necessary; the AM only needs to poll the NM about the status of the 
resource change.
- It can be considered an additional step on top of (b) ((c) = (b) + 
rm_response_to_am_when_increase_approved + am_poll_nm_about_increase_status), 
which is good for planning as well.

bq. We can have option (b) enabled by default, and use a configuration 
parameter to turn on option (c) for framework like Spark.
I think the two can be enabled together; I don't see any conflict between them. 
The AM can poll the NM if it doesn't want to wait for another NM-RM heartbeat.

Thoughts? [~sandyr], [~vinodkv].



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1459#comment-1459
 ] 

Wangda Tan commented on YARN-1197:
--

Thanks for the comments, [~sseth]/[~sandyr].

Now I'm convinced by the views of two downstream developers. +1 to doing 
AM-RM-AM-NM (a) for increase, as in the original doc, before (b). I'm not sure 
(b) is really required; we can do (b) if there are real use cases.

bq. More broadly, just because YARN is not good at hitting sub-second latencies 
doesn't mean that it isn't a design goal. I strongly oppose any argument that 
uses the current slowness of YARN as a justification for why we should make 
architectural decisions that could compromise latencies.
Makes sense to me.

bq. I.e. that an AM can receive an increase from the RM, then issue a decrease 
to the NM, and then use its increase to get resources it doesn't deserve?
Yes. If we send increase requests to the RM but decrease requests to the NM, we 
need to handle complex inconsistencies on the RM side. You can take a look at the 
latest design doc for more details.

bq. I don't think it's possible for the AM to start using the additional 
allocation till the NM has updated all its state - including writing out 
recovery information for work preserving restart (Thanks Vinod for pointing 
this out). Seems like that poll/callback will be required - unless the plan is 
to route this information via the RM.
Maybe we need to wait for all increase steps (monitor/cgroup/state-store) to finish 
before using the additional allocation. Suppose a container is increased from 5G to 
10G, the RM/NM crashes before writing to the state store, and the app has already 
started using 10G. After RM restart/recovery, the NM/RM will think the container is 
5G, which will be problematic.

[~mding], do you agree with doing (a)?



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588847#comment-14588847
 ] 

Siddharth Seth commented on YARN-1197:
--

bq. I would argue that waiting for an NM-RM heartbeat is much worse than 
waiting for an AM-RM heartbeat. With continuous scheduling, the RM can make 
decisions in millisecond time, and the AM can regulate its heartbeats according 
to the application's needs to get fast responses. If an NM-RM heartbeat is 
involved, the application is at the mercy of the cluster settings, which should 
be in the multi-second range for large clusters.
I tend to agree with Sandy's arguments about option a being better in terms of 
latency - and that we shouldn't be architecting this in a manner which would 
limit it to the seconds range rather than milliseconds / hundreds of 
milliseconds when possible.

It's already possible to get fast allocations - low 100s of milliseconds via a 
scheduler loop which is delinked from NM heartbeats and a variable AM-RM 
heartbeat interval, which is under user control rather than being a cluster 
property.

There are going to be improvements to the performance of various protocols in 
YARN. HADOOP-11552 opens up one such option, which allows AMs to know about 
allocations as soon as the scheduler has made the decision, without a 
requirement to poll. Of course, there's plenty of work to be done before that 
can actually be used :)

That said, callbacks on the RPC can be applied at various levels - including 
NM-RM communication, which can make option b work fast as well. However, it 
will incur the cost of additional RPC roundtrips. Option a, however, can be 
fast from the get go with tuning, and also gets better with future enhancements.

I don't think it's possible for the AM to start using the additional allocation 
till the NM has updated all its state - including writing out recovery 
information for work preserving restart (Thanks Vinod for pointing this out). 
Seems like that poll/callback will be required - unless the plan is to route 
this information via the RM.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588679#comment-14588679
 ] 

MENG DING commented on YARN-1197:
-

[~leftnoteasy], if I understand it correctly, in the {{AllocateResponseProto}}, 
we will have something like {{containers_change_approved}} and 
{{containers_change_completed}}. The former will be filled with ID/capability 
of containers whose change requests have been approved by RM. The latter will 
be filled with ID/capability of containers whose resource change actions have 
been completed in the NM. Right? 



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588671#comment-14588671
 ] 

MENG DING commented on YARN-1197:
-

[~sandyr], by processing both resource decrease and increase request through 
RM, the original problem that I brought up should not be an issue any more. 
What we are trying to grasp right now is whether it is really necessary for the 
increase action to go through RM-AM-NM. IMHO, if we can eliminate the need 
for that while still achieving reasonable performance, that would be ideal.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589001#comment-14589001
 ] 

MENG DING commented on YARN-1197:
-

Sorry, I got things messed up. 

Correction:

We definitely need {{AllocateResponseProto}} for the container increase token. For the 
decrease result, it is optional, but it probably doesn't hurt to set it anyway.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588963#comment-14588963
 ] 

MENG DING commented on YARN-1197:
-

[~leftnoteasy], I am certainly OK with doing (a). My original frustration was mainly 
about inconsistency in the RM when doing decreases through the NM; now that we have all 
agreed that decreases should go through the RM, the problem is gone.

So here is the latest proposal:

* Container resource decrease:
AM -> RM -> NM
* Container resource increase:
AM -> RM -> AM(token) -> NM. The AM needs to poll the status of the container before 
using the additional allocation.
Of course we need to properly handle token expiration (i.e., NM -> RM 
communication is needed to unregister the container from the expirer).

In addition, I do *not* see a need for any response to be set in the 
{{AllocateResponseProto}}:
* For resource decrease, we can assume it is always successful. 
* For resource increase, we are now doing polling to see if the increase is 
successful.

Let me know if this makes sense.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588965#comment-14588965
 ] 

Vinod Kumar Vavilapalli commented on YARN-1197:
---

bq. I don't think it's possible for the AM to start using the additional 
allocation till the NM has updated all its state - including writing out 
recovery information for work preserving restart (Thanks Vinod for pointing 
this out). Seems like that poll/callback will be required - unless the plan is 
to route this information via the RM.
We could just use the existing getContainerStatus() API for doing this polling 
for now.
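A minimal sketch of that AM-side polling, assuming the AM can ask the NM for the memory size it currently enforces for a container. In a real AM this would sit on top of the getContainerStatus() call mentioned above; here `get_capability_mb` is a hypothetical stand-in, and the function name, parameters, and MB-only resource model are illustrative assumptions. It also reflects Wangda's 5G -> 10G crash example: the AM keeps using only the old size until the NM reports the new one.

```python
import time
from typing import Callable


def wait_for_increase(
    get_capability_mb: Callable[[str], int],
    container_id: str,
    target_mb: int,
    poll_interval_s: float = 0.1,
    timeout_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the NM until it reports the container at >= target_mb.

    get_capability_mb is a hypothetical stand-in for an NM status query
    such as getContainerStatus(); it should return the size the NM has
    applied (and recorded for recovery).
    Returns True once the increase is visible, or False on timeout, in
    which case the AM should keep using only the old allocation.
    """
    deadline = clock() + timeout_s
    while True:
        if get_capability_mb(container_id) >= target_mb:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Hypothetical fake NM: reports the old 5 GB size twice, then 10 GB.
    reports = iter([5120, 5120, 10240])
    ok = wait_for_increase(lambda cid: next(reports), "container_01", 10240,
                           poll_interval_s=0.01, timeout_s=1.0)
    print("increase confirmed" if ok else "timed out; keep old allocation")
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a production AM would more likely hook this into its existing event loop than block a thread.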



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589055#comment-14589055
 ] 

Wangda Tan commented on YARN-1197:
--

[~mding],
The latest proposal 
(https://issues.apache.org/jira/browse/YARN-1197?focusedCommentId=14588963page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14588963)
 makes sense to me. 

bq. We definitely need AllocateResponseProto for container increase token. For 
decrease result, it is optional, but probably it doesn't hurt to set it anyway.
I suggest we only include the token when it's necessary; we can add a token to the 
decrease result when we need it.

bq. We could just use the existing getContainerStatus() API for doing this 
polling for now.
+1, we don't need a new API.

[~sandyr], do you agree with the latest proposal?



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586139#comment-14586139
 ] 

MENG DING commented on YARN-1197:
-

Had a very good discussion with [~leftnoteasy] at the Hadoop Summit. We all 
agreed that due to the complexity of the current design, it is worthwhile to 
revisit the idea of increasing and decreasing container size both through the 
Resource Manager. That would at least eliminate the need for token expiration 
logic, and also eliminate the need for the AM-NM protocol and APIs. I am currently 
working on the new design, and will post it for review when it is ready.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586249#comment-14586249
 ] 

Sandy Ryza commented on YARN-1197:
--

Sorry, I've been quiet here for a while, but I'd be concerned about a design 
that requires going through the ResourceManager for decreases.  If I understand 
correctly, this would be a considerable hit to performance, which could be 
prohibitive for frameworks like Spark that might use container-resizing for 
allocating per-task resources.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586455#comment-14586455
 ] 

Vinod Kumar Vavilapalli commented on YARN-1197:
---

bq. We all agreed that due to the complexity of the current design, it is 
worthwhile to revisit the idea of increasing and decreasing container size both 
through Resource Manager
+1 for this idea. Letting this go through the NodeManager directly adds too much 
complexity and difficult-to-understand semantics for application writers.

bq.  If I understand correctly, this would be considerable hit to performance
[~sandyr], as I understand, going through NM is in fact a worse solution w.r.t 
allocation throughput. Going through RM directly is better as the RM will 
immediately know that the resource is available for future allocations - the 
decrease on the NM can happen offline. The control flow I expect is
 - the framework/app decides it doesn't need that many resources anymore. By 
this time, the container should already have given up the physical resources 
it doesn't need
 - informs the RM about the required decrement
 - RM informs NM to resize the container (cgroups etc)
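To make that decrease flow concrete, here is a minimal, hypothetical sketch of RM-side bookkeeping for a single node, assuming memory (MB) is the only resource. `NodeLedger` and its methods are illustrative names, not YARN classes. It shows the property described above: the RM makes the freed space schedulable as soon as the AM's decrease arrives, while the NM-side resize is only queued for a later heartbeat.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class NodeLedger:
    """Hypothetical RM-side view of one node; memory (MB) is the only resource."""

    capacity_mb: int
    allocated: Dict[str, int] = field(default_factory=dict)  # container id -> MB
    pending_nm_resizes: Deque[Tuple[str, int]] = field(default_factory=deque)

    def available_mb(self) -> int:
        return self.capacity_mb - sum(self.allocated.values())

    def decrease(self, container_id: str, new_mb: int) -> None:
        """AM -> RM decrease. Assumes the container has already given up the
        physical resources, so the freed space is schedulable right away."""
        current = self.allocated[container_id]
        if not 0 < new_mb < current:
            raise ValueError("a decrease must shrink the container")
        self.allocated[container_id] = new_mb
        # NM-side enforcement (e.g. cgroups) happens "offline": it is only
        # queued here and handed to the NM on a later heartbeat.
        self.pending_nm_resizes.append((container_id, new_mb))

    def drain_for_heartbeat(self) -> List[Tuple[str, int]]:
        """Resize commands the RM would piggyback on the next NM heartbeat."""
        cmds = list(self.pending_nm_resizes)
        self.pending_nm_resizes.clear()
        return cmds


if __name__ == "__main__":
    node = NodeLedger(capacity_mb=16384, allocated={"c1": 8192})
    node.decrease("c1", 2048)
    print(node.available_mb())          # 14336, before the NM has done anything
    print(node.drain_for_heartbeat())   # [('c1', 2048)]
```

The sketch deliberately trusts the AM's claim that usage has already dropped; if that assumption fails, the NM's later enforcement is what actually reclaims the memory.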




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586281#comment-14586281
 ] 

Wangda Tan commented on YARN-1197:
--

[~sandyr],
Thanks for coming back :).
I'm not sure what performance issue you mean if decreases go to the RM. What is 
the expected (ideal) delay you have in mind for Sparking 
releasing resources?



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586286#comment-14586286
 ] 

Wangda Tan commented on YARN-1197:
--

Sparking -> Spark



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586285#comment-14586285
 ] 

Wangda Tan commented on YARN-1197:
--

Sparking -> Spark



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586999#comment-14586999
 ] 

MENG DING commented on YARN-1197:
-

[~sandyr], Yes. The key assumption is that by the time the Application Master 
requests resource decrease from RM for a particular container, that container 
should have already reduced its resource usage. Therefore, RM can immediately 
allocate resource to others. 

So to summarize the main idea:
* Both container resource increase and decrease requests go through RM. This 
eliminates the race condition where while a container increase is in progress, 
a decrease for the same container takes place.
* There is no need for AM-NM protocol anymore. This greatly simplifies the 
logic for application writers.
* Resource decrease can happen immediately in the RM, and the actual 
enforcement/monitoring of the decrease can happen offline, as mentioned by Vinod.
* Resource increase, on the other hand, needs more thought. 
** In the current design, the RM gives out an increase token to be used by AM 
to initiate the increase on NM. There is no need for this. RM can notify the 
increase to NM through RM-NM heartbeat response.
** The RM still needs to wait for an acknowledgement from the NM to confirm that the 
increase is done before sending out a response to the AM. This will take two 
heartbeat cycles, but this is not much worse than giving out a token to the AM 
first and then letting the AM initiate the increase.
** Since RM needs to wait for acknowledgement from NM to confirm the increase, 
we must handle such cases as timeout, NM restart/recovery, etc. So we probably 
still need to have a container increase token, and token expiration logic for 
this purpose, but the token will be sent to NM through RM-NM heartbeat 
protocol. (I am still working out the details)
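The timeout handling in the last bullet could look roughly like the following hypothetical RM-side tracker. It assumes the RM charges the larger size as soon as it approves an increase and rolls it back if no NM acknowledgement arrives before the token expires. `IncreaseTracker`, `PendingIncrease`, and this expiry policy are illustrative assumptions, not the design that was eventually committed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PendingIncrease:
    old_mb: int
    new_mb: int
    expires_at: float  # RM clock time after which the increase is rolled back


class IncreaseTracker:
    """Hypothetical RM-side bookkeeping for in-flight container increases."""

    def __init__(self, expiry_s: float) -> None:
        self.expiry_s = expiry_s
        self.allocated: Dict[str, int] = {}  # container id -> MB charged by RM
        self.pending: Dict[str, PendingIncrease] = {}

    def approve(self, cid: str, new_mb: int, now: float) -> None:
        """Scheduler approved an increase; the token goes to the NM via heartbeat."""
        old_mb = self.allocated[cid]
        # Assumption: charge the larger size immediately so the space is not
        # handed to anyone else while the NM is applying the increase.
        self.allocated[cid] = new_mb
        self.pending[cid] = PendingIncrease(old_mb, new_mb, now + self.expiry_s)

    def nm_acknowledged(self, cid: str) -> bool:
        """NM reported the increase as done; only now may the RM tell the AM."""
        return self.pending.pop(cid, None) is not None

    def expire(self, now: float) -> List[str]:
        """Roll back increases whose token expired without an NM ack."""
        expired = [cid for cid, p in self.pending.items() if now >= p.expires_at]
        for cid in expired:
            self.allocated[cid] = self.pending.pop(cid).old_mb
        return expired


if __name__ == "__main__":
    rm = IncreaseTracker(expiry_s=10.0)
    rm.allocated.update({"c1": 5120, "c2": 5120})
    rm.approve("c1", 10240, now=0.0)
    rm.approve("c2", 10240, now=0.0)
    rm.nm_acknowledged("c1")            # NM confirmed c1 on its heartbeat
    print(rm.expire(now=11.0))          # ['c2']: no ack, so it is rolled back
    print(rm.allocated)                 # {'c1': 10240, 'c2': 5120}
```

NM restart/recovery would additionally require persisting `pending`; the sketch leaves that out.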





[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14586687#comment-14586687
 ] 

Sandy Ryza commented on YARN-1197:
--

bq. Going through RM directly is better as the RM will immediately know that 
the resource is available for future allocations
Is the idea that the RM would make allocations using the space before receiving 
acknowledgement from the NodeManager that it has resized the container 
(adjusted cgroups)? 



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587098#comment-14587098
 ] 

Wangda Tan commented on YARN-1197:
--

[~sandyr],
I think increasing via AM-NM and RM-NM are in very similar range of delay. 
(multi-seconds for now)

a. AM-NM needs 3 stages
1) AM Get increase token from RM
2) AM send increase token to NM
3) Pooling NM about increase status (because we cannot assume increasing can be 
done in NM side very fast)

b. RM-NM needs 4 stages
1) RM send back increasing token to NM
2) NM doing increase locally
3) NM report back to RM when increasing done
4) RM send increase done to AM

Solution b. has an additional RM-NM heartbeat interval

Benefits of b. (some of them also mentioned by Meng):
- Simpler for the AM: it only needs to learn that the increase is done, and does
not need to receive a token and then submit it to and poll the NM.
- Creates a consistent way for applications to increase/decrease containers.
- Recovery is simpler: the AM only learns about an increase once it has
finished, so only 2 components (NM/RM) need recovery handling instead of 3
(NM/RM/AM).

Until we have a fast-scheduling design/plan (I don't think we can support
millisecond scheduling for now; too-frequent AM heartbeating would overload the
RM), I don't think adding an extra NM-RM heartbeat interval is a big problem.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587127#comment-14587127
 ] 

Sandy Ryza commented on YARN-1197:
--

Option (a) can complete in the low hundreds of milliseconds if the cluster is
tuned properly, independent of cluster size:
1) Submit the increase request to the RM. Poll the RM 100 milliseconds later,
after the continuous scheduling thread has run, to pick up the increase token.
2) Send the increase token to the NM.

Why does the AM need to poll the NM about increase status before taking action? 
 Does the NM need to do anything other than update its tracking of the 
resources allotted to the container?

Also, it's not unlikely that schedulers will be improved to return the increase
token on the same heartbeat on which it's requested. So this could all happen in
2 RPCs + a scheduler decision, with no additional wait time. Anything slower
than this would probably make it prohibitively expensive for a framework like
Spark to submit an increase request before running each task.

Would option (b) ever be able to achieve this kind of latency?


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587146#comment-14587146
 ] 

Wangda Tan commented on YARN-1197:
--

[~sandyr],
Thanks for replying,

bq. Why does the AM need to poll the NM about increase status before taking 
action? Does the NM need to do anything other than update its tracking of the 
resources allotted to the container?
Yes, the NM only needs to update its tracking of the resource and the cgroups.
We cannot assume this happens immediately, so we cannot report the container as
increased in the same RPC. This is the same as startContainer: even though
launching a container is fast in most cases, the AM needs to poll the NM after
invoking startContainer.
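The AM-side polling described here could sit behind a small helper, in the same way callers already wait after startContainer. This is a hypothetical sketch: `IncreasePoller.awaitApplied` is not part of `NMClient`, and the `BooleanSupplier` stands in for whatever status RPC the NM would expose; the interval and timeout are caller-chosen assumptions.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical client-side helper (not part of NMClient): polls a status
 * check until it reports the change as applied or the timeout passes.
 */
final class IncreasePoller {
    private IncreasePoller() {}

    static boolean awaitApplied(BooleanSupplier isApplied, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (isApplied.getAsBoolean()) {
                return true;                          // NM reports the new size is in effect
            }
            if (System.nanoTime() - deadline >= 0) {  // overflow-safe deadline comparison
                return false;                         // give up; caller decides how to recover
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```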

bq. Would option (b) ever be able to achieve this kind of latency?
If you consider all current/future optimizations, such as continuous scheduling
and the scheduler making a decision in the same AM-RM heartbeat, (b) needs one
more NM-RM heartbeat interval. I agree with you: when the cluster is idle, it
could be hundreds of milliseconds for (a) vs. multiple seconds for (b).

But I'm wondering whether we really need to add this complexity to the AM
before we have the mature optimizations listed above. Also, if the cluster is
busier, we cannot expect such low delays anyway. I lean toward (b) for now since
it makes this feature simpler for app developers to use; I'm open to adding an
AM-NM channel once the YARN scheduler supports fast scheduling better.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587063#comment-14587063
 ] 

Vinod Kumar Vavilapalli commented on YARN-1197:
---

The details look good.

Let's make sure we handle RM, AM and NM restarts correctly. Also, let's design 
the RM - NM protocol to be generic and common enough for regular launch/stop 
and increase/decrease.

Thanks again for driving this!


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587072#comment-14587072
 ] 

Sandy Ryza commented on YARN-1197:
--

bq. RM still needs to wait for an acknowledgement from NM to confirm that the 
increase is done before sending out response to AM. This will take two 
heartbeat cycles, but this is not much worse than giving out a token to AM 
first, and then letting AM initiating the increase.

I would argue that waiting for an NM-RM heartbeat is much worse than waiting 
for an AM-RM heartbeat.  With continuous scheduling, the RM can make decisions 
in millisecond time, and the AM can regulate its heartbeats according to the 
application's needs to get fast responses.  If an NM-RM heartbeat is involved, 
the application is at the mercy of the cluster settings, which should be in the 
multi-second range for large clusters.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587067#comment-14587067
 ] 

Sandy Ryza commented on YARN-1197:
--

Is my understanding correct that the broader plan is to move stopping 
containers out of the AM-NM protocol? 


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587168#comment-14587168
 ] 

Sandy Ryza commented on YARN-1197:
--

bq. If you consider all current/future optimizations, such as continuous
scheduling and the scheduler making a decision in the same AM-RM heartbeat, (b)
needs one more NM-RM heartbeat interval. I agree with you: when the cluster is
idle, it could be hundreds of milliseconds for (a) vs. multiple seconds for (b).

To clarify: with proper tuning, we can currently get low hundreds of 
milliseconds without adding any new scheduler features.  With the new scheduler 
feature I'm imagining, we'd only be limited by the RPC + scheduler time, so we 
could get 10s of milliseconds with proper tuning.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587174#comment-14587174
 ] 

Sandy Ryza commented on YARN-1197:
--

Regarding complexity in the AM, the NMClient utility so far has been an API 
that's fairly easy for app developers to interact with.  I've used it more than 
once and had no issues.  Would we not be able to handle most of the additional 
complexity behind it?


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587196#comment-14587196
 ] 

Wangda Tan commented on YARN-1197:
--

bq. To clarify: with proper tuning, we can currently get low hundreds of 
milliseconds without adding any new scheduler features. With the new scheduler 
feature I'm imagining, we'd only be limited by the RPC + scheduler time, so we 
could get 10s of milliseconds with proper tuning.
I think this assumes the cluster is quite idle. I understand the low latency
could be achieved, but it's not guaranteed since we don't support
oversubscription, etc. If you assume the cluster is very idle, one solution
might be to hold more resources at the beginning instead of increasing later. In
a real environment, I think the expected delay should still be on the order of
seconds.

From YARN's perspective, (b) handles most of the logic within the YARN daemons
(instead of the AM), so we don't need to consider inconsistent state between
RM/AM when doing recovery; that is really what I prefer :). I'm not against
doing (a), but I'd prefer to do it once we have a solid foundation for fast
scheduling. I'm not sure whether any resource management platform in production
supports that, but some research papers such as Sparrow use a quite different
protocol/approach from YARN. I expect there are still some TODO items before
YARN can guarantee fast scheduling.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576495#comment-14576495
 ] 

Wangda Tan commented on YARN-1197:
--

[~mding], I just added you to the contributor list; you can go ahead and assign
JIRAs to yourself.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-05 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575519#comment-14575519
 ] 

MENG DING commented on YARN-1197:
-

Just an update, I am currently working on:

YARN-1449, API in NM side to support change container resource
YARN-1643, ContainerMonitor changes in NM
YARN-1510, NMClient

I will append patches and drive discussions in each ticket.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-28 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563052#comment-14563052
 ] 

MENG DING commented on YARN-1197:
-

Just wanted to add that if the dominant resource calculator is being used, it
may compare different dimensions between the target and current resources, but
since we require all dimensions to be >= (for increase) or <= (for decrease),
there should be no conflicting results.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561134#comment-14561134
 ] 

MENG DING commented on YARN-1197:
-

Correcting a typo in my previous post; it should be:
bq. As an example, if a container is currently using 2G, and the AM asks to
increase its resource to 4G, and then asks again to increase it to 6G, but the
AM doesn't actually use either token to increase the resource on the NM. In
this case, with the current design, the RM can only revert the resource
allocation back to 4G after expiration, not 2G.
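The example comes down to which size the RM remembers when a token expires. A hypothetical model (not the YARN implementation, sizes in MB): undoing only the most recent grant lands on 4G, whereas tracking the last size the NM confirmed would land on 2G.

```java
/**
 * Hypothetical model of the size the RM rolls back to when an unused
 * increase token expires. Not YARN code; sizes are in MB.
 */
final class IncreaseRollback {
    private long confirmedMb;  // last size the NM acknowledged
    private long allocatedMb;  // size the RM currently accounts for
    private long previousMb;   // size just before the most recent grant

    IncreaseRollback(long initialMb) {
        confirmedMb = initialMb;
        allocatedMb = initialMb;
        previousMb = initialMb;
    }

    /** RM grants an increase token; the NM has not applied it yet. */
    void grantIncrease(long targetMb) {
        previousMb = allocatedMb;
        allocatedMb = targetMb;
    }

    /** NM reports that the current allocation is in effect. */
    void confirm() {
        confirmedMb = allocatedMb;
        previousMb = allocatedMb;
    }

    /** Behaviour described above: undo only the most recent grant. */
    long expireToPrevious() {
        allocatedMb = previousMb;
        return allocatedMb;
    }

    /** Alternative: fall back to the last size the NM confirmed. */
    long expireToConfirmed() {
        allocatedMb = confirmedMb;
        previousMb = confirmedMb;
        return allocatedMb;
    }
}
```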

Forgot to discuss another important piece. We probably should not use the 
existing ResourceCalculator to compare two resource capabilities in this 
project, because:
- The DefaultResourceCalculator only compares memory, which won't work if we 
want to only change CPU cores.
- The DominantResourceCalculator may end up comparing different dimensions 
between two Resources, which doesn't make sense in our project.

The way to compare two resources in this project should be straightforward, as
follows. Let me know if you think otherwise.
- For increase request, no dimension in the target resource can be smaller than 
the corresponding dimension in the current resource, and at least one dimension 
in the target resource must be larger than the corresponding dimension in the 
current resource.
- For decrease request, no dimension in the target resource can be larger than 
the corresponding dimension in the current resource, and at least one dimension 
in the target resource must be smaller than the corresponding dimension in the 
current resource. 
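The two rules above translate directly into dimension-wise checks. This sketch assumes a hypothetical two-dimensional `ResourceVector` (memory in MB and vcores) rather than YARN's `Resource` class, and deliberately uses no ResourceCalculator.

```java
/** Hypothetical two-dimensional resource: memory in MB and vcores. */
final class ResourceVector {
    final long memoryMb;
    final int vcores;

    ResourceVector(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

/** Dimension-wise checks for the increase/decrease rules above. */
final class ChangeRules {
    private ChangeRules() {}

    /** No dimension shrinks, and at least one dimension grows. */
    static boolean isValidIncrease(ResourceVector current, ResourceVector target) {
        boolean noneSmaller = target.memoryMb >= current.memoryMb && target.vcores >= current.vcores;
        boolean someLarger = target.memoryMb > current.memoryMb || target.vcores > current.vcores;
        return noneSmaller && someLarger;
    }

    /** No dimension grows, and at least one dimension shrinks (the mirror image). */
    static boolean isValidDecrease(ResourceVector current, ResourceVector target) {
        return isValidIncrease(target, current);
    }
}
```

A mixed change such as (4096 MB, 1 vcore) from (2048 MB, 2 vcores) is rejected by both checks, which is what keeps the comparison unambiguous.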



 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1, 
 tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, 
 yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
 yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
 yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
 yarn-server-resourcemanager.patch.ver.1


 The current YARN resource management logic assumes resource allocated to a 
 container is fixed during the lifetime of it. When users want to change a 
 resource 
 of an allocated container the only way is releasing it and allocating a new 
 container with expected size.
 Allowing run-time changing resources of an allocated container will give us 
 better control of resource usage in application side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561098#comment-14561098
 ] 

MENG DING commented on YARN-1197:
-

Thanks [~vinodkv] and [~leftnoteasy] for the great comments!

*To [~vinodkv]:*

bq.  Expanding containers at ACQUIRED state sounds useful in theory. But agree 
with you that we can punt it for later.
Thanks for the confirmation :-)

bq. To your example of concurrent increase/decrease sizing requests from AM, 
shall we simply say that only one change-in-progress is allowed for any given 
container?
Actually, we really wanted to achieve this, but with the current asymmetric
logic (increasing resources through the RM and decreasing them through the NM),
it doesn't seem to be possible :-( The reasons are:
* The increase action starts with the AM requesting the increase from the RM
and being granted a resource increase token; the AM then initiates the increase
on the NM, and finally the NM confirms the increase with the RM.
* Once an increase token has been granted to the AM, and before it expires (10
minutes by default), if the AM does not initiate the increase action on the NM,
*the NM will have no idea that an increase is already in progress*.
* If, at this moment, the AM initiates a resource decrease action on the NM, the
NM will go ahead and honor it. So in effect, there can be concurrent
decrease/increase actions going on, and there doesn't seem to be a way to block
this.
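A tiny hypothetical model of this race (not YARN code; sizes in MB): the RM records the outstanding token, but the NM's view only changes when the AM talks to it, so a direct decrease is accepted while the increase is still pending.

```java
/**
 * Hypothetical model of the race above: the RM tracks an outstanding
 * increase token, but the NM only learns about it when the AM presents it.
 */
final class ChangeRace {
    long rmOutstandingIncreaseMb = -1; // -1: no increase token outstanding at the RM
    long nmEnforcedMb;                 // size the NM is currently enforcing

    ChangeRace(long initialMb) {
        nmEnforcedMb = initialMb;
    }

    /** RM hands an increase token to the AM; nothing is sent to the NM. */
    void rmGrantIncreaseToken(long targetMb) {
        rmOutstandingIncreaseMb = targetMb;
    }

    /** AM asks the NM directly to shrink the container. */
    boolean nmDecrease(long targetMb) {
        // The NM has no record of the outstanding token, so it cannot refuse.
        if (targetMb >= nmEnforcedMb) {
            return false;
        }
        nmEnforcedMb = targetMb;
        return true;
    }
}
```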

bq. If we do the above, this will also simplify most of the code, as we will 
simply have the notion of a Change, instead of an explicit increase/decrease 
everywhere. For e.g., we will just have a ContainerResourceChangeExpirer.
I believe the ContainerResourceChangeExpirer only applies to the container 
resource increase action. The container decrease action goes directly through
the NM, so it does not need expiration logic.

bq. There will be races with container-states toggling from RUNNING to finished 
states, depending on when AM requests a size-change and when NMs report that a 
container finished. We can simply say that the state at the ResourceManager 
wins.
Agreed.

bq. Didn't understand why we need this RM-NM confirmation. The token from RM to 
AM to NM should be enough for NM to update its view, right?
The reasons are the same as those listed above.

bq. Instead of adding new records for ContainerResourceIncrease / decrease in 
AllocationResponse, should we add a new field in the API record itself stating 
if it is a New/Increased/Decreased container? If we move to a single change 
model, it's likely we will not even need this.

I am open to this suggestion. We could add a field in the existing
*ContainerProto* to indicate whether this Container is a new, increased, or
decreased container. The only thing I am not sure about is whether we can still
change the AllocateResponseProto now that ContainerResourceIncrease/Decrease is
already in trunk.

bq. Any obviously invalid change-requests should be rejected right-away. For 
e.g, an increase to more than cluster's max container size. Seemed like you are 
suggesting we ignore the invalid requests.

Agreed that any invalid increase requests from AM to RM, and invalid decrease 
requests from AM to NM should be directly rejected. The 'ignore' case I was 
referring to is in the context of NodeUpdate from NM to RM.

bq. Nit: In the design doc, the high-level flow for container-increase point #7 
incorrectly talks about decrease instead of increase.

Yes, this is a mistake, and I will correct it.

bq. I propose we do this in a branch

Definitely. There is already a YARN-1197 branch, and we can simply work in that 
branch.

*To [~leftnoteasy]:*

bq. Actually the approach in the design doc is this (Meng, please let me know if I
misunderstood). In the scheduler's implementation, it allows only one pending
change request for the same container; a later change request will either
overwrite the prior one or be rejected.
The current design only allows one increase request in the whole system, which 
is guaranteed by the ContainerResourceIncreaseExpirer object. However, as 
explained above, we cannot block decrease action while an increase action is 
still in progress.
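The "one pending change per container" rule can be pictured as a per-container table with either an overwrite or a reject policy for repeated requests. This is a hypothetical sketch: container IDs are plain strings, only increases are tracked (matching the point that decreases bypass the RM), and `complete` stands for the NM confirmation or token expiry.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-container table of pending increase requests at the RM. */
final class PendingIncreases {
    enum Policy { OVERWRITE, REJECT }

    private final Map<String, Long> pendingTargetMb = new HashMap<>();
    private final Policy policy;

    PendingIncreases(Policy policy) {
        this.policy = policy;
    }

    /** Records a request; returns false if it was rejected. */
    boolean submit(String containerId, long targetMb) {
        if (pendingTargetMb.containsKey(containerId) && policy == Policy.REJECT) {
            return false;                        // a change is already pending
        }
        pendingTargetMb.put(containerId, targetMb); // new entry, or overwrite the prior one
        return true;
    }

    /** Pending target in MB, or null if nothing is pending. */
    Long pending(String containerId) {
        return pendingTargetMb.get(containerId);
    }

    /** Called once the NM confirms the change or the token expires. */
    void complete(String containerId) {
        pendingTargetMb.remove(containerId);
    }
}
```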

bq. 1) For the protocols between servers/AMs, mostly same to previous doc, the 
biggest change I can see is the ContainerResourceChangeProto in 
NodeHeartbeatResponseProto, which makes sense to me.

Yes, the ContainerResourceChangeProto is the biggest change. Glad that you 
agree with this new protocol :-)

bq. 2) For the client side change: 2.2.1, +1 to option 3.

Great. I will remove option 1 and option 2 from the design doc.

bq. 3) For the 2.3.3.2 scheduling part, {{The scheduling of an outstanding resource
increase request to a container will be skipped if there are either:}}. Both
conditions may be unnecessary, since the AM can ask for more resources while a
container increase is in progress (e.g. the container is increased to 4G, and the
AM wants it to be 6G before notifying the NM).


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561573#comment-14561573
 ] 

Wangda Tan commented on YARN-1197:
--

[~mding].
For the comparison of resources, I think it should be >= (for increase) or <=
(for decrease) in all dimensions. But if the resource calculator is the default
one, increasing only vcores makes no sense. So I think the ResourceCalculator has
to be used, and we also need to check every individual dimension.

So the logic will be:
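{code}
if (increase):
    delta = target - now
    // every individual dimension must not shrink
    if delta.mem < 0 || delta.vcore < 0:
        throw exception
    // the configured calculator must see a strictly positive change
    if resourceCalculator.lessOrEqualThan(delta, 0):
        throw exception
    // .. move forward
{code}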
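The same two checks in Java. This is a sketch under stated assumptions, not the code that was eventually committed: `ResourceAmount` and `IncreaseValidator` are hypothetical, the ResourceCalculator is reduced to a function mapping a delta to a single number (memory only, to mimic what DefaultResourceCalculator compares), and it returns false where the pseudocode throws.

```java
import java.util.function.ToLongFunction;

/** Hypothetical stand-in for a memory + vcores resource. */
final class ResourceAmount {
    final long memoryMb;
    final int vcores;

    ResourceAmount(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    ResourceAmount minus(ResourceAmount other) {
        return new ResourceAmount(memoryMb - other.memoryMb, vcores - other.vcores);
    }
}

/** The pseudocode's checks, with the calculator reduced to a scalar key. */
final class IncreaseValidator {
    private IncreaseValidator() {}

    /** Memory-only ordering, mimicking what DefaultResourceCalculator compares. */
    static final ToLongFunction<ResourceAmount> MEMORY_ONLY = r -> r.memoryMb;

    static boolean isAcceptableIncrease(ResourceAmount now, ResourceAmount target,
                                        ToLongFunction<ResourceAmount> calculatorKey) {
        ResourceAmount delta = target.minus(now);
        // First check: no individual dimension may shrink.
        if (delta.memoryMb < 0 || delta.vcores < 0) {
            return false;
        }
        // Second check: the calculator must see a strictly positive delta.
        return calculatorKey.applyAsLong(delta) > 0;
    }
}
```

With the memory-only key, a vcores-only increase passes the per-dimension check but fails the calculator check, which is the case called out above; a key that also counts vcores accepts it.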


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561710#comment-14561710
 ] 

MENG DING commented on YARN-1197:
-

[~leftnoteasy]
Makes sense to me. Will update the doc to include this.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-27 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561722#comment-14561722
 ] 

MENG DING commented on YARN-1197:
-

[~leftnoteasy]
Makes sense to me. Will update the doc to include this.


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559869#comment-14559869
 ] 

Wangda Tan commented on YARN-1197:
--

Thanks to [~mding] for thinking this through and extending it into a thorough 
design doc, and to [~vinodkv] for the review. I would really like to see this 
move forward.

To Vinod's comment:
bq. Didn't understand why we need this RM-NM confirmation. The token from RM to 
AM to NM should be enough for NM to update its view, right?
This is to make sure RM and NM stay synchronized; one example is mentioned in 
https://issues.apache.org/jira/browse/YARN-1197?focusedCommentId=14559284page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14559284.
In this design, NM and RM communicate in both directions, so RM needs to 
acknowledge changes to NM so that NM can update the container monitoring status 
locally and avoid inconsistency.

bq. To your example of concurrent increase/decrease sizing requests from AM, 
shall we simply say that only one change-in-progress is allowed for any given 
container?
Actually, the approach in the design doc is exactly this (Meng, please let me 
know if I misunderstood). The scheduler implementation allows only one pending 
change request per container; a later change request will either overwrite the 
prior one or be rejected.
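A minimal sketch of the "one pending change request per container" rule, assuming a simple in-memory map keyed by container ID. The class, the request record, and the overwrite-vs-reject flag are all hypothetical illustrations, not the CapacityScheduler's actual data structures:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: at most one pending resource-change request per container.
final class PendingChangeTable {

    // Hypothetical request record: the size a container should be changed to.
    static final class ChangeRequest {
        final String containerId;
        final long targetMemoryMb;
        final int targetVcores;

        ChangeRequest(String containerId, long targetMemoryMb, int targetVcores) {
            this.containerId = containerId;
            this.targetMemoryMb = targetMemoryMb;
            this.targetVcores = targetVcores;
        }
    }

    private final Map<String, ChangeRequest> pending = new HashMap<>();
    // Assumed policy switch: true = a later request replaces the earlier one,
    // false = a later request is rejected while one is still pending.
    private final boolean overwriteExisting;

    PendingChangeTable(boolean overwriteExisting) {
        this.overwriteExisting = overwriteExisting;
    }

    // Returns true if the request is now the single pending request for its container.
    boolean submit(ChangeRequest request) {
        if (pending.containsKey(request.containerId) && !overwriteExisting) {
            return false; // reject: a change is already pending for this container
        }
        pending.put(request.containerId, request); // first request, or overwrite
        return true;
    }

    ChangeRequest pendingFor(String containerId) {
        return pending.get(containerId);
    }

    // Called once the scheduler has satisfied (or dropped) the pending request.
    void complete(String containerId) {
        pending.remove(containerId);
    }
}
```

Either policy keeps the invariant that the scheduler only ever sees one target size per container. The two policies differ only in whether the AM's latest request or its first request wins.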

Some feedback on the design doc so far:
1) The protocols between servers and AMs are mostly the same as in the previous 
doc; the biggest change I can see is the {{ContainerResourceChangeProto}} in 
{{NodeHeartbeatResponseProto}}, which makes sense to me.
2) For the client side change (2.2.1), +1 to option 3.
3) For the 2.3.3.2 scheduling part, {{The scheduling of an outstanding resource 
increase request to a container will be skipped if there are
either:}}. Neither of the two conditions may be needed, since AM can ask for 
more resource while a container increase is in progress (e.g. the container was 
increased to 4G, and AM wants it to be 6G before notifying NM).
4) We may not need reserved increase requests; all increase requests should be 
considered reserved. But we still need to respect the order of applications in 
LeafQueue, whether it is the original FIFO or Fair ordering (added after 
YARN-3306). We can discuss more scheduling details in a separate JIRA.

I will clean up the subtasks (some of them are too detailed in my view, 
especially those for scheduler-internal changes). Will post once I've finished.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559928#comment-14559928
 ] 

Wangda Tan commented on YARN-1197:
--

Hi [~mding],
I just finished cleaning up the sub-JIRAs. I think some of them were too 
detailed. For example, it would be very hard to separate the work in 
FiCaSchedulerNode/FiCaSchedulerApp from the changes in LeafQueue/ParentQueue. 
Following are the JIRAs after cleanup:

*API:*
YARN-1449, API in NM side to support change container resource.
YARN-1502, API changes in RM side to support change container resource.

*Client:*
YARN-1509, AMRMClient
YARN-1510, NMClient

*NM implementation*
YARN-1643, ContainerMonitor changes in NM

*RM implementation*
YARN-1646, Support change container resource in RM.
YARN-1651, CapacityScheduler side changes.

I unassigned myself from many of these JIRAs, but I still plan to implement the 
changes on the RM/CS side. For the other JIRAs, please feel free to take them.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559948#comment-14559948
 ] 

Wangda Tan commented on YARN-1197:
--

A short summary of past work, to make sure it is not wasted:
- https://issues.apache.org/jira/secure/attachment/12618222/yarn-1449.5.patch 
contains changes for YARN-1449 and YARN-1643. They can very likely be reused.
- https://issues.apache.org/jira/secure/attachment/12619072/yarn-1502.2.patch 
contains changes for YARN-1646 and YARN-1651. They very likely can NOT be 
reused. :(



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559457#comment-14559457
 ] 

Vinod Kumar Vavilapalli commented on YARN-1197:
---

Tx for taking this up [~mding]!

Read your updated doc. Looks good overall. Pretty comprehensive, great work!

Some comments

h4. Major
 - Expanding containers at ACQUIRED state sounds useful in theory. But agree 
with you that we can punt it for later.
 - To your example of concurrent increase/decrease sizing requests from AM, 
shall we simply say that only one change-in-progress is allowed for any given 
container?
 - If we do the above, this will also simplify most of the code, as we will 
simply have the notion of a _Change_, instead of an explicit increase/decrease 
everywhere. For example, we will just have a ContainerResourceChangeExpirer.
 - There will be races with container-states toggling from RUNNING to finished 
states, depending on when AM requests a size-change and when NMs report that a 
container finished. We can simply say that the state at the ResourceManager 
wins. 
 - bq. After processing all resource change messages for a container in a node 
update, RM will set the current resource allocation known by RM for this 
container in the next node heartbeat response, so that NM will  (eventually) 
have the same view of the resource allocation of this container with RM, and 
monitor/enforce accordingly.
Didn't understand why we need this RM-NM confirmation. The token from RM to AM 
to NM should be enough for NM to update its view, right?

h4. Minor
 - Instead of adding new records for ContainerResourceIncrease / decrease in 
AllocationResponse, should we add a new field in the API record itself stating 
if it is a New/Increased/Decreased container? If we move to a single change 
model, it's likely we will not even need this.
 - Any obviously invalid change-requests should be rejected right away. For 
example, an increase to more than the cluster's maximum container size. It 
seemed like you were suggesting we ignore the invalid requests.
 - Nit: In the design doc, the high-level flow for container-increase point #7 
incorrectly talks about decrease instead of increase.
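A minimal sketch of rejecting obviously invalid change requests up front instead of silently ignoring them. It assumes the limits are passed in. In a real RM they would come from the scheduler's configured maximum allocation, for example `yarn.scheduler.maximum-allocation-mb`. The class and method names are hypothetical:

```java
import java.util.Optional;

// Hypothetical sketch: validate a change request before it reaches the scheduler,
// so the AM gets an explicit rejection instead of a silently ignored request.
final class ChangeRequestValidator {
    private final long maxMemoryMb;
    private final int maxVcores;

    ChangeRequestValidator(long maxMemoryMb, int maxVcores) {
        this.maxMemoryMb = maxMemoryMb;
        this.maxVcores = maxVcores;
    }

    // Returns an error message if the request should be rejected, or empty if acceptable.
    Optional<String> validate(long currentMemoryMb, int currentVcores,
                              long targetMemoryMb, int targetVcores) {
        if (targetMemoryMb <= 0 || targetVcores <= 0) {
            return Optional.of("target resource must be positive");
        }
        if (targetMemoryMb > maxMemoryMb || targetVcores > maxVcores) {
            return Optional.of("target resource exceeds the cluster's maximum container size");
        }
        if (targetMemoryMb == currentMemoryMb && targetVcores == currentVcores) {
            return Optional.of("target resource equals the current allocation");
        }
        return Optional.empty();
    }
}
```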

Just caught up with the rest of your discussion w.r.t. decreasing 
container-sizes. The feature is useful outside of JVM processes - C code, 
servers managing their data off-heap etc., so we can continue working on it.

h4. Process
I propose we do this in a branch. We got in a couple of patches earlier from 
[~leftnoteasy] and then the feature unfortunately dropped on the floor. Branch 
helps avoid this going forward.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-22 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556868#comment-14556868
 ] 

MENG DING commented on YARN-1197:
-

So to summarize the current dilemma:

Situation:
- A container resource increase request has been granted, and a token has been 
issued to AM, and
- The increase action has not been fulfilled, and the token is not expired yet

Problem:
- AM can initiate a container resource decrease action to NM, and NM will 
fulfill it and notify RM, and then
- Before the token expires, AM can still initiate a container resource increase 
action to NM with the token, and NM will fulfill it and notify RM

Proposed solution:
- When RM receives a container decrease message from NM, it will first check if 
there is an outstanding container increase action (by checking the 
ContainerResourceIncreaseExpirer)
- If the answer is no, RM will go ahead and update its internal resource 
bookkeeping and reduce the container resource allocation for this container.
- If the answer is yes, RM will skip the resource reduction in this cycle, keep 
the resource decrease message in its newlyDecreasedContainers data structure, 
and check again in the next NM-RM heartbeat cycle.
- If in the next heartbeat, a resource increase message to the same container 
comes, the previous resource decrease message will be dropped.
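The proposed RM-side handling can be sketched as follows. This is a minimal model assuming plain maps and sets in place of ContainerResourceIncreaseExpirer and newlyDecreasedContainers. All names are hypothetical, and token expiry, which would also clear the outstanding flag, is left out for brevity:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: decrease messages that arrive while a granted increase
// is still outstanding are deferred to a later NM heartbeat.
final class DecreaseDeferralSketch {
    // Current allocation per container, in MB, as known by the RM.
    final Map<String, Long> allocatedMb = new HashMap<>();
    // Stands in for the containers registered with ContainerResourceIncreaseExpirer.
    final Set<String> outstandingIncreases = new HashSet<>();
    // Stands in for newlyDecreasedContainers: decreases waiting to be applied.
    final Map<String, Long> deferredDecreases = new HashMap<>();

    // RM grants an increase: bookkeeping is updated and the increase becomes outstanding.
    void grantIncrease(String containerId, long targetMb) {
        allocatedMb.put(containerId, targetMb);
        outstandingIncreases.add(containerId);
    }

    // Processes the increase/decrease messages carried by one NM heartbeat.
    void onNodeHeartbeat(Map<String, Long> decreased, Map<String, Long> increased) {
        for (Map.Entry<String, Long> e : increased.entrySet()) {
            // An increase message drops any previously deferred decrease.
            deferredDecreases.remove(e.getKey());
            outstandingIncreases.remove(e.getKey());
            allocatedMb.put(e.getKey(), e.getValue());
        }
        // Re-check decreases deferred in earlier heartbeats together with new ones.
        Map<String, Long> toProcess = new HashMap<>(deferredDecreases);
        toProcess.putAll(decreased);
        deferredDecreases.clear();
        for (Map.Entry<String, Long> e : toProcess.entrySet()) {
            if (outstandingIncreases.contains(e.getKey())) {
                deferredDecreases.put(e.getKey(), e.getValue()); // skip this cycle
            } else {
                allocatedMb.put(e.getKey(), e.getValue()); // safe to apply now
            }
        }
    }
}
```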

Not sure if there is a better solution to this problem. Let me know if this 
makes sense or not.

Thanks,
Meng




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-22 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556803#comment-14556803
 ] 

MENG DING commented on YARN-1197:
-

Well, I think I spoke too soon :-)

The example I gave above is not entirely correct:

1. A container is currently using 6G
2. AM asks RM to increase it to 8G
3. RM grants the increase request, increases the container's allocation to 
8G, and issues a token to AM. It starts a timer and remembers the original 
resource allocation before the increase as 6G.
4. AM, instead of initiating the resource increase to NM, requests a resource 
decrease to NM to decrease it to 4G
5. The decrease is successful and RM gets the notification, and updates the 
container resource to 4G
6. Before the token expires, the AM requests the resource increase to NM
7. The increase is successful and RM gets the notification, and updates the 
container resource back to 8G

Steps 6 and 7 should not be allowed, because the RM has already reduced the 
container resource to 4G, which effectively invalidated the previously granted 
increase request (8G), even though the token has not yet expired.
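The scenario can be replayed with a small model that makes steps 6 and 7 fail. This sketch assumes a hypothetical per-container version number that the RM bumps on every applied change and embeds in each increase token. That mechanism is not part of the design doc; it is only one possible way to detect a stale token:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a per-container version makes a stale increase token
// unusable once a later decrease has been applied.
final class VersionedAllocationSketch {

    // Hypothetical increase token: target size plus the version it was issued against.
    static final class IncreaseToken {
        final String containerId;
        final long targetMb;
        final int issuedAtVersion;

        IncreaseToken(String containerId, long targetMb, int issuedAtVersion) {
            this.containerId = containerId;
            this.targetMb = targetMb;
            this.issuedAtVersion = issuedAtVersion;
        }
    }

    private final Map<String, Long> allocatedMb = new HashMap<>();
    private final Map<String, Integer> version = new HashMap<>();

    void launch(String containerId, long mb) {
        allocatedMb.put(containerId, mb);
        version.put(containerId, 0);
    }

    // Step 3: RM allocates the increase and issues a token bound to the current version.
    IncreaseToken grantIncrease(String containerId, long targetMb) {
        allocatedMb.put(containerId, targetMb);
        return new IncreaseToken(containerId, targetMb, version.get(containerId));
    }

    // Step 5: a decrease applied via NM bumps the version, invalidating older tokens.
    void applyDecrease(String containerId, long targetMb) {
        allocatedMb.put(containerId, targetMb);
        version.merge(containerId, 1, Integer::sum);
    }

    // Steps 6-7: honored only if no change was applied since the token was issued.
    boolean applyIncrease(IncreaseToken token) {
        int current = version.get(token.containerId);
        if (token.issuedAtVersion != current) {
            return false; // stale token: reject
        }
        allocatedMb.put(token.containerId, token.targetMb);
        version.merge(token.containerId, 1, Integer::sum);
        return true;
    }

    long allocated(String containerId) {
        return allocatedMb.get(containerId);
    }
}
```

Replaying steps 1 to 7 with this model leaves the container at 4G, and the old 8G token is refused.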






[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-22 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556383#comment-14556383
 ] 

MENG DING commented on YARN-1197:
-

[~leftnoteasy] I totally agree that YARN should not mess with Java Xmx. Sorry 
for not being clear before.

While digging into the design details of this issue, I found a piece that is 
(I think) missing from the original design doc, on which I hope to get some 
insights/clarifications from the community:

It seems there is no discussion about the container resource increase token 
expiration logic.

Here is what I think should happen:

1. AM sends a container resource increase request to RM.
2. RM grants the request, allocating the additional resource to the container, 
updating its internal resource bookkeeping.
3. During the next AM-RM heartbeat, RM pulls the newly increased container, 
creates a token for it and sets the token in the allocation response. In the 
meantime, RM starts a timer for this granted increase (e.g., register with 
ContainerResourceIncreaseExpirer).
4. AM acquires the container resource increase token from the heartbeat 
response, then calls the NMClient API to launch the container resource increase 
action on NM.
5. NM receives the request, increases the monitoring quota of the container, 
and notifies the NodeStatusUpdater.
6. The NodeStatusUpdater informs the increase success to RM during regular 
NM-RM heartbeat.
7. Upon receiving the increase success message, the RM stops the timer (e.g., 
unregisters it with ContainerResourceIncreaseExpirer).

If, however, the timer in RM expires and no increase success message has been 
received for this container, *RM must release the increased resource of the 
container, and update its internal resource bookkeeping*.

As such, the NM-RM heartbeat must also include container resource increase 
messages (which don't exist in the original design), otherwise the expiration 
logic will not work.

In addition, RM must remember the original resource allocation to the container 
(this info may be stored in the ContainerResourceIncreaseExpirer), because in 
the case of expiration, RM needs to release the increased resource and revert 
back to the original resource allocation. This is different from a newly 
allocated container, in which case, RM simply needs to release the resource for 
the entire container when it expires.
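The expiration logic described above can be sketched as a small model. It assumes an explicit clock value passed in by the caller so that it is easy to test. The class is a hypothetical stand-in for ContainerResourceIncreaseExpirer, not its real API. The key point is that it stores the pre-increase allocation so that only the increment is released on expiry:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of an increase expirer that remembers the original allocation.
final class IncreaseExpirerSketch {
    private static final class Pending {
        final long originalMb;  // allocation before the increase was granted
        final long deadlineMs;  // when the granted increase expires

        Pending(long originalMb, long deadlineMs) {
            this.originalMb = originalMb;
            this.deadlineMs = deadlineMs;
        }
    }

    private final long timeoutMs;
    private final Map<String, Long> allocatedMb = new HashMap<>();
    private final Map<String, Pending> pending = new HashMap<>();

    IncreaseExpirerSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void setAllocation(String containerId, long mb) {
        allocatedMb.put(containerId, mb);
    }

    long allocation(String containerId) {
        return allocatedMb.get(containerId);
    }

    // Steps 2-3: grant the increase, update bookkeeping, start the timer.
    void grantIncrease(String containerId, long targetMb, long nowMs) {
        pending.put(containerId, new Pending(allocatedMb.get(containerId), nowMs + timeoutMs));
        allocatedMb.put(containerId, targetMb);
    }

    // Step 7: NM reported the increase in its heartbeat, so stop the timer.
    void onIncreaseConfirmed(String containerId) {
        pending.remove(containerId);
    }

    // Periodic check: revert unconfirmed increases back to the original allocation.
    void checkExpired(long nowMs) {
        Iterator<Map.Entry<String, Pending>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Pending> e = it.next();
            if (nowMs >= e.getValue().deadlineMs) {
                allocatedMb.put(e.getKey(), e.getValue().originalMb); // release the increment only
                it.remove();
            }
        }
    }
}
```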

To make matters more complicated, after a container resource increase token has 
been given out, and before it expires, there is no guarantee that AM won't 
issue a resource *decrease* action on the same container. Because the resource 
decrease action starts from NM, NM has no idea that a resource increase token 
on the same container has been issued, and that a resource increase action 
could happen anytime.

Given the above, here is what I propose to simplify things as much as we can 
without compromising the main functionality:

*At the RM side* 
1. During each scheduling, if RM finds that there is still a granted container 
resource increase sitting in RM (i.e., not yet acquired by AM), it will skip 
scheduling any outstanding resource increase request to the same container.
2. During each scheduling, if RM finds that there is a granted container 
resource increase registered with ContainerResourceIncreaseExpirer, it will 
skip scheduling any outstanding resource increase request to the same container.

This will guarantee that at any time, there can be one and only one resource 
increase request for a container.
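The two RM-side rules can be sketched as a guard that the scheduler consults before satisfying an increase request. The two sets are hypothetical stand-ins for "granted but not yet pulled by the AM" and "registered with ContainerResourceIncreaseExpirer":

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the two RM-side checks that keep at most one
// granted increase per container.
final class IncreaseSchedulingGuard {
    // Rule 1: increases granted by RM but not yet acquired by AM.
    private final Set<String> grantedNotAcquired = new HashSet<>();
    // Rule 2: increases acquired by AM and registered with the expirer.
    private final Set<String> registeredWithExpirer = new HashSet<>();

    boolean canSchedule(String containerId) {
        return !grantedNotAcquired.contains(containerId)
                && !registeredWithExpirer.contains(containerId);
    }

    // Scheduler tries to satisfy an outstanding increase request; skipped if blocked.
    boolean tryGrant(String containerId) {
        if (!canSchedule(containerId)) {
            return false;
        }
        grantedNotAcquired.add(containerId);
        return true;
    }

    // AM-RM heartbeat: RM hands out the token and starts the expiry timer.
    void onAcquiredByAm(String containerId) {
        if (grantedNotAcquired.remove(containerId)) {
            registeredWithExpirer.add(containerId);
        }
    }

    // NM confirmed the increase, or the timer expired: the container is free again.
    void onConfirmedOrExpired(String containerId) {
        registeredWithExpirer.remove(containerId);
    }
}
```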

*At the NM side*
1. Create a map to track any resource increase or decrease action for a 
container in NMContext. At any time, there can only be either an increase 
action or a decrease action going on for a specific container. While an 
increase/decrease action is in progress in NM, any new request from AM to 
increase/decrease resource to the same container will be rejected (with proper 
error messages).
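The NM-side map can be sketched as below. It assumes a concurrent map because AM requests may arrive on several RPC handler threads. The tracker class and its methods are hypothetical and are not part of NMContext's actual API:

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the NM-side map: at most one increase or decrease
// action in flight per container; a concurrent request gets an error.
final class NmChangeTracker {
    enum Action { INCREASE, DECREASE }

    // containerId -> action currently in progress.
    private final ConcurrentHashMap<String, Action> inProgress = new ConcurrentHashMap<>();

    // Returns empty if the action may start, or an error message if another is in progress.
    Optional<String> begin(String containerId, Action action) {
        Action existing = inProgress.putIfAbsent(containerId, action);
        if (existing != null) {
            return Optional.of("Container " + containerId + " already has a "
                    + existing + " in progress; rejecting " + action);
        }
        return Optional.empty();
    }

    // Called when the action has been applied (or has failed).
    void finish(String containerId) {
        inProgress.remove(containerId);
    }
}
```

`putIfAbsent` makes the check-and-claim a single atomic step, so two racing requests for the same container cannot both start.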

With the above logic, here is an example of what could happen:

1. A container is currently using 6G
2. AM asks RM to increase it to 8G
3. RM grants the increase request, increases the container's allocation to 
8G, and issues a token to AM. It starts a timer and remembers the original 
resource allocation before the increase as 6G.
4. AM, instead of initiating the resource increase to NM, requests a resource 
decrease to NM to decrease it to 4G
5. The decrease is successful and RM gets the notification, and updates the 
container resource to 4G

After this, two possible sequences may occur:

6. Before the token expires, the AM requests the resource increase to NM
7. The increase is successful and RM gets the notification, and updates the 
container resource back to 8G

Or,

6. AM never sends the resource increase to NM
7. The token expires in RM. RM attempts to revert the resource increase (i.e., 
set the resource allocation back to 6G), but seeing that it is currently using 

[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542771#comment-14542771
 ] 

Wangda Tan commented on YARN-1197:
--

I agree with what [~kasha] mentioned: increasing Xmx doesn't seem like a very 
good idea. I think we should treat the JVM as a black box and not try to hack it 
from YARN's side. It's fine if a user's application does the Xmx tuning itself 
to make it shrinkable.

The reason LCE only supports CPU enforcement is that CPU enforcement is a soft 
limit, whereas memory is a hard limit that can lead to process failure when a 
memory spike happens; see the YARN-3 discussion for more details. YARN-2793 is 
different: it tries to define the behavior of killing an over-used container, 
not how to enforce the limit.

Dynamically updating cgroups is also not supported yet, but I think we should 
support it as part of this ticket.
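Dynamically updating a running container's cgroup could look roughly like the following. This sketch assumes cgroup v1, where `memory.limit_in_bytes` is the hard memory limit and `cpu.shares` is the soft, contention-only CPU weight. The per-container directory layout and the 1024-shares-per-vcore weight are assumptions, and the class is not part of the LinuxContainerExecutor:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of updating a running container's cgroup (v1) limits in place.
final class CgroupUpdaterSketch {
    static final int SHARES_PER_VCORE = 1024; // assumed weight per vcore

    // Hard limit: exceeding it can get the container's processes OOM-killed.
    static void updateMemoryLimit(Path memoryCgroupDir, long limitBytes) {
        writeValue(memoryCgroupDir.resolve("memory.limit_in_bytes"), Long.toString(limitBytes));
    }

    // Soft limit: a relative weight that only matters when CPUs are contended.
    static void updateCpuShares(Path cpuCgroupDir, int vcores) {
        writeValue(cpuCgroupDir.resolve("cpu.shares"), Integer.toString(vcores * SHARES_PER_VCORE));
    }

    // Reads a control file back, e.g. to confirm the update took effect.
    static String readValue(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeValue(Path file, String value) {
        try {
            Files.write(file, value.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Raising either value is straightforward. Lowering the memory limit below what the container is currently using may be refused by the kernel, which is one reason a decrease is harder to enforce than an increase.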



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540794#comment-14540794
 ] 

Wangda Tan commented on YARN-1197:
--

[~mding],
Thanks for your interest in this ticket. Some comments:
bq. For JVM based containers (e.g., container running HBase), it is not 
possible right now to change the heap size of JVM without restarting the Java 
process. Even if we can implement a wrapper in the container to relaunch a Java 
process when resource is changed for a container, we still need to implement an 
interface between node manager and container to trigger the relaunch action.
Good point, this is one thing we noted as well. I don't think there's any easy 
solution to shrink a JVM. Relaunching the container could be one method, but it 
would be hard to make a generic container wrapper, since killing and relaunching 
loses the data in memory.

But since shrinking memory is a proactive action, a process that wants to 
shrink its resources can use its own container wrapper to relaunch itself, 
provided it has some data recovery mechanism.
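Such an application-owned wrapper might look roughly like this. It is a sketch assuming the application decides its own heap policy (here an arbitrary 75% of the container size). It is not a YARN API, and the wrapper still needs some trigger from the NM or AM to know the new size:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical application-side wrapper that restarts its JVM with a heap sized
// for the container's new allocation.
final class JvmRelaunchWrapperSketch {
    // Assumed policy: heap gets 75% of the container, the rest is left for non-heap memory.
    static long heapMbFor(long containerMb) {
        return containerMb * 3 / 4;
    }

    static List<String> buildCommand(String javaBin, long containerMb,
                                     String mainClass, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + heapMbFor(containerMb) + "m");
        cmd.add(mainClass);
        cmd.addAll(appArgs);
        return cmd;
    }

    // Stops the old process and starts a new one. Anything held only in the old
    // process's memory is lost unless the application persisted it first.
    static Process relaunch(Process old, String javaBin, long newContainerMb,
                            String mainClass, List<String> appArgs)
            throws IOException, InterruptedException {
        old.destroy();
        old.waitFor();
        return new ProcessBuilder(buildCommand(javaBin, newContainerMb, mainClass, appArgs))
                .inheritIO()
                .start();
    }
}
```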




[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541154#comment-14541154
 ] 

Karthik Kambatla commented on YARN-1197:


bq. We thought about launching the JVM based container with -Xmx set to the 
physical memory of the node, and use cgroup memory control to enforce the 
resource limit, but we don't think LCE supports memory isolation right now . We 
cannot use YARN's default memory enforcement as we don't want long running 
services to be killed.

A JVM with a larger value for Xmx will *likely* be less aggressive with GC. Any 
resultant increase in heap size might or might not be a good thing. If you 
think this is something viable that people care about, we could consider adding 
a memory-enforcement option to LCE. 



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-12 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541233#comment-14541233
 ] 

MENG DING commented on YARN-1197:
-

Thanks guys for the comments.

Yes, I believe a memory-enforcement option in LCE is definitely a desirable 
feature and the proper way to handle memory enforcement for long running 
services. It looks like YARN-2793 is related, and YARN-3 already had a patch for 
this?

Then we also need the capability to dynamically update the cgroup that a 
process runs under, which I believe is not supported today either, right?



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-05-12 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539983#comment-14539983
 ] 

MENG DING commented on YARN-1197:
-

We have a real use case to better support long running services on YARN, and to 
share resources between long running services and batch jobs. We have carefully 
reviewed the discussions and documentation in this thread (and other topics 
related to this thread), and are committed to bringing this work to completion. 
We agree with the general design of this feature, and understand that it is the 
result of extensive discussions among many experts.

We will attempt to post an updated design shortly for review.

We don't really see a bottleneck on the scheduler side at the moment. However, 
we do see problems with memory enforcement for long running services.
- For JVM-based containers (e.g., a container running HBase), it is currently 
not possible to change the JVM heap size without restarting the Java process. 
Even if we implement a wrapper in the container that relaunches the Java process 
when the container's resources change, we still need an interface between the 
NodeManager and the container to trigger the relaunch.
- We considered launching JVM-based containers with -Xmx set to the physical 
memory of the node and using cgroups memory control to enforce the resource 
limit, but we don't think the LinuxContainerExecutor (LCE) supports memory 
isolation right now (?). We cannot use YARN's default memory enforcement, 
because we don't want long running services to be killed.
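The cgroup-based approach in the second bullet can be sketched roughly as follows. This is a minimal illustration, not how the NodeManager or LCE actually manages cgroups: it assumes a cgroup v1 memory controller with one directory per container under a hierarchy root such as `/sys/fs/cgroup/memory/hadoop-yarn`, and the helper names (`jvm_launch_args`, `set_container_memory_limit`) are hypothetical. The JVM gets a node-sized `-Xmx`, and the effective limit is changed at run time by rewriting `memory.limit_in_bytes`.

```python
import os
import tempfile

MB = 1024 * 1024


def jvm_launch_args(node_physical_mb: int, main_class: str) -> list:
    """Build a JVM command line whose heap cap is the whole node (hypothetical helper).

    With -Xmx this large, the cgroup limit (not the heap size) is what bounds memory.
    """
    return ["java", f"-Xmx{node_physical_mb}m", main_class]


def set_container_memory_limit(cgroup_root: str, container_id: str, limit_mb: int) -> str:
    """Write a new memory limit into one container's cgroup (assumed cgroup v1 layout).

    Returns the path of the control file that was written.
    """
    container_dir = os.path.join(cgroup_root, container_id)
    # On a real cgroupfs, mkdir creates the cgroup together with its control files.
    os.makedirs(container_dir, exist_ok=True)
    path = os.path.join(container_dir, "memory.limit_in_bytes")
    with open(path, "w") as f:
        f.write(str(limit_mb * MB))
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the real cgroup mount point.
    with tempfile.TemporaryDirectory() as root:
        print(jvm_launch_args(65536, "org.example.Main"))
        path = set_container_memory_limit(root, "container_01", 2048)
        with open(path) as f:
            print(f.read())  # 2147483648
```

Lowering the limit below the container's current usage makes the kernel try to reclaim memory (which can push pages to swap), and the write can fail if it cannot reclaim enough, so the process inside the container still has to cooperate.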

So overall there doesn't seem to be an easy solution for memory enforcement 
without killing the long running services right now. Any comments or 
suggestions will be greatly appreciated.

Thanks,
Meng



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2014-12-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240836#comment-14240836
 ] 

Tsuyoshi OZAWA commented on YARN-1197:
--

[~leftnoteasy], could you review your tasks and unassign the ones you haven't 
started working on? This feature is useful for some YARN applications (e.g. 
Spark and Tez).



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2014-12-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241352#comment-14241352
 ] 

Wangda Tan commented on YARN-1197:
--

I'd be glad if anybody could continue this work.

Since completing it potentially needs a huge effort, I think we need the 
following to get it going again:
- High-level discussion: since the design doc, patches and tasks were created 
about a year ago, some of them need rethinking/amendment and some are totally 
stale. For example, are there any *actual* use cases for this JIRA? Do any 
downstream projects plan to consume these APIs? Is there an alternative ticket 
with a similar proposal (like YARN-1488)?
- Implementation: after the high-level discussion, we can plan the 
implementation. I created a bunch of sub-tasks for this ticket; if you're 
interested in any of them, just let me know and I can assign it to you. If you 
don't agree with the task coverage/granularity, please feel free to create a 
new ticket and I can close the original one.

I also think the umbrella JIRA should be left unassigned. [~zjffdu], if you're 
not working on it, could you unassign it?

Thoughts?

Thanks,
Wangda



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2014-12-09 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240281#comment-14240281
 ] 

Chen He commented on YARN-1197:
---

I can also contribute some time to this JIRA.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2014-04-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961292#comment-13961292
 ] 

Karthik Kambatla commented on YARN-1197:


I am interested in working on this. Depending on the progress, I'll be glad to 
write patches or review them. 



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2014-03-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934673#comment-13934673
 ] 

Wangda Tan commented on YARN-1197:
--

I'm leaving my current company next week and will no longer be involved in 
YARN-1197; one of my colleagues will take over this JIRA and its sub-tasks.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2014-01-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882232#comment-13882232
 ] 

Wangda Tan commented on YARN-1197:
--

I created a bunch of sub-tasks for easier review.
NM-side changes are YARN-1643, YARN-1644 and YARN-1645, replacing the big task 
YARN-1449; I'll work on them first.
RM-side changes are YARN-1646, YARN-1647, YARN-1648, YARN-1649, YARN-1650, 
YARN-1651, YARN-1652, YARN-1653 and YARN-1654, replacing the big task YARN-1502. 
These sub-tasks will add container-resource-change support to the capacity 
scheduler and change the corresponding RM-side implementations; I'll break down 
the existing patch in YARN-1502 and submit the pieces after the NM changes are 
completed.
YARN-1655 will try to add support to the fair scheduler; I plan to work on it 
after the capacity scheduler changes are completed.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2014-01-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882545#comment-13882545
 ] 

Wangda Tan commented on YARN-1197:
--

I attached patches for the NM-side changes (YARN-1643, YARN-1644, YARN-1645). 
Can someone review them? Thanks!
[~bikassaha] [~sandyr] [~vinodkv]



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2014-01-17 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875079#comment-13875079
 ] 

Sandy Ryza commented on YARN-1197:
--

Created a YARN-1197 branch.  Will revert the commits in trunk and branch-2 soon.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2014-01-17 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875509#comment-13875509
 ] 

Wangda Tan commented on YARN-1197:
--

Thanks Sandy, I'm working on breaking down the RM patch and will create 
sub-JIRAs for easier review.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2014-01-06 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863469#comment-13863469
 ] 

Sandy Ryza commented on YARN-1197:
--

[~acmurthy], any progress on the branch?  If not, I'd be happy to take care of 
it.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-12-17 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850830#comment-13850830
 ] 

Arun C Murthy commented on YARN-1197:
-

Thanks [~wangda], glad we agree. I'll prepare a branch and move the commits 
there. Thanks again for your contributions and for being so flexible.



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-12-17 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851126#comment-13851126
 ] 

Wangda Tan commented on YARN-1197:
--

[~acmurthy], great, thanks :)



[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-12-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846389#comment-13846389
 ] 

Wangda Tan commented on YARN-1197:
--

Copying text from the scheduler design doc here for easier discussion; please 
feel free to let me know your comments!

*Basic Requirements*
We need to support handling resource increase requests from the AM and resource 
decrease notifications from the NM.
* Such resource changes should be reflected in FiCaSchedulerNode/App, LeafQueue 
and ParentQueue (e.g. usedResource, reservedResource, etc.)
* If a user's increase request cannot be satisfied immediately, it will be 
reserved in the node/app as before (here and below, node/app means 
FiCaSchedulerNode/FiCaSchedulerApp).

*Advanced Requirements*
* We need to handle race conditions gracefully:
** Only acquired/running containers can be increased.
** A container decrease only takes effect on acquired/running containers. (If a 
container has finished, been killed, etc., all of its resources are released, so 
there is no need to decrease it.)
** A user may submit a new increase request for a container that already has a 
pending increase request. The pending request must be replaced with the new one.
** When the requested container resource is less than or equal to the existing 
container resource:
*** If there is no pending increase request for this container, the request is 
ignored.
*** If there is a pending increase request, the new request is ignored and the 
pending increase request is canceled.
** When a pending increase request exists and a decrease notification for the 
same container arrives, the container is decreased and the pending increase 
request is canceled.

*Requirements not clear*
* Do we need a timeout parameter for reserved resource increase requests, to 
avoid them occupying node resources for too long? (Do we have such a parameter 
for reserving a “normal” container in CS?)
* How do we decide whether an increase request or a normal container request is 
satisfied first? (Currently, I simply make CS satisfy increase requests first.) 
Should it be a configurable parameter?

*Current Implementation*

*1) Decrease Container*
I'll start with decreasing a container because it's easier to understand.
Decreased containers are handled in nodeUpdate() of the CapacityScheduler.
When CS receives decreased containers from the NM, it processes them one by one 
using the following steps:

* Check whether the container is in the running state (because this is reported 
by the NM, its state will be either running or completed); skip it if not.
* Remove the increase request on the same container-id, if one exists.
* Decrease/update the container resource in 
FiCaSchedulerApp/AppSchedulingInfo/FiCaSchedulerNode/LeafQueue/ParentQueue/other-related-metrics
* Update the resource in the Container.
* Return the decreased container to the AM by calling setDecreasedContainer in 
AllocateResponse.
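A minimal Python sketch of the decrease steps above. It is a simplified model, not the CapacityScheduler code: a resource is just (memory, vcores), one `used` map stands in for the accounting spread across FiCaSchedulerApp, AppSchedulingInfo, FiCaSchedulerNode, LeafQueue, ParentQueue and metrics, and a plain list stands in for `setDecreasedContainer` on AllocateResponse. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

    def __sub__(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)


@dataclass
class Container:
    container_id: str
    state: str  # as reported by the NM: "RUNNING" or "COMPLETED"
    resource: Resource


@dataclass
class SchedulerState:
    containers: dict[str, Container]
    # Used resource per accounting level, e.g. "app", "node", "leaf_queue", "parent_queue".
    used: dict[str, Resource]
    # Pending increase requests: container_id -> target resource.
    pending_increases: dict[str, Resource] = field(default_factory=dict)
    # Containers to report back to the AM (stand-in for setDecreasedContainer).
    decreased_for_am: list[str] = field(default_factory=list)


def process_decreased_containers(state: SchedulerState,
                                 reported: list[tuple[str, Resource]]) -> None:
    """Handle decreased containers reported in an NM heartbeat, one by one."""
    for container_id, new_resource in reported:
        container = state.containers.get(container_id)
        # 1. Skip containers that are unknown or no longer running.
        if container is None or container.state != "RUNNING":
            continue
        # 2. A decrease cancels any pending increase request on the same container.
        state.pending_increases.pop(container_id, None)
        # 3. Release the freed amount at every accounting level.
        freed = container.resource - new_resource
        for level in list(state.used):
            state.used[level] = state.used[level] - freed
        # 4. Update the resource recorded in the container itself.
        container.resource = new_resource
        # 5. Tell the AM about the decreased container.
        state.decreased_for_am.append(container_id)
```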

*2) Increase Container*
Increasing a container is much more complex than decreasing one.

*Steps to add a container increase request (pseudocode)*
In CapacityScheduler.allocate(...):
{code}
foreach (increase_request):
    // state: current state of the container to be increased
    if (state != ACQUIRED) and (state != RUNNING):
        continue

    // Remove the old request on the same container-id if it exists
    if increase_request_exist(increase_request.getContainerId()):
        remove_increase_request(increase_request.getContainerId())

    // The asked target resource must be larger than the existing resource
    if increase_request.ask_resource <= existing_resource(increase_request.getContainerId()):
        continue

    // Add it to the application
    getApplication(increase_request.getContainerId()).add(increase_request)
{code}
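The same add-request logic as a runnable Python sketch. It assumes a two-dimensional resource (memory, vcores) and reads "ask <= existing" component-wise, i.e. a request that does not grow any dimension is ignored; the data layout (plain dicts keyed by container id) and the function name are hypothetical, not the scheduler's real structures.

```python
from dataclasses import dataclass

# Only containers in these states can be increased.
INCREASABLE_STATES = {"ACQUIRED", "RUNNING"}


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        """True if no dimension exceeds the corresponding dimension of `other`."""
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


def add_increase_requests(requests: list[tuple[str, Resource]],
                          container_states: dict[str, str],
                          container_resources: dict[str, Resource],
                          pending: dict[str, Resource]) -> None:
    """Record new increase requests in `pending` (container_id -> target resource)."""
    for container_id, target in requests:
        if container_states.get(container_id) not in INCREASABLE_STATES:
            continue
        # Any pending request on the same container is replaced (cancelled) first.
        pending.pop(container_id, None)
        # A target that does not exceed the current allocation is not an increase.
        if target.fits_in(container_resources[container_id]):
            continue
        pending[container_id] = target
```

Because the old request is removed before the size check, a non-increasing request also cancels an earlier pending increase, which matches the "ignored and the pending increase request is canceled" rule in the requirements.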

*Steps to handle a container increase request:*
2.1) In CapacityScheduler.nodeUpdate(...):
{code}
if node.is_reserved():
    if reserved-increase-request:
        LeafQueue.assignReservedIncreaseRequest(...)
    elif reserved-normal-container:
        ...
else:
    ParentQueue.assignContainers(...)
    // this will finally call
    // LeafQueue.assignContainers(...)
{code}

2.2) In LeafQueue.assignReservedIncreaseRequest(...):
{code}
if request-is-fit-in-resource:
    allocate-resource
    update container token
    add to AllocateResponse
    return allocated-resource
else:
    return None
{code}
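Steps 2.1 and 2.2 combined into a small runnable Python sketch. Assumptions, all hypothetical: a reserved increase stores only the extra amount (`delta`) it needs, "fits in resource" means that delta fits in the node's available resource, fulfilling the request releases the reservation, and a list of container ids stands in for updating the container token and adding it to AllocateResponse. The reserved-normal-container branch and the queue walk are left as stubs, as in the pseudocode.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

    def __sub__(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


@dataclass
class IncreaseReservation:
    container_id: str
    delta: Resource  # extra resource needed on top of the current allocation


@dataclass
class Node:
    available: Resource
    reserved_increase: Optional[IncreaseReservation] = None
    reserved_container: Optional[str] = None  # a reserved "normal" container


def assign_reserved_increase(node: Node, response: list[str]) -> Optional[Resource]:
    """Step 2.2: allocate the reserved increase if it now fits, else keep waiting."""
    request = node.reserved_increase
    if request is not None and request.delta.fits_in(node.available):
        node.available = node.available - request.delta  # allocate-resource
        node.reserved_increase = None                     # reservation fulfilled
        response.append(request.container_id)             # token update + AllocateResponse
        return request.delta
    return None


def node_update(node: Node, response: list[str],
                assign_containers: Callable[[Node], Optional[Resource]]) -> Optional[Resource]:
    """Step 2.1: serve whatever is reserved on the node before new assignments."""
    if node.reserved_increase is not None:
        return assign_reserved_increase(node, response)
    if node.reserved_container is not None:
        return None  # existing reserved-container handling ("..." in the design), not modelled
    # Stands in for ParentQueue.assignContainers -> LeafQueue.assignContainers.
    return assign_containers(node)
```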

2.3) In LeafQueue.assignContainers(...):
{code}
foreach (application):
    // do increase allocation first
    foreach (increase_request):
        // check if we can allocate it
        // within queue/user limits, etc.
        // return None if not satisfied

        if request-is-fit-in-resource:
            allocate-resource
            update container token
            add to AllocateResponse
        else:
            reserve in app/node
            return reserved-resource

    // do normal allocation
    ...
{code}
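Step 2.3 as a runnable Python sketch. As in the pseudocode, increase requests are tried before normal allocation. Assumptions, all hypothetical: each app keeps its increase requests in an insertion-ordered dict of container id to extra resource needed, `within_limits` stands in for the queue/user-limit checks, a list of container ids stands in for the token update and AllocateResponse, and the normal-allocation path is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

    def __sub__(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


@dataclass
class App:
    app_id: str
    # container_id -> extra resource requested; dicts keep insertion order.
    increase_requests: dict[str, Resource] = field(default_factory=dict)


@dataclass
class Node:
    available: Resource
    reserved: Optional[tuple[str, str]] = None  # (app_id, container_id) of a reserved increase


def assign_containers(apps: list[App], node: Node, response: list[str],
                      within_limits: Callable[[App, Resource], bool]) -> Optional[Resource]:
    """Serve increase requests first; return the reserved amount if one had to be reserved."""
    for app in apps:
        for container_id, delta in list(app.increase_requests.items()):
            if not within_limits(app, delta):
                return None  # queue/user limits not satisfied
            if delta.fits_in(node.available):
                node.available = node.available - delta
                del app.increase_requests[container_id]
                response.append(container_id)
            else:
                node.reserved = (app.app_id, container_id)
                return delta
        # Normal (new-container) allocation for this app would follow here.
    return None
```

Reserving and returning immediately means one oversized increase request blocks later applications on this node until it is satisfied, which is the trade-off behind the timeout question under "Requirements not clear".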

*API changes in CapacityScheduler*
1) YarnScheduler
{code}
   public Allocation 

[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-12-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845959#comment-13845959
 ] 

Vinod Kumar Vavilapalli commented on YARN-1197:
---

I just caught up with this. Well-written document, thanks! Some questions:
 - Decreasing resources:
-- It seems the control flow for resource decrease is asymmetrical: we go 
directly to the node first. Is that intended? At first look, that seems fine, 
since decreasing resource usage on a node is akin to killing a container by 
talking to the NM directly.
-- In applications that decrease container resources, will the application 
first instruct its container to reduce its resource usage and then inform the 
platform? This is important because, if it doesn't happen that way, the node 
will forcefully either kill the container when monitoring resource usage or 
change its cgroup immediately, causing the container to swap.
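The ordering concern in the second question can be illustrated with a toy Python model. It assumes, purely for illustration, that the NM monitor kills any container whose usage exceeds its limit on its next check and that the application can release memory instantly; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonitoredContainer:
    limit_mb: int
    usage_mb: int
    killed: bool = False


def monitor_check(c: MonitoredContainer) -> None:
    """Toy NM monitor: kill the container if it is over its limit."""
    if c.usage_mb > c.limit_mb:
        c.killed = True


def shrink_app_first(c: MonitoredContainer, new_limit_mb: int) -> None:
    c.usage_mb = min(c.usage_mb, new_limit_mb)  # the app releases memory first
    c.limit_mb = new_limit_mb                   # then the platform lowers the limit
    monitor_check(c)


def shrink_platform_first(c: MonitoredContainer, new_limit_mb: int) -> None:
    c.limit_mb = new_limit_mb                   # the platform lowers the limit first
    monitor_check(c)                            # the monitor may run before the app reacts
    c.usage_mb = min(c.usage_mb, new_limit_mb)


if __name__ == "__main__":
    a = MonitoredContainer(limit_mb=4096, usage_mb=3000)
    shrink_app_first(a, 2048)
    b = MonitoredContainer(limit_mb=4096, usage_mb=3000)
    shrink_platform_first(b, 2048)
    print(a.killed, b.killed)  # False True
```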

Also, I can see that some of the scheduler changes are going to be pretty 
involved, so I'd also vote for a branch. A couple of patches already went in, 
and I'm not even sure we got them right and/or whether they will need more 
revisions as we start making core changes. To avoid branch rot, we could target 
a subset in the branch (say, just the resource-increase changes) and do the 
remaining work on trunk after the merge.


