[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1197: - Component/s: graceful > Support changing resources of an allocated container > > > Key: YARN-1197 > URL: https://issues.apache.org/jira/browse/YARN-1197 > Project: Hadoop YARN > Issue Type: Task > Components: api, graceful, nodemanager, resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Wangda Tan > Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, > YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, > YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf > > > The current YARN resource management logic assumes resource allocated to a > container is fixed during the lifetime of it. When users want to change a > resource > of an allocated container the only way is releasing it and allocating a new > container with expected size. > Allowing run-time changing resources of an allocated container will give us > better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1197: Attachment: YARN-1197_Design.2015.08.21.pdf The NodeManager side of the implementation has been completed. Many thanks to [~jianhe] and [~leftnoteasy] for the review and valuable comments. There have been quite a few design changes along the way (for the better!), and I have updated the design doc accordingly. The main changes include: * Getting rid of the {{IncreasedContainerProto}} and {{DecreasedContainerProto}}, and just use the existing {{ContainerProto}}. * Container resource is now updated synchronously in NM as the whole process is fairly light weighted. This greatly simplifies the logic. The {{ContainerManager#increaseContainersResource}} call is now a blocking call, and a success return of this function means that the increase is completed in NM, and no further container status checking is required by AM. * Container resource increase is synchronized with NM-RM registration to handle the race condition of RM restarts while container resource increase is in progress. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1197: Attachment: YARN-1197_Design.2015.07.07.pdf The design doc has been updated to include a *Design Choices* section, which summarizes the rationales behind major design decisions. Patches to {{NodeManager}} side of the implementation have been posted for review, which include: YARN-1449: AM-NM protocol changes to support container resizing YARN-1645: ContainerManager implementation to support container resizing YARN-3867: ContainerImpl changes to support container resizing YARN-1643: ContainersMonitor changes to support monitoring/enforcing container resizing YARN-1644: RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing YARN-3868: ContainerManager recovery for container resizing In addition, YARN-3866 has been posted for review to unblock RM/Scheduler changes in YARN-1651 and YARN-1646. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1197: Attachment: YARN-1197_Design.2015.06.24.pdf The design doc has been updated based on the latest discussion and feedback from all parties. I have added a section for NodeManager recovery as well. Some details still need to be ironed out, which will be covered in sub-tasks. Special thanks to [~leftnoteasy] for all the valuable suggestions offline. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: tools-project.patch.ver.1) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-1197-v2.pdf) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-1197-v3.pdf) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-api-protocol.patch.ver.1) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-server-resourcemanager.patch.ver.1) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-pb-impl.patch.ver.1) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-1197-v5.pdf) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-1197.pdf) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-1197-scheduler-v1.pdf) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-1197-v4.pdf) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, yarn-1197-v5.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: mapreduce-project.patch.ver.1) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: YARN-1197 old-design-docs-patches-for-reference.zip Made old design docs / patches to a zip file to avoid unnecessary noise. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-server-common.patch.ver.1) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: (was: yarn-server-nodemanager.patch.ver.1) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-1197: Attachment: YARN-1197_Design.pdf We have come up with a new proposal to address the above problem, which we believe makes more sense. In essence, when RM receives a valid resource decrease message for a container, it should go ahead and honor it directly. If later on an increase message comes for the same container, and the target resource allocation is different than the current resource allocation known by RM, this increase action should be cancelled. To cancel this increase action, RM can simply use the NM-RM node heartbeat response to send the current resource allocation of this container known by RM back to NM. NM can use this value to monitor and enforce the resource usage of the container. In fact, we propose that after RM processes resource change messages from NM after each node update heartbeat, it should simply set the final resource allocation of those containers that have just been re-sized in the heartbeat response, so that NM will have the same view of the resource allocation of those changed containers as RM. Back to the original example: 1. A container is currently using 6G 2. AM asks RM to increase it to 8G 3. RM grants the increase request, allocates the resource to the container to 8G, and issues a token to AM. It starts a timer and remembers the original resource allocation before the increase as 6G. 4. AM, instead of initiating the resource increase to NM, requests a resource decrease to NM to decrease it to 4G 5. The decrease is successful and RM gets the notification, and updates the container resource to 4G 6. Before the token expires, the AM requests the resource increase to NM 7. RM receives the resource increase message (8G) from node update. However, the current resource allocation of this container is 4G, which is different than 8G, RM will NOT consider this increase as valid. It unregister the increase request from the timer, and sets the current resource allocation (4G) to the node heartbeat response, which will be pulled by NM in the next heartbeat cycle. 8. Once NM receives the 4G resource allocation, it will monitor and enforce using the 4G value. We have finished updating the design doc and have attached it to this thread (YARN-1197_Design.pdf). Many thanks to [~wangda] for the original design doc, which really helped us to catch up with all the discussion as quickly as we can. We hope to get your valuable feedback soon. We think most of the sub-tasks are still good as outlined in the updated design doc. Once we get approval of the design from the community, we will start the implementation. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197_Design.pdf, mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated YARN-1197: - Assignee: (was: Jeff Zhang) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Assignee: (was: Wangda Tan) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: yarn-1197-scheduler-v1.pdf Attached scheduler design doc for increasing and decreasing, I've uploaded a very draft preview patch for scheduler changes in YARN-1502 Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: yarn-1197-v5.pdf I attached an updated design doc according to our discussion. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Description: (was: Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. ) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Description: The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: yarn-server-resourcemanager.patch.ver.1 yarn-server-nodemanager.patch.ver.1 yarn-server-common.patch.ver.1 yarn-pb-impl.patch.ver.1 yarn-api-protocol.patch.ver.1 tools-project.patch.ver.1 mapreduce-project.patch.ver.1 I just finished container resource increase support including PB/API changes, make capacity scheduler support increasing and NM can support change monitoring size of a running container. *I splitted it to several patches for easier review,* * API/pb file changes in hadoop-yarn-api * PB implementations in hadoop-yarn-common * yarn-server-common changes * yarn-server-resourcemanager changes include capacity scheduler and AMS master, etc. changes * yarn-server-nodemanager changes include ContainerManagerImpl and ContainersMonitor changes * other related project changes according to updated APIs (map-reduce/tools) Aboves a preview patches, still very rough, [~bikassaha], [~sandyr], [~tucu00] , [~vinodkv] could you please do some review on them, I'm eager for your ideas! *And some short notes for current implementations on RM/NM not covered in design doc,* *1) Implementation in capacity scheduler for increasing a container size* It's very close to allocate a new container, some details, * Increase request can be only valid when asked size larger than existed resource, and container state is either RUNNING or ACQUIRED * The entry point of increase request allocation is still in CapacityScheduler:nodeUpdate() * When increase request cannot be allocated, it will also be reserved. Each node can only reserve at most one request (increase request or new container request). I created a new method isReserved() in FiCaSchedulerNode to make scheduler/queue identify if a node is reserved * The major logic for increase request allocation is also placed in LeafQueue:assignContainers, increase requests will be proceeded before new container request. * Queue(leaf/parent) capacity and user capacity checking will also be done before reserve or allocate a increase request * Queue(leaf/parent) used resource will also be deduct when increase request reserved * Users may submit increase request several times on a same container with different size. ** If asked size is equal to previous asked size, it will be ignored ** If asked size is smaller or equal to existed size, this will cancel increase request on this container ** If asked size is different of previous asked size, and greater than existing size, it will replace previous ask and cancel previous reservations (if existed). *2) Implementation in node manager for increasing a container size* * It will do a similar check logic (like token verifications, etc.) like start container * Increase logic will only valid when ContainerState(The internal ContainerState: org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerState) is RUNNING, to avoid a racing condition. * ContainersMonitorImpl will put change requests to containersToBeChanged when received CHANGE_MONITORING_CONTAINER event. And it will be proceeded in MonitoringThread:run() Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: yarn-1197-v4.pdf I updated design doc mainly based on [~bikassaha] latest comment, I left Add implementation to NM to support changing resource isolation limitation of a container blank in design doc. From my understanding, currently, LCE doesn't do any memory isolation (right?), it will setup CPU share when it setup a container. Currently I don't have whole picture of these component, solutions I can think are, 1) Because this is a soft limitation, we just don't do anything in LCE side 2) Create a new interface in ContainerExecutor, which can update resource of a container 3) Disable increase/shrink when LCE is selected Hope to get your more opinions on this and design doc. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Attachments: yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: yarn-1197-v3.pdf Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Attachments: yarn-1197.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: yarn-1197-v2.pdf Guys, I attached an updated doc basic on our discussion, mainly focused workflow diagram and detailed API changes, many thanks to [~bikassaha], [~sandyr] and [~tucu00]. Hope to get your feedback. I'll start working on it. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf, yarn-1197-v2.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1197: - Attachment: yarn-1197.pdf Added a initial proposal for it, include increase/decrease a aquired or running container, hope anybody can help me review it. Then we can move forward to break down tasks and start work on it. Thanks. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1197: - Summary: Support changing resources of an allocated container (was: Support increasing resources of an allocated container) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1197: - Assignee: Tan, Wangda Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Tan, Wangda Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1197: - Assignee: Wangda Tan (was: Tan, Wangda) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Assignee: Wangda Tan Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1197: - Assignee: (was: Wangda Tan) Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira