[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142161#comment-15142161 ] Hudson commented on YARN-4138: -- SUCCESS: Integrated in Hadoop-trunk-Commit #9280 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9280/]) YARN-4138. Roll back container resource allocation after resource (jianhe: rev d16b17b4d299b4d58f879a2a15708bacd0938685) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/AllocationExpirationInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerResizing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/ContainerAllocationExpirer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerNMDoneChangeResourceEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/ContainerExpiredSchedulerEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestIncreaseAllocationExpirer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Fix For: 2.8.0 > > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch, YARN-4138.6.patch, YARN-4138.7.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142052#comment-15142052 ] sandflee commented on YARN-4138: looks good to me too, thanks [~mding] > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch, YARN-4138.6.patch, YARN-4138.7.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140973#comment-15140973 ] Jian He commented on YARN-4138: --- patch looks good to me, [~sandflee], do you have any more comments ? > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch, YARN-4138.6.patch, YARN-4138.7.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137639#comment-15137639 ] MENG DING commented on YARN-4138: - The checkstyle warnings are not fixable. The failed tests are not related to this issue. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch, YARN-4138.6.patch, YARN-4138.7.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137553#comment-15137553 ] Hadoop QA commented on YARN-4138: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 2 new + 235 unchanged - 2 fixed = 237 total (was 237) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 14s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 14s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 160m 6s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_95 Failed junit tests |
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135286#comment-15135286 ] Hadoop QA commented on YARN-4138: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 2 new + 233 unchanged - 2 fixed = 235 total (was 235) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 4s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 53s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 144m 21s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | hadoop.yarn.serv
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134926#comment-15134926 ] MENG DING commented on YARN-4138: - Hi, [~jianhe] and [~sandflee] After more thoughts, I think we should be able to update last confirmed resource every time we get a report of increased containers from NM, like the following: {code} if (rmContainerResource == nmContainerResource) { lastConfirmedResource = nmContainerResource containerAllocationExpirer.unregister } else if (rmContainerResource < nmContainerResource) { // sandflee's use case lastConfirmedResource = rmContainerResource containerAllocationExpirer.unregister handle(RMNodeDecreaseContainerEvent) } else if (nmContainerResource < rmContainerResource ) { // consecutive increase use case lastConfirmedResource = max(nmContainerResource, lastConfirmedResource ) } {code} > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134878#comment-15134878 ] MENG DING commented on YARN-4138: - The difference between the two allocationExpirationInfo is that the second one resets the start time for timeout. But honestly, you are right that there is really no harm done if we set lastConfirmedResource as nmContainerResource when rmContainerResource > nmContainerContainer. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133758#comment-15133758 ] sandflee commented on YARN-4138: to simple the race condition process, could we reject the increase/decrease request if a increase notifiy is not sended by NM? > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133690#comment-15133690 ] Jian He commented on YARN-4138: --- bq. the second allocation expiration info overwrites the first, The two allocationExpirationEnfo are the same(containerId, isIncrease) regardless whether overwrites or not. I'm more worried if we set the lastConfirm to be nmContainerResource if rmContainerResource > nmContainerContainer, what kind of downsides it may cause ? > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133627#comment-15133627 ] sandflee commented on YARN-4138: Hi, [~mding], there may some cases not user/app error, 1, increase resource from 1G to 5G 2, get a 5G token and send it to NM 3, before container increased msg come to RM, AM decrease resource from 5G -> 4G, (last confirmed is 1G) 4, container increase msg come to RM, in current logic, we don't confirm it, just send a decrease msg to NM 5, container increase timer expire, the resource is rolled back to 1G in 4, maybe We could set lastConfirmed to 4G. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132466#comment-15132466 ] MENG DING commented on YARN-4138: - Hi, [~jianhe] I think with a bit of explanation this won't cause confusion. The key message here is that if a user issues two increase requests in a row, but does not use the latest token eventually, we consider this a user/app error, because we don't really understand what the user wants. From Resource Manager's perspective, at any time, a {{ContainerAllocationExpirer}} can only track one increase allocation. If two consecutive increase allocations are made, the second allocation expiration info overwrites the first, which effectively *cancels* the first allocation. Let me know if you have further concerns. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131755#comment-15131755 ] Jian He commented on YARN-4138: --- bq. We only confirm resource when NM reported resource is the same as RM resource. thanks for the explanation. I wonder why the decision was made to reset to the initial resource, in this case, the first increase happened successfully from app's point of view, will this confuse the apps if the resource somehow decrease back to the initial resource. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131201#comment-15131201 ] MENG DING commented on YARN-4138: - The failed tests are not related. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch, > YARN-4138.5.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131157#comment-15131157 ] Hadoop QA commented on YARN-4138: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 1 new + 234 unchanged - 2 fixed = 235 total (was 236) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 52s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 13s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 144m 29s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | JDK v1.7.0_91 Failed junit tests |
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130206#comment-15130206 ] Jian He commented on YARN-4138: --- Patch looks good to me overall, one question for this test case: After step 6, rmContainer.getLastConfirmedResource() will return 3G, when the expire event gets triggered, won't it reset it back to 3G? {code} /** * 1. Allocate 1 container: containerId2 (1G) * 2. Increase resource of containerId2: 1G -> 3G * 3. AM acquires the token * 4. Increase resource of containerId2 again: 3G -> 6G * 5. AM acquires the token * 6. AM uses the 1st token to increase the container in NM to 3G * 7. AM does NOT use the second token * 8. Verify containerId2 eventually uses 1G after token expires {code} - I think RMContainerImpl will not receive EXPIRE event at RUNNING state after this patch ? if so, we can remove this. {code} .addTransition(RMContainerState.RUNNING, RMContainerState.RUNNING, RMContainerEventType.EXPIRE) {code} > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch, YARN-4138.4.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129395#comment-15129395 ] Hadoop QA commented on YARN-4138: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 3 new + 233 unchanged - 3 fixed = 236 total (was 236) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 46s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 37s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 202m 28s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.serv
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072880#comment-15072880 ] MENG DING commented on YARN-4138: - Hi, [~sandflee] I think this issue depends on YARN-4519. Will suspend this ticket until YARN-4519 is resolved. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072480#comment-15072480 ] MENG DING commented on YARN-4138: - You are right, I remembered that wrong. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072430#comment-15072430 ] sandflee commented on YARN-4138: when release containers , we didn't hold SchedulerApp's lock. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072426#comment-15072426 ] MENG DING commented on YARN-4138: - Release containers may have the same issue too. Strange that there has been no reports from the field so far? Looks like we need to implement a pending release/decrease list in the scheduler ... > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072414#comment-15072414 ] Jian He commented on YARN-4138: --- I think it may be true that this will lead to dead lock. - CapacityScheduler#allocateContainersToNode will grab scheduler lock and then SchedulerApp's lock at LeafQueue#assignContainers. - CapacityScheduler#rollbackContainerResource first acquires SchedulerApp's lock and then scheduler lock. -- This will also happen when AM calls CapacityScheduler#allocate to decrease the container. This is introduced in YARN-1651. I had a [comment|https://issues.apache.org/jira/browse/YARN-1651?focusedCommentId=14738568&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14738568] earlier that every AM allocate call will hold scheduler and queue's lock,which is too expensive, but missed that this may lead to deadlock. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072314#comment-15072314 ] sandflee commented on YARN-4138: Hi, [~mding], I'll open a new jira to track this, not to delaying this issue. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071684#comment-15071684 ] MENG DING commented on YARN-4138: - Hi, [~sandflee] Can you provide a test case if you believe there is a deadlock? > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071682#comment-15071682 ] MENG DING commented on YARN-4138: - Hi, [~sandflee] In your step 3, the container will NOT be removed from allocation expirer. Please see the following code in the patch: {code} +// Only unregister from the containerAllocationExpirer when target +// resource is less than or equal to the last confirmed resource. +if (Resources.fitsIn(targetResource, lastConfirmedResource)) { + container.lastConfirmedResource = targetResource; + container.containerAllocationExpirer.unregister( + new AllocationExpirationInfo(event.getContainerId())); +} {code} > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071492#comment-15071492 ] sandflee commented on YARN-4138: there seems a deadlock, in allocate and rollback logic we get schedulerApp's lock and then capacityScheduler's lock (decreaseContainers), while in scheduler thread, we get capacitySchedulers's lock first, and maybe get schedulerApp's lock. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071475#comment-15071475 ] sandflee commented on YARN-4138: + decreaseRequest = new SchedContainerChangeRequest( + schedulerNode, rmContainer, rmContainer.getLastConfirmedResource()); we use lastConfimedResource and allocatedResource, but not protected by RMContainer's Lock , they may be changed in different dispatcher thread. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071403#comment-15071403 ] sandflee commented on YARN-4138: Hi, [~mding], sorry for the late reply, 1, If AM send tokenA succ to NM, and before NM send containerIncreased msg to RM, AM request resource from 2G -> 4G, get tokenB but never used, after expire, RM asks AM to rollback to 1G, seems not the proper action. seems [~jianhe] has the same concern. {quote} 1. A container is using 2G 2. AM requests to increase it from 2G --> 8G, and scheduler allocates it and issues token to AM 3. AM never uses the token, but requests to decrease the container from 8G --> 6G, and scheduler goes ahead and decrease the resource to 6G, and also removes the container from allocation expirer 4. RM notifies NM to decrease resource to 6G, but since NM is still using 2G, the decrease message is ignored by NM 5. Now the container has 6G allocation in RM, but 2G allocation in NM. {quote} in 4, could we send a set msg to NM instead of a decrease msg ? 2, there are race conditions between AM get a new token and send another resource change request, I opened a new issues to discuss this (YARN-4495) > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071080#comment-15071080 ] MENG DING commented on YARN-4138: - Hi, [~jianhe] I just tried to apply the patch on the latest trunk, and it seemed ok: {code} vagrant@mdinglin02:~/workspace/hadoop-test$ git status On branch trunk Your branch is up-to-date with 'origin/trunk'. nothing to commit, working directory clean vagrant@mdinglin02:~/workspace/hadoop-test$ git apply -p0 /vagrant/YARN-4138.3.patch vagrant@mdinglin02:~/workspace/hadoop-test$ git status On branch trunk Your branch is up-to-date with 'origin/trunk'. Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/ContainerAllocationExpirer.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/event/ContainerExpiredSchedulerEvent.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/TestRMContainerImpl.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerResizing.java modified: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java Untracked files: (use "git add ..." to include in what will be committed) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/AllocationExpirationInfo.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestIncreaseAllocationExpirer.java no changes added to commit (use "git add" and/or "git commit -a") {code} > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070594#comment-15070594 ] Jian He commented on YARN-4138: --- sorry, my bad. I don't why the AllocationExpirationInfo.java previously ended up in hadoop-yarn/ directory. Seems the patch not applying on trunk any more.. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070534#comment-15070534 ] MENG DING commented on YARN-4138: - Hi [~jianhe], which file(s) are you referring to in particular? > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068934#comment-15068934 ] Jian He commented on YARN-4138: --- looks like some files are created at a wrong direcotry "hadoop-yarn/" instead of hadoop-yarn-project > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, > YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064880#comment-15064880 ] Hadoop QA commented on YARN-4138: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 249, now 249). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 47s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 42s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 157m 32s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12778547/YARN-4138.3
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064134#comment-15064134 ] MENG DING commented on YARN-4138: - Hi, [~sandflee] 1. Yes, this is the expected behavior. If you take a look at the discussion from the beginning of this thread, we have decided that if multiple increase tokens are granted by RM in a row for a container before AM uses any of the token, the last token will take effect, and any previous tokens will be effectively cancelled. If RM sees a difference between its own number and the number reported from NM, it will consider it as an *unconfirmed* state, and won't set the lastConfirmed value. Besides, if AM issues multiple increase requests, but doesn't use the last token, it is considered an user error. 2. If I understand your question correctly, then you are right that you should pass container B. In fact, the container B you are talking about is technically still container A, as uniquely identified by the container ID. When resource increase request of container A is granted by RM, RM still sends back container A, but with updated resource and token. As an Application Master developer, you are expected to track all live containers in AM, and in the onContainersResourceChanged(List changedContainers) callback function, you need to replace the original container A with the updated container A. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063576#comment-15063576 ] sandflee commented on YARN-4138: {quote} We should not update lastConfirmedResource in this scenario. This is the exact case we want to cover in this ticket, where the resource increase token may expire, and we need to roll back to the old resource. The only time we want to update lastConfirmedResource during resource increase is when Resources.equals(nmContainerResource, rmContainerResource). {quote} 1, AM increase resource from 1G -> 2G, get container tokenA 2, AM increase from 2G -> 4G,get container tokenB 3, AM send tokenA to NM, and never send tokenB 4, RM will decease resource to 1G and this is the expeceted?seems we could set lastConfirmed resource to 2G {quote} Sorry I don't understand the question. Can you elaborate? {quote} when I using requestContainerResourceChange(Container container, Resource capability), I have a question about which container we should pass to this function. 1, AM request resource and get a container A 2, AM send increase request and get a container B 3, when AM wants to decrease resource, shoud we pass container A or B to requestContainerResourceChange? seems we shoud pass container B, even if container B is not send to NM? > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063209#comment-15063209 ] MENG DING commented on YARN-4138: - Thanks [~sandflee] for the review. bq. use Resources.fitsin(targetResource, lastConfirmedResource)? Will do bq. update lastConfirmedResource in RMContainer? and log debug to log info? We should not update lastConfirmedResource in this scenario. This is the exact case we want to cover in this ticket, where the resource increase token may expire, and we need to roll back to the old resource. The only time we want to update lastConfirmedResource during resource increase is when Resources.equals(nmContainerResource, rmContainerResource). bq. If am increase a containerA 1G -> 2G, and recieved a new container B, and have not told NM if am wants to decrease it to 500M, when using requestContainerResourceChange(Container container, Resource capability) , seems we should use container B? Sorry I don't understand the question. Can you elaborate? Thanks, Meng > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063066#comment-15063066 ] MENG DING commented on YARN-4138: - Hi, [~jianhe] Thanks for reviewing the code. * I think you are right that there could be a race condition where rmContainer.getLastConfirmedResource() (called first) is 2G, and the rmContainer.getAllocatedResource() (called next) becomes 1G, causing the resource delta to become positive. I think the solution is to synchronize the following block, such that both rmContainer.getLastConfirmedResource() and rmContainer.getAllocatedResource() will be 1G, so the resource delta is 0, and the decreaseContainer call will be skipped. {code} SchedContainerChangeRequest decreaseRequest = new SchedContainerChangeRequest( schedulerNode, rmContainer, rmContainer.getLastConfirmedResource()); decreaseContainer(decreaseRequest, getCurrentAttemptForContainer(containerId)); {code} * I don't quite understand the concern about the API semantics. If the above is fixed, is the API semantics still a concern to you? * bq. revert format only changes in RMContainerChangeResourceEvent Will do > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059777#comment-15059777 ] sandflee commented on YARN-4138: 1, use Resources.fitsin(targetResource, lastConfirmedResource)? {code:title=RMContainerImpl.java|borderStyle=solid} +if (targetResource.getMemory() <= lastConfirmedResource.getMemory() +&& targetResource.getVirtualCores() <= lastConfirmedResource +.getVirtualCores()) { + container.lastConfirmedResource = targetResource; + container.containerAllocationExpirer.unregister( + new AllocationExpirationInfo(event.getContainerId())); +} {code} 2, update lastConfirmedResource in RMContainer? and log debug to log info? {code:title=AbstractYarnScheduler.java |borderStyle=solid} // when rmContainerResource >= nmContainerResource, we won't do anything, // it is possible a container increased is issued by RM, but AM hasn't // told NM. + LOG.debug("Container size reported by NM: " + nmContainerResource + + " is smaller than allocated size in RM: " + rmContainerResource + + ". ContainerID=" + containerId + ". Ignore."); {code} 3, If am increase a containerA 1G -> 2G, and recieved a new container B, and have not told NM if am wants to decrease it to 500M, when using requestContainerResourceChange(Container container, Resource capability) , seems we should use container B? > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057229#comment-15057229 ] sandflee commented on YARN-4138: got it, thanks for your explain! > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057106#comment-15057106 ] Jian He commented on YARN-4138: --- {code} SchedContainerChangeRequest decreaseRequest = new SchedContainerChangeRequest( schedulerNode, rmContainer, rmContainer.getLastConfirmedResource()); decreaseContainer(decreaseRequest, getCurrentAttemptForContainer(containerId)); {code} - this scenario may cause resource accounting wrong, correct me if I'm wrong: 1) AM asks increase 2G -> 8G 2) AM does not increase the container, and asks decrease to 1G 3) LastConfirmedResource becomes 1G 4) In the meantime, containerIncreaseExpiration logic is triggered and rollbackContainerResource is invoked. In this case the resource Delta becomes positive even in decrease case, but some code is assuming the decrease to be negative, which may cause resource accounting wrong ? {code} // Delta capacity is negative when it's a decrease request Resource absDelta = Resources.negate(decreaseRequest.getDeltaCapacity()); {code} - I have a question about the API semantics for the above mentioned scenario. According to the AMRMClient#requestContainerResourceChange API, the previous pending resource-change-request should be cancelled. Essentially, the semantics is a setter API. In that sense, the previous 8G should be cancelled. With this approach, both resource-change-requests are cancelled. That is,10 min later after the expiration is triggered, user will suddenly see its container decreased to 2 GB. will this confuse the user ? - revert RMContainerChangeResourceEvent > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056056#comment-15056056 ] MENG DING commented on YARN-4138: - Hi, [~sandflee] The proposed implementation of the token expiration and resource allocation rollback is effectively the same as resource allocation decrease. When the resource allocation of a container is decreased in RM, the AM will be notified in the next AM-RM heartbeat response. So AM should have a consistent view of the resource allocation eventually. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055330#comment-15055330 ] sandflee commented on YARN-4138: Hi, [~mding], consider such situation: 1) AM request increase request to RM and received a token. 2) AM sends increase request to NM and succed (this may take a long time because of AM logic or network problem or nm restart) 3) RM token expires, and send decrease action to NM 4) NM sends increase notify to RM, since token expires, ignored by RM to AM, since there's no increase succ/expire by RM, resource are successfully increased, but to RM and NM, increase action are expired > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052890#comment-15052890 ] MENG DING commented on YARN-4138: - Hi, [~sandflee] Not sure if I fully understand your question. If resource is successfully increased in NM, NM will report the increase to RM in the next heartbeat, so there will be no token expiration. For token expiration to occur, the AM needs to acquire the increase token, and to NOT call the NMClient.increaseContainerResource API. When RM rolls back the resource allocation (implemented in this patch), it follows the same logic of decrease resource allocation. When that is done, the AM should get a notification from the heartbeat response for resource decrease. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052468#comment-15052468 ] sandflee commented on YARN-4138: if AM increase container size successful in NM, but resource increase token expired in RM, NM container size will be decreased to original, but to AM, the container size increased successed, seems token expire msg should delivered to AM. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch, YARN-4138-YARN-1197.2.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14875803#comment-14875803 ] Hadoop QA commented on YARN-4138: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 12s | Findbugs (version ) appears to be broken on YARN-1197. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 8m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 29s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 5s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 44s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 44s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 42m 54s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 85m 39s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices | | | hadoop.yarn.server.resourcemanager.resourcetracker.TestRMNMRPCResponseId | | | hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl | | | hadoop.yarn.server.resourcemanager.TestApplicationCleanup | | | hadoop.yarn.server.resourcemanager.TestRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.TestResourceManager | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterService | | | hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels | | | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerHealth | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue | | | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel | | | hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.resourcetracker.TestNMReconnect | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppAttempt | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior | | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | | | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate | | | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyForNodePartitions | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestQueueManager | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestMaxRunningAppsEnforcer | | | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | |
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803287#comment-14803287 ] MENG DING commented on YARN-4138: - Hi, [~sunilg] The case you mentioned is covered. When RM rolls back container size to 2G, it does the equivalent of decreasing the container size to 2G, and will subsequently send a decrease message to NM through heartbeat response. NM (currently using 4G per your example), upon receiving the response, will decrease its enforcement to 2G. So eventually the resource view is consistent between RM and NM. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803201#comment-14803201 ] Sunil G commented on YARN-4138: --- HI [~mding] In the same case, if we 2 incremental requests, like 2G --> 4G --> 8G. Is it possible to have a success case where the container size reached 4G at any intermediate time?. Because from RMs point of view, last incremental request is to reach 8GB. In NM, it may have reached 4G already. And here if we get a decrease request to 6G, by this logic RM container size will it fall down to 2G. (But in real, 4G was attained in NM). Kindly let me know if I missed something in this analysis. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803122#comment-14803122 ] MENG DING commented on YARN-4138: - There is an issue with the current logic: {code:title=RMContainerImpl.java} + if (!changeEvent.isIncrease()) { +// if this is a decrease request, if container was increased but not +// told to NM, we can consider previous increase is cancelled, +// unregister from the containerAllocationExpirer +container.containerAllocationExpirer.unregister(container +.getContainerId()); + } {code} Right now, if RM is processing a decrease request on a container, it (intends to) cancel any ongoing increase action on the same container by removing the container from allocation expirer. This is correct if the target resource is less than or equal to the last confirmed resource, otherwise this will cause inconsistencies. For example: 1. A container is using 2G 2. AM requests to increase it from 2G --> 8G, and scheduler allocates it and issues token to AM 3. AM never uses the token, but requests to decrease the container from 8G --> 6G, and scheduler goes ahead and decrease the resource to 6G, and also removes the container from allocation expirer 4. RM notifies NM to decrease resource to 6G, but since NM is still using 2G, the decrease message is ignored by NM 5. Now the container has 6G allocation in RM, but 2G allocation in NM. In this ticket, we will add a last confirmed resource to RMContainer, and I propose to only unregister the container from expirer when the target resource is less than or equal to the last confirmed resource. Use the above example, after the fix, the behavior should be: 1. A container is using 2G 2. AM requests to increase it from 2G --> 8G, and scheduler allocates it and issues token to AM 3. AM requests to decrease the container from 8G --> 6G. Scheduler decreases it 6G, but does *not* remove the container from allocation expirer 4. The increase token expires, and scheduler reverts back the container resource from 6G to 2G. Let me know if this makes sense or not. If yes, I will come up with a patch shortly. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: MENG DING > Attachments: YARN-4138-YARN-1197.1.patch > > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745697#comment-14745697 ] Sunil G commented on YARN-4138: --- Synced offline. Assigning to [~mding] as he already has a WIP patch. Will help in reviewing the same :) > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: Sunil G > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737260#comment-14737260 ] Sunil G commented on YARN-4138: --- Hi [~mding] Thank you. Yes. I was also thinking in same line. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: Sunil G > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737229#comment-14737229 ] MENG DING commented on YARN-4138: - bq. if multiple requests comes for same container, and that increase token expires, do we need to set back the very first container resource. I feel we can reset to the first container increase request's start resource unit. I think we need to set back to the original resource before the first increase request. Is this what you think, or you have different idea? > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: Sunil G > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737197#comment-14737197 ] Sunil G commented on YARN-4138: --- Thank you very much [~mding] for sharing the update. I am following YARN-1651 also. As updated above, ContainerResourceIncreaseAllocationExpirer has to register and monitor for every container which are asked to increase container request (storing old resource). One doubt is, if multiple requests comes for same container, and that increase token expires, do we need to set back the very first container resource. I feel we can reset to the first container increase request's start resource unit. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: Sunil G > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737160#comment-14737160 ] MENG DING commented on YARN-4138: - Hi, [~sunilg], The following is a summary of my discussion with [~leftnoteasy] on this issue so far (from YARN-1651) to keep you up-to-date. Let me know if you have any questions or comments. * We will create a new monitor (e.g., ContainerResourceIncreaseAllocationExpirer) to track token expiration for container resource increase. It must remember both container ID, and the *last confirmed* resource to revert back to. * Multiple container resource increase requests for the same container can happen in a row. Later increase request will reset the expiration timeout of previous requests for the same container in the monitor. * When a container increase token expires, not only must RM revert back the resource, it also needs to send a decrease container event to NM to make sure that the resource allocation is consistent between RM and NM. This is particularly important in the situation where multiple container resource increase tokens are acquired by AM, but AM doesn't use the latest token to initiate the increase action on NM. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: Sunil G > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737089#comment-14737089 ] MENG DING commented on YARN-4138: - Thanks [~sunilg] for taking this. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: Sunil G > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires
[ https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737059#comment-14737059 ] Sunil G commented on YARN-4138: --- Hi [~mding] If you are not working on this, I would like to take it up. > Roll back container resource allocation after resource increase token expires > - > > Key: YARN-4138 > URL: https://issues.apache.org/jira/browse/YARN-4138 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, nodemanager, resourcemanager >Reporter: MENG DING >Assignee: Sunil G > > In YARN-1651, after container resource increase token expires, the running > container is killed. > This ticket will change the behavior such that when a container resource > increase token expires, the resource allocation of the container will be > reverted back to the value before the increase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)