[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261187#comment-15261187 ] Hitesh Shah commented on YARN-4844: --- bq. Per my understanding, changing from int to long won't affect downstream project a lot, it's an error which can be captured by compiler directly. And getMemory/getVCores should not be used intensively by downstream project. For example, MR uses only ~20 times of getMemory()/VCores for non-testing code. Which can be easily fixed. If you are going to force downstream apps to change, I dont understand why you are not forcing them to do this in the first 3.0.0 release? What benefit does this provide anyone by delaying it to some later 3.x.y release? It just means that you have do the production stability verification of upstream apps all over again. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch, YARN-4844.3.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257158#comment-15257158 ] Wangda Tan commented on YARN-4844: -- bq. ...Given the debate about the extent of the changes we want to make, can we put a patch that changes the int32 to int64, adds getMemoryLong with a Private annotation(so that we can make changes later if we wish) and only fixes the pending memory check that was added in 2.8?... I agree size of the patch looks scary :-p, however, if you look into the patch, they're all very simple fixes, I don't think it will cause a lot of issues. You may feel better once I fixed all Jenkins issues. I have considered fix the pending resource calculation only, it looks hard to me. Because calculation of pending resource uses ResourceCalculator/ResourceUsage. And ResourceCalculator and related static methods of Resources used everywhere in RM. It's a good idea to me to mark get___Long to @Private, currently pending resource hasn't been exposed to application via Java API yet. Now it is only exposed in REST API which is fixed by the patch already. Thoughts? > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257142#comment-15257142 ] Wangda Tan commented on YARN-4844: -- bq. So the plan is to force users to change their usage of these APIs in some version of 3.x but not in 3.0.0 ? Regardless of debates about the first release of 3.x, let's assume it happens soon. The plan in my mind is to make sure incompatible API changes are get in when 3.x enters beta releases. We have a couple of other API changes on the way, in YARN for example, ATSv2, new web UI, etc. bq. Additionally we are not talking about use in production but rather making upstream apps change as needed to work with 3.x and over time stabilize 3.x. Per my understanding, changing from int to long won't affect downstream project a lot, it's an error which can be captured by compiler directly. And getMemory/getVCores should not be used intensively by downstream project. For example, MR uses only ~20 times of getMemory()/VCores for non-testing code. Which can be easily fixed. bq. Making an API change earlier rather than later is actually better as the API changes in this case have no relevance to production stability. I agree that it won't affect production stability. However, it adds additional overhead to development works which I don't want. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256606#comment-15256606 ] Varun Vasudev commented on YARN-4844: - I'm a little concerned with the size of the changes required, coming in so close to the 2.8 release. Given the debate about the extent of the changes we want to make, can we put a patch that changes the int32 to int64, adds getMemoryLong with a Private annotation(so that we can make changes later if we wish) and only fixes the pending memory check that was added in 2.8? We can do a follow up on how to fix this in the wider community. What do you think [~leftnoteasy]? > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254970#comment-15254970 ] Hitesh Shah commented on YARN-4844: --- Additionally we are not talking about use in production but rather making upstream apps change as needed to work with 3.x and over time stabilize 3.x. Making an API change earlier rather than later is actually better as the API changes in this case have no relevance to production stability. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254967#comment-15254967 ] Hitesh Shah commented on YARN-4844: --- bq. considering there are hundreds of blockers and criticals of 3.0.0 release, nobody will actually use the new release in production even if 3.0-alpha can be released. We can mark Resource API of trunk to be unstable and update it in future 3.x releases. So the plan is to force users to change their usage of these APIs in some version of 3.x but not in 3.0.0 ? > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254944#comment-15254944 ] Hadoop QA commented on YARN-4844: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 53 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 46s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 22s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 3s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 7s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 50s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 61 new + 1404 unchanged - 47 fixed = 1465 total (was 1451) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) with tabs. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 47s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 55s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s {color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} |
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254897#comment-15254897 ] Wangda Tan commented on YARN-4844: -- [~hitesh], considering there are hundreds of blockers and criticals of 3.0.0 release, nobody will actually use the new release in production even if 3.0-alpha can be released. We can mark Resource API of trunk to be unstable and update it in future 3.x releases. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253472#comment-15253472 ] Hadoop QA commented on YARN-4844: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 53 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 1s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 4m 25s {color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_77 with JDK v1.8.0_77 generated 1 new + 2 unchanged - 1 fixed = 3 total (was 3) {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 48s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 61 new + 1408 unchanged - 47 fixed = 1469 total (was 1455) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 1 line(s) with tabs. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 48s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 54s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253084#comment-15253084 ] Hitesh Shah commented on YARN-4844: --- bq. It is not a very hard thing to drop it, we'd better to do it close to first branch-3 release. I believe a recent comment on the mailing list was trying to target a 3.0 release within the next few weeks so I guess that means we make this change now? > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253048#comment-15253048 ] Wangda Tan commented on YARN-4844: -- [~hitesh], Agree it is very messy but it seems there's no other way to do it. :( I also tried to add new Resource object (like YarnServerResource) which will be used by YARN internally only and keeps user-facing API clean. But there're too many interactions between application and services use api.Resource, we need to handle all these cases separately, it doesn't look like a doable plan to me. For branch-3 release, I prefer to keep {{long getMemory/etc}} only. However, since getMemory is used 1000+ times inside resource manager project, if we drop the {{Resource#int getMemory}} now from trunk, we need to write two versions of patches for almost RM fixes, which will be a HUGE headache for YARN RM contributors. It is not a very hard thing to drop it, we'd better to do it close to first branch-3 release. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253028#comment-15253028 ] Hitesh Shah commented on YARN-4844: --- getMemoryLong(), etc just seems messy. I can understand why this is needed on branch-2 if we need to support long but for trunk, it seems better to change getMemory() to return a long. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253005#comment-15253005 ] Hadoop QA commented on YARN-4844: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} YARN-4844 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12800125/YARN-4844.1.patch | | JIRA Issue | YARN-4844 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11170/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253004#comment-15253004 ] Wangda Tan commented on YARN-4844: -- An additional note about why it is a 2.8 blocker: Currently Capacity Scheduler relies on total pending resource: When trying to assign container for each node heartbeat from root queue, it skips queue which has <= 0 pending resources. So if memory pendingResource overflows, no more containers can be allocated. branch-2.7 will not affected since the new logic is only in branch-2.8. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252976#comment-15252976 ] Wangda Tan commented on YARN-4844: -- Discussed with [~vinodkv], [~jianhe], [~hitesh] about this issue. A good news is, Google PB has backward/forward compatibility for all int_ fields, see: https://developers.google.com/protocol-buffers/docs/proto#updating: bq. int32, uint32, int64, uint64, and bool are all compatible – this means you can change a field from one of these types to another without breaking forwards- or backwards-compatibility. If a number is parsed from the wire which doesn't fit in the corresponding type, you will get the same effect as if you had cast the number to that type in C++ (e.g. if a 64-bit number is read as an int32, it will be truncated to 32 bits). So we have no problem to change ResourceProto from int32 to int64. In addition to .proto change, following changes are required for API record : Resource. - Update {{set_(int ...)}} to {{set_(long ...)}}, there's no compatible issue for setters - Add {{getMemoryLong}} and {{getVirtualCoresLong}} method And also, we need update Metrics objects related to Resources, such as QueueMetrics, etc. AFAIK, there's no compatibility issue. The last part is scheduler and test fixes. Attached ver.1 patch for review. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)