[jira] [Commented] (YARN-11082) Use node label resource as denominator to decide which resource is dominant
[ https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497894#comment-17497894 ] Hadoop QA commented on YARN-11082: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red}{color} | {color:red} YARN-11082 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-11082 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13040460/YARN-11082.001.patch | | Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1274/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. > Use node label resource as denominator to decide which resource is dominant > - > > Key: YARN-11082 > URL: https://issues.apache.org/jira/browse/YARN-11082 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.1 >Reporter: Bo Li >Priority: Major > Fix For: 3.1.1 > > Attachments: YARN-11082.001.patch > > > We use cluster resource as the denominator to decide which resource is dominant > in AbstractCSQueue#canAssignToThisQueue. However, nodes in our cluster are > configured differently. > {quote}2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application > attempt=appattempt_1637412555366_1588993_01 container=null > queue=root.a.a1.a2 clusterResource= > type=RACK_LOCAL requestedPartition=xx > 2021-12-09 10:24:37,069 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: > Used resource= exceeded maxResourceLimit of the > queue = > 2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal > {quote} > We can see that even though root.a.a1.a2 has used 687/687 vcores, the > following code in AbstractCSQueue#canAssignToThisQueue still returns false > {quote} > Resources.greaterThanOrEqual(resourceCalculator, clusterResource, > usedExceptKillable, currentLimitResource) > {quote} > clusterResource = > usedExceptKillable = > currentLimitResource = > currentLimitResource: > memory : 3381248/175117312 = 0.01930847362 > vCores : 687/40222 = 0.01708020486 > usedExceptKillable: > memory : 3384320/175117312 = 0.01932601615 > vCores : 688/40222 = 0.01710506687 > DRF will treat memory as the dominant resource and return false in this scenario -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
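To make the DRF arithmetic quoted in the report above concrete, here is a minimal, self-contained sketch. It is illustrative only and is not the actual Hadoop DominantResourceCalculator code; the only inputs are the memory/vcore numbers listed in the description, with the whole-cluster resource used as the denominator, as the current code does.

{code:java}
// Minimal sketch (not the actual Hadoop code) of how DRF picks the dominant
// resource when the whole cluster resource is used as the denominator.
public class DominantShareSketch {

  /** Returns the larger of the two shares, i.e. the dominant-resource share. */
  static double dominantShare(long memory, long vcores,
                              long denomMemory, long denomVcores) {
    double memoryShare = (double) memory / denomMemory;
    double vcoreShare = (double) vcores / denomVcores;
    return Math.max(memoryShare, vcoreShare);
  }

  public static void main(String[] args) {
    // Denominator: the cluster resource quoted in the description.
    long clusterMemory = 175117312L, clusterVcores = 40222L;

    // currentLimitResource and usedExceptKillable from the description.
    double limitShare = dominantShare(3381248L, 687L, clusterMemory, clusterVcores);
    double usedShare = dominantShare(3384320L, 688L, clusterMemory, clusterVcores);

    // Both memory shares (~0.0193) exceed both vcore shares (~0.0171), so DRF
    // keys the comparison on memory rather than on the 687/687 exhausted vcores.
    System.out.printf("limit dominant share = %.11f%n", limitShare);
    System.out.printf("used  dominant share = %.11f%n", usedShare);
  }
}
{code}

With the cluster as denominator, memory is the dominant resource on both sides of the comparison even though the queue's vcores are completely used, which is the behaviour the report is describing.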
[jira] [Updated] (YARN-11082) Use node label resource as denominator to decide which resource is dominant
[ https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Li updated YARN-11082: - Attachment: YARN-11082.001.patch > Use node label resource as denominator to decide which resource is dominant > - > > Key: YARN-11082 > URL: https://issues.apache.org/jira/browse/YARN-11082 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.1 >Reporter: Bo Li >Priority: Major > Fix For: 3.1.1 > > Attachments: YARN-11082.001.patch > > > We use cluster resource as the denominator to decide which resource is dominant > in AbstractCSQueue#canAssignToThisQueue. However, nodes in our cluster are > configured differently. > {quote}2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application > attempt=appattempt_1637412555366_1588993_01 container=null > queue=root.a.a1.a2 clusterResource= > type=RACK_LOCAL requestedPartition=xx > 2021-12-09 10:24:37,069 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: > Used resource= exceeded maxResourceLimit of the > queue = > 2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal > {quote} > We can see that even though root.a.a1.a2 has used 687/687 vcores, the > following code in AbstractCSQueue#canAssignToThisQueue still returns false > {quote} > Resources.greaterThanOrEqual(resourceCalculator, clusterResource, > usedExceptKillable, currentLimitResource) > {quote} > clusterResource = > usedExceptKillable = > currentLimitResource = > currentLimitResource: > memory : 3381248/175117312 = 0.01930847362 > vCores : 687/40222 = 0.01708020486 > usedExceptKillable: > memory : 3384320/175117312 = 0.01932601615 > vCores : 688/40222 = 0.01710506687 > DRF will treat memory as the dominant resource and return false in this scenario -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11082) Use node label resource as denominator to decide which resource is dominant
[ https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Li updated YARN-11082: - Attachment: (was: YARN-11082.patch) > Use node label resource as denominator to decide which resource is dominant > - > > Key: YARN-11082 > URL: https://issues.apache.org/jira/browse/YARN-11082 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.1 >Reporter: Bo Li >Priority: Major > Fix For: 3.1.1 > > Attachments: YARN-11082.001.patch > > > We use cluster resource as the denominator to decide which resource is dominant > in AbstractCSQueue#canAssignToThisQueue. However, nodes in our cluster are > configured differently. > {quote}2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application > attempt=appattempt_1637412555366_1588993_01 container=null > queue=root.a.a1.a2 clusterResource= > type=RACK_LOCAL requestedPartition=xx > 2021-12-09 10:24:37,069 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: > Used resource= exceeded maxResourceLimit of the > queue = > 2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal > {quote} > We can see that even though root.a.a1.a2 has used 687/687 vcores, the > following code in AbstractCSQueue#canAssignToThisQueue still returns false > {quote} > Resources.greaterThanOrEqual(resourceCalculator, clusterResource, > usedExceptKillable, currentLimitResource) > {quote} > clusterResource = > usedExceptKillable = > currentLimitResource = > currentLimitResource: > memory : 3381248/175117312 = 0.01930847362 > vCores : 687/40222 = 0.01708020486 > usedExceptKillable: > memory : 3384320/175117312 = 0.01932601615 > vCores : 688/40222 = 0.01710506687 > DRF will treat memory as the dominant resource and return false in this scenario -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11082) Use node label resource as denominator to decide which resource is dominant
[ https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Li updated YARN-11082: - Target Version/s: 3.1.1 > Use node label resource as denominator to decide which resource is dominant > - > > Key: YARN-11082 > URL: https://issues.apache.org/jira/browse/YARN-11082 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.1 >Reporter: Bo Li >Priority: Major > Fix For: 3.1.1 > > Attachments: YARN-11082.patch > > > We use cluster resource as the denominator to decide which resource is dominant > in AbstractCSQueue#canAssignToThisQueue. However, nodes in our cluster are > configured differently. > {quote}2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application > attempt=appattempt_1637412555366_1588993_01 container=null > queue=root.a.a1.a2 clusterResource= > type=RACK_LOCAL requestedPartition=xx > 2021-12-09 10:24:37,069 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: > Used resource= exceeded maxResourceLimit of the > queue = > 2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal > {quote} > We can see that even though root.a.a1.a2 has used 687/687 vcores, the > following code in AbstractCSQueue#canAssignToThisQueue still returns false > {quote} > Resources.greaterThanOrEqual(resourceCalculator, clusterResource, > usedExceptKillable, currentLimitResource) > {quote} > clusterResource = > usedExceptKillable = > currentLimitResource = > currentLimitResource: > memory : 3381248/175117312 = 0.01930847362 > vCores : 687/40222 = 0.01708020486 > usedExceptKillable: > memory : 3384320/175117312 = 0.01932601615 > vCores : 688/40222 = 0.01710506687 > DRF will treat memory as the dominant resource and return false in this scenario -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11068) Update transitive log4j2 dependency to 2.17.1
[ https://issues.apache.org/jira/browse/YARN-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497888#comment-17497888 ] Wei-Chiu Chuang commented on YARN-11068: No ... the solr dependency and its log4j2 transitive dependency were introduced in Hadoop 3.3 by the YARN application catalog feature. It's not applicable to branch-3.2. > Update transitive log4j2 dependency to 2.17.1 > - > > Key: YARN-11068 > URL: https://issues.apache.org/jira/browse/YARN-11068 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.4.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Similar to HADOOP-18092, we have transitive log4j2 dependency coming from > solr-core 8 that must be excluded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11068) Update transitive log4j2 dependency to 2.17.1
[ https://issues.apache.org/jira/browse/YARN-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497473#comment-17497473 ] Brahma Reddy Battula commented on YARN-11068: - [~weichiu] and [~aajisaka] can you merge to branch-3.2.3..? > Update transitive log4j2 dependency to 2.17.1 > - > > Key: YARN-11068 > URL: https://issues.apache.org/jira/browse/YARN-11068 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.4.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Similar to HADOOP-18092, we have transitive log4j2 dependency coming from > solr-core 8 that must be excluded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10701) The yarn.resource-types should support multiple types without requiring trimming.
[ https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10701: -- Description: {code:java} yarn.resource-types yarn.io/gpu, yarn.io/fpga {code} When i configured the resource type above with gpu and fpga, the error happened: {code:java} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is not a valid resource name. A valid resource name must begin with a letter and contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource name may also be optionally preceded by a name space followed by a slash. A valid name space consists of period-separated groups of letters, numbers, and dashes.{code} The resource types should support trim. was: {code:java} yarn.resource-types yarn.io/gpu, yarn.io/fpga {code} When i configured the resource type above with gpu and fpga, the error happend: {code:java} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is not a valid resource name. A valid resource name must begin with a letter and contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource name may also be optionally preceded by a name space followed by a slash. A valid name space consists of period-separated groups of letters, numbers, and dashes.{code} The resource types should support trim. > The yarn.resource-types should support multi types without trimmed. > --- > > Key: YARN-10701 > URL: https://issues.apache.org/jira/browse/YARN-10701 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0, 3.3.1 > > Attachments: YARN-10701-branch-3.3.001.patch, YARN-10701.001.patch, > YARN-10701.002.patch > > > {code:java} > > > yarn.resource-types > yarn.io/gpu, yarn.io/fpga > > {code} > When i configured the resource type above with gpu and fpga, the error > happened: > > {code:java} > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is > not a valid resource name. A valid resource name must begin with a letter and > contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource > name may also be optionally preceded by a name space followed by a slash. A > valid name space consists of period-separated groups of letters, numbers, and > dashes.{code} > > The resource types should support trim. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
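The failure above comes from the space after the comma in the yarn.resource-types value, which makes " yarn.io/fpga" fail the resource-name validation. Below is a minimal sketch of the trimming behaviour the issue asks for; it is illustrative only, does not reproduce the actual ResourceUtils change in the attached patches, and the class and method names are made up for the example.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only -- not the change from the attached patches.
// Shows how a comma-separated yarn.resource-types value can be trimmed
// before each entry is validated, so " yarn.io/fpga" passes the name check.
public class ResourceTypesParser {

  static List<String> parseResourceTypes(String rawValue) {
    List<String> types = new ArrayList<>();
    if (rawValue == null || rawValue.isEmpty()) {
      return types;
    }
    for (String entry : rawValue.split(",")) {
      String trimmed = entry.trim(); // drop leading/trailing whitespace
      if (!trimmed.isEmpty()) {
        types.add(trimmed);
      }
    }
    return types;
  }

  public static void main(String[] args) {
    // The value from the description, with a space after the comma.
    System.out.println(parseResourceTypes("yarn.io/gpu, yarn.io/fpga"));
    // prints: [yarn.io/gpu, yarn.io/fpga]
  }
}
{code}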
[jira] [Updated] (YARN-10438) Handle null containerId in ClientRMService#getContainerReport()
[ https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10438: -- Description: Here is the exception trace we are seeing; we suspect that because of this exception the RM is reaching a state where it no longer allows any new jobs to run on the cluster. {code:java} 2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564 java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915) {code} We are seeing this issue only with this specific node; we run this cluster at a scale of around 500 nodes. was: Here is the exception trace we are seeing; we suspect that because of this exception the RM is reaching a state where it no longer allows any new jobs to run on the cluster. {noformat} 2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 8032, call Call#1463486 Retry#0 org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 10.39.91.205:49564 java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915) {noformat} We are seeing this issue only with this specific node; we run this cluster at a scale of around 500 nodes. 
> Handle null containerId in ClientRMService#getContainerReport() > --- > > Key: YARN-10438 > URL: https://issues.apache.org/jira/browse/YARN-10438 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.2.1 >Reporter: Raghvendra Singh >Assignee: Shubham Gupta >Priority: Major > Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2 > > > Here is the exception trace we are seeing; we suspect that because of > this exception the RM is reaching a state where it no longer allows any new > jobs to run on the cluster. > {code:java} > 2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default > port 8032, call Call#1463486 Retry#0 > org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport > from 10.39.91.205:49564 java.lang.NullPointerException at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at > org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999) at >
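The NullPointerException in the trace above is hit when ClientRMService#getContainerReport dereferences a null containerId from the request. Below is a minimal sketch of the kind of guard the issue title suggests; the request type is a simplified stand-in rather than the real YARN protocol record, and the exception used in the committed fix may differ.

{code:java}
import java.io.IOException;

// Sketch of the guard the issue title suggests: fail fast with a clear,
// client-facing error when containerId is null instead of letting the RPC
// handler hit a NullPointerException. The request type is a stand-in.
public class ContainerReportGuardSketch {

  static class ContainerReportRequest {
    String containerId; // null in the failing calls from the trace above
    String getContainerId() { return containerId; }
  }

  static String getContainerReport(ContainerReportRequest request) throws IOException {
    if (request == null || request.getContainerId() == null) {
      // Reject the bad request explicitly rather than NPE-ing in the handler.
      throw new IOException("Invalid request: containerId cannot be null");
    }
    return "report for " + request.getContainerId();
  }

  public static void main(String[] args) {
    try {
      getContainerReport(new ContainerReportRequest());
    } catch (IOException e) {
      System.out.println("Rejected cleanly: " + e.getMessage());
    }
  }
}
{code}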
[jira] [Commented] (YARN-11082) Use node label resource as denominator to decide which resource is dominant
[ https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497262#comment-17497262 ] Hadoop QA commented on YARN-11082: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red}{color} | {color:red} YARN-11082 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-11082 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13040419/YARN-11082.patch | | Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1273/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. > Use node label resource as denominator to decide which resource is dominant > - > > Key: YARN-11082 > URL: https://issues.apache.org/jira/browse/YARN-11082 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.1 >Reporter: Bo Li >Priority: Major > Fix For: 3.1.1 > > Attachments: YARN-11082.patch > > > We use cluster resource as the denominator to decide which resource is dominant > in AbstractCSQueue#canAssignToThisQueue. However, nodes in our cluster are > configured differently. > {quote}2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application > attempt=appattempt_1637412555366_1588993_01 container=null > queue=root.a.a1.a2 clusterResource= > type=RACK_LOCAL requestedPartition=xx > 2021-12-09 10:24:37,069 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: > Used resource= exceeded maxResourceLimit of the > queue = > 2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal > {quote} > We can see that even though root.a.a1.a2 has used 687/687 vcores, the > following code in AbstractCSQueue#canAssignToThisQueue still returns false > {quote} > Resources.greaterThanOrEqual(resourceCalculator, clusterResource, > usedExceptKillable, currentLimitResource) > {quote} > clusterResource = > usedExceptKillable = > currentLimitResource = > currentLimitResource: > memory : 3381248/175117312 = 0.01930847362 > vCores : 687/40222 = 0.01708020486 > usedExceptKillable: > memory : 3384320/175117312 = 0.01932601615 > vCores : 688/40222 = 0.01710506687 > DRF will treat memory as the dominant resource and return false in this scenario -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11082) Use node label resource as denominator to decide which resource is dominant
[ https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Li updated YARN-11082: - Description: We use cluster resource as the denominator to decide which resource is dominant in AbstractCSQueue#canAssignToThisQueue. However, nodes in our cluster are configured differently. {quote}2021-12-09 10:24:37,069 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1637412555366_1588993_01 container=null queue=root.a.a1.a2 clusterResource= type=RACK_LOCAL requestedPartition=xx 2021-12-09 10:24:37,069 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Used resource= exceeded maxResourceLimit of the queue = 2021-12-09 10:24:37,069 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal {quote} We can see that even though root.a.a1.a2 has used 687/687 vcores, the following code in AbstractCSQueue#canAssignToThisQueue still returns false {quote} Resources.greaterThanOrEqual(resourceCalculator, clusterResource, usedExceptKillable, currentLimitResource) {quote} clusterResource = usedExceptKillable = currentLimitResource = currentLimitResource: memory : 3381248/175117312 = 0.01930847362 vCores : 687/40222 = 0.01708020486 usedExceptKillable: memory : 3384320/175117312 = 0.01932601615 vCores : 688/40222 = 0.01710506687 DRF will treat memory as the dominant resource and return false in this scenario was: We use cluster resource as the denominator to decide which resource is dominant in AbstractCSQueue#canAssignToThisQueue. However, nodes in our cluster are configured differently. {quote} 2021-12-09 10:24:37,069 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1637412555366_1588993_01 container=null queue=root.a.a1.a2 clusterResource= type=RACK_LOCAL requestedPartition=xx 2021-12-09 10:24:37,069 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Used resource= exceeded maxResourceLimit of the queue = 2021-12-09 10:24:37,069 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal {quote} We can see that even though root.a.a1.a2 has used 687/687 vcores, the following code in AbstractCSQueue#canAssignToThisQueue still returns false ```java Resources.greaterThanOrEqual(resourceCalculator, clusterResource, usedExceptKillable, currentLimitResource) ``` clusterResource = usedExceptKillable = currentLimitResource = currentLimitResource: memory : 3381248/175117312 = 0.01930847362 vCores : 687/40222 = 0.01708020486 usedExceptKillable: memory : 3384320/175117312 = 0.01932601615 vCores : 688/40222 = 0.01710506687 DRF will treat memory as the dominant resource and return false in this scenario > Use node label resource as denominator to decide which resource is dominant > - > > Key: YARN-11082 > URL: https://issues.apache.org/jira/browse/YARN-11082 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.1 >Reporter: Bo Li >Priority: Major > > We use cluster resource as the denominator to decide which resource is dominant > in AbstractCSQueue#canAssignToThisQueue. However, nodes in our cluster are > configured differently. 
> {quote}2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: > assignedContainer application > attempt=appattempt_1637412555366_1588993_01 container=null > queue=root.a.a1.a2 clusterResource= > type=RACK_LOCAL requestedPartition=xx > 2021-12-09 10:24:37,069 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: > Used resource= exceeded maxResourceLimit of the > queue = > 2021-12-09 10:24:37,069 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: > Failed to accept allocation proposal > {quote} > We can see that even though root.a.a1.a2 has used 687/687 vcores, the > following code in AbstractCSQueue#canAssignToThisQueue still returns false > {quote} > Resources.greaterThanOrEqual(resourceCalculator, clusterResource, > usedExceptKillable, currentLimitResource) > {quote} > clusterResource = > usedExceptKillable = > currentLimitResource = > currentLimitResource: > memory : 3381248/175117312 = 0.01930847362 > vCores : 687/40222 = 0.01708020486 > usedExceptKillable: > memory : 3384320/175117312 = 0.01932601615 > vCores : 688/40222 = 0.01710506687 > DRF will treat memory as the dominant resource and
[jira] [Created] (YARN-11082) Use node label resource as denominator to decide which resource is dominant
Bo Li created YARN-11082: Summary: Use node label resource as denominator to decide which resource is dominant Key: YARN-11082 URL: https://issues.apache.org/jira/browse/YARN-11082 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler Affects Versions: 3.1.1 Reporter: Bo Li We use cluster resource as the denominator to decide which resource is dominant in AbstractCSQueue#canAssignToThisQueue. However, nodes in our cluster are configured differently. {quote} 2021-12-09 10:24:37,069 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1637412555366_1588993_01 container=null queue=root.a.a1.a2 clusterResource= type=RACK_LOCAL requestedPartition=xx 2021-12-09 10:24:37,069 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Used resource= exceeded maxResourceLimit of the queue = 2021-12-09 10:24:37,069 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal {quote} We can see that even though root.a.a1.a2 has used 687/687 vcores, the following code in AbstractCSQueue#canAssignToThisQueue still returns false ```java Resources.greaterThanOrEqual(resourceCalculator, clusterResource, usedExceptKillable, currentLimitResource) ``` clusterResource = usedExceptKillable = currentLimitResource = currentLimitResource: memory : 3381248/175117312 = 0.01930847362 vCores : 687/40222 = 0.01708020486 usedExceptKillable: memory : 3384320/175117312 = 0.01932601615 vCores : 688/40222 = 0.01710506687 DRF will treat memory as the dominant resource and return false in this scenario -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
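For comparison with the earlier dominant-share sketch, the following illustrates the direction the summary proposes: using the node label (partition) resource rather than the whole-cluster resource as the DRF denominator. This is not the attached patch, and the partition totals below are invented for illustration only, since the report does not list them; the point is that with a much smaller vcore pool in the partition, vcores rather than memory become the dominant resource.

{code:java}
// Sketch of the direction the summary proposes: use the node label (partition)
// resource, not the whole-cluster resource, as the DRF denominator. Not the
// actual patch; the partition totals below are invented for illustration.
public class PartitionDenominatorSketch {

  static double dominantShare(long memory, long vcores,
                              long denomMemory, long denomVcores) {
    return Math.max((double) memory / denomMemory,
                    (double) vcores / denomVcores);
  }

  public static void main(String[] args) {
    // usedExceptKillable from the description.
    long usedMemory = 3384320L, usedVcores = 688L;

    // Whole-cluster denominator (current behaviour): memory dominates (~0.0193).
    System.out.println("cluster-based dominant share   = "
        + dominantShare(usedMemory, usedVcores, 175117312L, 40222L));

    // Hypothetical partition totals for the requested node label: with a much
    // smaller vcore pool, vcores (~0.86) dominate instead of memory (~0.40).
    System.out.println("partition-based dominant share = "
        + dominantShare(usedMemory, usedVcores, 8388608L, 800L));
  }
}
{code}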