[jira] [Commented] (YARN-11082) Use node label resource as denominator to decide which resource is dominant

2022-02-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497894#comment-17497894
 ] 

Hadoop QA commented on YARN-11082:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red}{color} | {color:red} YARN-11082 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-11082 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13040460/YARN-11082.001.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1274/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Use node label resource as denominator to decide which resource is dominant
> -
>
> Key: YARN-11082
> URL: https://issues.apache.org/jira/browse/YARN-11082
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.1
>Reporter: Bo Li
>Priority: Major
> Fix For: 3.1.1
>
> Attachments: YARN-11082.001.patch
>
>
> We used the cluster resource as the denominator to decide which resource is 
> dominant in AbstractCSQueue#canAssignToThisQueue. However, the nodes in our 
> cluster are configured differently.
> {quote}2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
>  assignedContainer application 
> attempt=appattempt_1637412555366_1588993_01 container=null 
> queue=root.a.a1.a2 clusterResource= 
> type=RACK_LOCAL requestedPartition=xx
> 2021-12-09 10:24:37,069 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
>  Used resource= exceeded maxResourceLimit of the 
> queue =
> 2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal
> {quote}
> We can see that even though root.a.a1.a2 had used 687/687 vcores, the 
> following check in AbstractCSQueue#canAssignToThisQueue still returns false:
> {quote}
> Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
> usedExceptKillable, currentLimitResource)
> {quote}
> clusterResource = <memory:175117312, vCores:40222>
> usedExceptKillable = <memory:3384320, vCores:688>
> currentLimitResource = <memory:3381248, vCores:687>
> currentLimitResource:
> memory : 3381248/175117312 = 0.01930847362
> vCores : 687/40222 = 0.01708020486
> usedExceptKillable:
> memory : 3384320/175117312 = 0.01932601615
> vCores : 688/40222 = 0.01710506687
> DRF will treat memory as the dominant resource and return false in this scenario.
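To make the arithmetic above concrete, here is a minimal standalone sketch (plain Java, not the actual DominantResourceCalculator code) of the share comparison with the whole-cluster resource as the denominator; the memory share is the larger one on both sides, so DRF compares on memory even though the queue's 687 vcores in the labeled partition are fully used:

{code:java}
// Standalone sketch of the DRF share computation described above, using the
// figures from the report. Not Hadoop code; it only reproduces the ratios.
public class DrfShareDemo {
    public static void main(String[] args) {
        final double clusterMem = 175117312.0, clusterVcores = 40222.0;

        // usedExceptKillable
        double usedMemShare   = 3384320.0 / clusterMem;  // 0.01932601615
        double usedVcoreShare = 688.0 / clusterVcores;   // 0.01710506687

        // currentLimitResource
        double limitMemShare   = 3381248.0 / clusterMem; // 0.01930847362
        double limitVcoreShare = 687.0 / clusterVcores;  // 0.01708020486

        // The memory share is the larger one on both sides, so DRF compares
        // on memory rather than on the fully used vcores.
        System.out.printf("used dominant share  = %.11f%n",
                Math.max(usedMemShare, usedVcoreShare));
        System.out.printf("limit dominant share = %.11f%n",
                Math.max(limitMemShare, limitVcoreShare));
    }
}
{code}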






[jira] [Updated] (YARN-11082) Use node label resource as denominator to decide which resource is dominant

2022-02-24 Thread Bo Li (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Li updated YARN-11082:
-
Attachment: YARN-11082.001.patch

> Use node label resource as denominator to decide which resource is dominant
> -
>
> Key: YARN-11082
> URL: https://issues.apache.org/jira/browse/YARN-11082
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.1
>Reporter: Bo Li
>Priority: Major
> Fix For: 3.1.1
>
> Attachments: YARN-11082.001.patch
>
>
> We used the cluster resource as the denominator to decide which resource is 
> dominant in AbstractCSQueue#canAssignToThisQueue. However, the nodes in our 
> cluster are configured differently.
> {quote}2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
>  assignedContainer application 
> attempt=appattempt_1637412555366_1588993_01 container=null 
> queue=root.a.a1.a2 clusterResource= 
> type=RACK_LOCAL requestedPartition=xx
> 2021-12-09 10:24:37,069 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
>  Used resource= exceeded maxResourceLimit of the 
> queue =
> 2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal
> {quote}
> We can see that even though root.a.a1.a2 had used 687/687 vcores, the 
> following check in AbstractCSQueue#canAssignToThisQueue still returns false:
> {quote}
> Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
> usedExceptKillable, currentLimitResource)
> {quote}
> clusterResource = <memory:175117312, vCores:40222>
> usedExceptKillable = <memory:3384320, vCores:688>
> currentLimitResource = <memory:3381248, vCores:687>
> currentLimitResource:
> memory : 3381248/175117312 = 0.01930847362
> vCores : 687/40222 = 0.01708020486
> usedExceptKillable:
> memory : 3384320/175117312 = 0.01932601615
> vCores : 688/40222 = 0.01710506687
> DRF will treat memory as the dominant resource and return false in this scenario.






[jira] [Updated] (YARN-11082) Use node label resource as denominator to decide which resource is dominant

2022-02-24 Thread Bo Li (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Li updated YARN-11082:
-
Attachment: (was: YARN-11082.patch)

> Use node label resource as denominator to decide which resource is dominant
> -
>
> Key: YARN-11082
> URL: https://issues.apache.org/jira/browse/YARN-11082
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.1
>Reporter: Bo Li
>Priority: Major
> Fix For: 3.1.1
>
> Attachments: YARN-11082.001.patch
>
>
> We used the cluster resource as the denominator to decide which resource is 
> dominant in AbstractCSQueue#canAssignToThisQueue. However, the nodes in our 
> cluster are configured differently.
> {quote}2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
>  assignedContainer application 
> attempt=appattempt_1637412555366_1588993_01 container=null 
> queue=root.a.a1.a2 clusterResource= 
> type=RACK_LOCAL requestedPartition=xx
> 2021-12-09 10:24:37,069 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
>  Used resource= exceeded maxResourceLimit of the 
> queue =
> 2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal
> {quote}
> We can see that even though root.a.a1.a2 had used 687/687 vcores, the 
> following check in AbstractCSQueue#canAssignToThisQueue still returns false:
> {quote}
> Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
> usedExceptKillable, currentLimitResource)
> {quote}
> clusterResource = <memory:175117312, vCores:40222>
> usedExceptKillable = <memory:3384320, vCores:688>
> currentLimitResource = <memory:3381248, vCores:687>
> currentLimitResource:
> memory : 3381248/175117312 = 0.01930847362
> vCores : 687/40222 = 0.01708020486
> usedExceptKillable:
> memory : 3384320/175117312 = 0.01932601615
> vCores : 688/40222 = 0.01710506687
> DRF will treat memory as the dominant resource and return false in this scenario.






[jira] [Updated] (YARN-11082) Use node label resource as denominator to decide which resource is dominant

2022-02-24 Thread Bo Li (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Li updated YARN-11082:
-
Target Version/s: 3.1.1

> Use node label resource as denominator to decide which resource is dominant
> -
>
> Key: YARN-11082
> URL: https://issues.apache.org/jira/browse/YARN-11082
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.1
>Reporter: Bo Li
>Priority: Major
> Fix For: 3.1.1
>
> Attachments: YARN-11082.patch
>
>
> We used the cluster resource as the denominator to decide which resource is 
> dominant in AbstractCSQueue#canAssignToThisQueue. However, the nodes in our 
> cluster are configured differently.
> {quote}2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
>  assignedContainer application 
> attempt=appattempt_1637412555366_1588993_01 container=null 
> queue=root.a.a1.a2 clusterResource= 
> type=RACK_LOCAL requestedPartition=xx
> 2021-12-09 10:24:37,069 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
>  Used resource= exceeded maxResourceLimit of the 
> queue =
> 2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal
> {quote}
> We can see that even though root.a.a1.a2 had used 687/687 vcores, the 
> following check in AbstractCSQueue#canAssignToThisQueue still returns false:
> {quote}
> Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
> usedExceptKillable, currentLimitResource)
> {quote}
> clusterResource = <memory:175117312, vCores:40222>
> usedExceptKillable = <memory:3384320, vCores:688>
> currentLimitResource = <memory:3381248, vCores:687>
> currentLimitResource:
> memory : 3381248/175117312 = 0.01930847362
> vCores : 687/40222 = 0.01708020486
> usedExceptKillable:
> memory : 3384320/175117312 = 0.01932601615
> vCores : 688/40222 = 0.01710506687
> DRF will treat memory as the dominant resource and return false in this scenario.






[jira] [Commented] (YARN-11068) Update transitive log4j2 dependency to 2.17.1

2022-02-24 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497888#comment-17497888
 ] 

Wei-Chiu Chuang commented on YARN-11068:


No ... the solr dependency and its transitive log4j2 dependency were introduced 
in Hadoop 3.3 by the YARN application catalog feature. It's not applicable to 
branch-3.2.

> Update transitive log4j2 dependency to 2.17.1
> -
>
> Key: YARN-11068
> URL: https://issues.apache.org/jira/browse/YARN-11068
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Similar to HADOOP-18092, we have a transitive log4j2 dependency coming from 
> solr-core 8 that must be excluded.
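For context, a hedged sketch of what such a Maven exclusion typically looks like; the artifact list actually excluded by the committed change may differ:

{code:xml}
<!-- Hypothetical sketch of excluding solr-core's transitive log4j2
     artifacts; the committed pom.xml may exclude a different set. -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}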






[jira] [Commented] (YARN-11068) Update transitive log4j2 dependency to 2.17.1

2022-02-24 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497473#comment-17497473
 ] 

Brahma Reddy Battula commented on YARN-11068:
-

[~weichiu] and [~aajisaka], can you merge this to branch-3.2.3?

> Update transitive log4j2 dependency to 2.17.1
> -
>
> Key: YARN-11068
> URL: https://issues.apache.org/jira/browse/YARN-11068
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Similar to HADOOP-18092, we have a transitive log4j2 dependency coming from 
> solr-core 8 that must be excluded.






[jira] [Updated] (YARN-10701) The yarn.resource-types property should support multiple types without requiring trimmed values.

2022-02-24 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10701:
--
Description: 
{code:xml}
<property>
  <name>yarn.resource-types</name>
  <value>yarn.io/gpu, yarn.io/fpga</value>
</property>
{code}

 When I configured the resource types above with gpu and fpga, the following 
error occurred:

 
{code:java}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is not 
a valid resource name. A valid resource name must begin with a letter and 
contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
name may also be optionally preceded by a name space followed by a slash. A 
valid name space consists of period-separated groups of letters, numbers, and 
dashes.{code}

  
 The resource types should support trimming.

  was:
{code:xml}
<property>
  <name>yarn.resource-types</name>
  <value>yarn.io/gpu, yarn.io/fpga</value>
</property>
{code}

 When I configured the resource types above with gpu and fpga, the following error occurred:

 
{code:java}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is not 
a valid resource name. A valid resource name must begin with a letter and 
contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
name may also be optionally preceded by a name space followed by a slash. A 
valid name space consists of period-separated groups of letters, numbers, and 
dashes.{code}

  
 The resource types should support trimming.


> The yarn.resource-types property should support multiple types without requiring trimmed values.
> ---
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10701-branch-3.3.001.patch, YARN-10701.001.patch, 
> YARN-10701.002.patch
>
>
> {code:xml}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu, yarn.io/fpga</value>
> </property>
> {code}
>  When I configured the resource types above with gpu and fpga, the following 
> error occurred:
>  
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is 
> not a valid resource name. A valid resource name must begin with a letter and 
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
> name may also be optionally preceded by a name space followed by a slash. A 
> valid name space consists of period-separated groups of letters, numbers, and 
> dashes.{code}
>   
>  The resource types should support trimming.
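For illustration, a minimal standalone sketch (assumed shape, not the committed patch) of the trimming behavior the description asks for: split the comma-separated value and trim each token before validating it.

{code:java}
// Standalone sketch of the requested behavior: trim each comma-separated
// resource type so "yarn.io/gpu, yarn.io/fpga" no longer fails validation
// on the leading space in " yarn.io/fpga".
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ResourceTypesTrimDemo {
    static List<String> parseResourceTypes(String raw) {
        return Arrays.stream(raw.split(","))
                .map(String::trim)          // drop the space after each comma
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints [yarn.io/gpu, yarn.io/fpga] -- both names now validate.
        System.out.println(parseResourceTypes("yarn.io/gpu, yarn.io/fpga"));
    }
}
{code}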






[jira] [Updated] (YARN-10438) Handle null containerId in ClientRMService#getContainerReport()

2022-02-24 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10438:
--
Description: 
Here is the exception trace we are seeing. We suspect that because of this 
exception the RM reaches a state where it no longer allows any new job to run 
on the cluster.


{code:java}
2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 
8032, call Call#1463486 Retry#0 
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 
10.39.91.205:49564 java.lang.NullPointerException at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
 at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
 at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at 
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999) at 
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
{code}


We are seeing this issue on this specific node only; we run this cluster at a 
scale of around 500 nodes.

  was:
Here is the exception trace we are seeing. We suspect that because of this 
exception the RM reaches a state where it no longer allows any new job to run 
on the cluster.

{noformat}
2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default port 
8032, call Call#1463486 Retry#0 
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport from 
10.39.91.205:49564 java.lang.NullPointerException at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
 at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
 at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at 
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999) at 
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
{noformat}

We are seeing this issue on this specific node only; we run this cluster at a 
scale of around 500 nodes.


> Handle null containerId in ClientRMService#getContainerReport()
> ---
>
> Key: YARN-10438
> URL: https://issues.apache.org/jira/browse/YARN-10438
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: Raghvendra Singh
>Assignee: Shubham Gupta
>Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2
>
>
> Here is the exception trace we are seeing. We suspect that because of this 
> exception the RM reaches a state where it no longer allows any new job to run 
> on the cluster.
> {code:java}
> 2020-09-15 07:08:15,496 WARN ipc.Server: IPC Server handler 18 on default 
> port 8032, call Call#1463486 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getContainerReport 
> from 10.39.91.205:49564 java.lang.NullPointerException at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:520)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
> {code}
> We are seeing this issue on this specific node only; we run this cluster at a 
> scale of around 500 nodes.
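A minimal hedged sketch of the guard the issue title suggests (assumed shape, not the committed patch): validate the containerId before dereferencing it, so the RPC handler surfaces a clean error instead of a NullPointerException.

{code:java}
// Standalone sketch (assumed shape, not the committed patch) of the
// null-containerId guard the title describes.
public class ContainerReportGuardDemo {
    static String getContainerReport(String containerId) {
        if (containerId == null) {
            // Fail fast with a clear, catchable error instead of letting a
            // NullPointerException escape the RPC handler.
            throw new IllegalArgumentException("Invalid container id: null");
        }
        return "report for " + containerId;
    }

    public static void main(String[] args) {
        try {
            getContainerReport(null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected cleanly: " + e.getMessage());
        }
    }
}
{code}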

[jira] [Commented] (YARN-11082) Use node label resource as denominator to decide which resource is dominant

2022-02-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497262#comment-17497262
 ] 

Hadoop QA commented on YARN-11082:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red}{color} | {color:red} YARN-11082 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-11082 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13040419/YARN-11082.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1273/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Use node label resource as denominator to decide which resource is dominant
> -
>
> Key: YARN-11082
> URL: https://issues.apache.org/jira/browse/YARN-11082
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.1
>Reporter: Bo Li
>Priority: Major
> Fix For: 3.1.1
>
> Attachments: YARN-11082.patch
>
>
> We used the cluster resource as the denominator to decide which resource is 
> dominant in AbstractCSQueue#canAssignToThisQueue. However, the nodes in our 
> cluster are configured differently.
> {quote}2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
>  assignedContainer application 
> attempt=appattempt_1637412555366_1588993_01 container=null 
> queue=root.a.a1.a2 clusterResource= 
> type=RACK_LOCAL requestedPartition=xx
> 2021-12-09 10:24:37,069 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
>  Used resource= exceeded maxResourceLimit of the 
> queue =
> 2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal
> {quote}
> We can see that even though root.a.a1.a2 had used 687/687 vcores, the 
> following check in AbstractCSQueue#canAssignToThisQueue still returns false:
> {quote}
> Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
> usedExceptKillable, currentLimitResource)
> {quote}
> clusterResource = <memory:175117312, vCores:40222>
> usedExceptKillable = <memory:3384320, vCores:688>
> currentLimitResource = <memory:3381248, vCores:687>
> currentLimitResource:
> memory : 3381248/175117312 = 0.01930847362
> vCores : 687/40222 = 0.01708020486
> usedExceptKillable:
> memory : 3384320/175117312 = 0.01932601615
> vCores : 688/40222 = 0.01710506687
> DRF will treat memory as the dominant resource and return false in this scenario.






[jira] [Updated] (YARN-11082) Use node label resource as denominator to decide which resource is dominant

2022-02-24 Thread Bo Li (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Li updated YARN-11082:
-
Description: 
We used the cluster resource as the denominator to decide which resource is 
dominant in AbstractCSQueue#canAssignToThisQueue. However, the nodes in our 
cluster are configured differently.
{quote}2021-12-09 10:24:37,069 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
 assignedContainer application attempt=appattempt_1637412555366_1588993_01 
container=null queue=root.a.a1.a2 clusterResource= type=RACK_LOCAL requestedPartition=xx
2021-12-09 10:24:37,069 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
 Used resource= exceeded maxResourceLimit of the 
queue =

2021-12-09 10:24:37,069 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to accept allocation proposal
{quote}
We can see that even though root.a.a1.a2 had used 687/687 vcores, the 
following check in AbstractCSQueue#canAssignToThisQueue still returns false:

{quote}
Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
usedExceptKillable, currentLimitResource)
{quote}

clusterResource = <memory:175117312, vCores:40222>
usedExceptKillable = <memory:3384320, vCores:688>
currentLimitResource = <memory:3381248, vCores:687>

currentLimitResource:
memory : 3381248/175117312 = 0.01930847362
vCores : 687/40222 = 0.01708020486

usedExceptKillable:
memory : 3384320/175117312 = 0.01932601615
vCores : 688/40222 = 0.01710506687

DRF will treat memory as the dominant resource and return false in this scenario.

  was:
We used the cluster resource as the denominator to decide which resource is 
dominant in AbstractCSQueue#canAssignToThisQueue. However, the nodes in our 
cluster are configured differently.
{quote}
2021-12-09 10:24:37,069 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
 assignedContainer application attempt=appattempt_1637412555366_1588993_01 
container=null queue=root.a.a1.a2 clusterResource= type=RACK_LOCAL requestedPartition=xx
2021-12-09 10:24:37,069 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
 Used resource= exceeded maxResourceLimit of the 
queue =

2021-12-09 10:24:37,069 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to accept allocation proposal

{quote}

We can see that even though root.a.a1.a2 had used 687/687 vcores, the 
following check in AbstractCSQueue#canAssignToThisQueue still returns false:
```java
Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
  usedExceptKillable, currentLimitResource)
```
clusterResource = <memory:175117312, vCores:40222>
usedExceptKillable = <memory:3384320, vCores:688>
currentLimitResource = <memory:3381248, vCores:687>

currentLimitResource:
memory : 3381248/175117312 = 0.01930847362
vCores : 687/40222 = 0.01708020486

usedExceptKillable:
memory : 3384320/175117312 = 0.01932601615
vCores : 688/40222 = 0.01710506687

DRF will treat memory as the dominant resource and return false in this scenario.


> Use node label resource as denominator to decide which resource is dominant
> -
>
> Key: YARN-11082
> URL: https://issues.apache.org/jira/browse/YARN-11082
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.1
>Reporter: Bo Li
>Priority: Major
>
> We used the cluster resource as the denominator to decide which resource is 
> dominant in AbstractCSQueue#canAssignToThisQueue. However, the nodes in our 
> cluster are configured differently.
> {quote}2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
>  assignedContainer application 
> attempt=appattempt_1637412555366_1588993_01 container=null 
> queue=root.a.a1.a2 clusterResource= 
> type=RACK_LOCAL requestedPartition=xx
> 2021-12-09 10:24:37,069 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
>  Used resource= exceeded maxResourceLimit of the 
> queue =
> 2021-12-09 10:24:37,069 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal
> {quote}
> We can see that even though root.a.a1.a2 had used 687/687 vcores, the 
> following check in AbstractCSQueue#canAssignToThisQueue still returns false:
> {quote}
> Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
> usedExceptKillable, currentLimitResource)
> {quote}
> clusterResource = <memory:175117312, vCores:40222>
> usedExceptKillable = <memory:3384320, vCores:688>
> currentLimitResource = <memory:3381248, vCores:687>
> currentLimitResource:
> memory : 3381248/175117312 = 0.01930847362
> vCores : 687/40222 = 0.01708020486
> usedExceptKillable:
> memory : 3384320/175117312 = 0.01932601615
> vCores : 688/40222 = 0.01710506687
> DRF will treat memory as the dominant resource and return false in this scenario.

[jira] [Created] (YARN-11082) Use node label resource as denominator to decide which resource is dominant

2022-02-24 Thread Bo Li (Jira)
Bo Li created YARN-11082:


 Summary: Use node label resource as denominator to decide which resource is dominant
 Key: YARN-11082
 URL: https://issues.apache.org/jira/browse/YARN-11082
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Affects Versions: 3.1.1
Reporter: Bo Li


We used the cluster resource as the denominator to decide which resource is 
dominant in AbstractCSQueue#canAssignToThisQueue. However, the nodes in our 
cluster are configured differently.
{quote}
2021-12-09 10:24:37,069 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
 assignedContainer application attempt=appattempt_1637412555366_1588993_01 
container=null queue=root.a.a1.a2 clusterResource= type=RACK_LOCAL requestedPartition=xx
2021-12-09 10:24:37,069 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
 Used resource= exceeded maxResourceLimit of the 
queue =

2021-12-09 10:24:37,069 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to accept allocation proposal

{quote}

We can see that even though root.a.a1.a2 had used 687/687 vcores, the 
following check in AbstractCSQueue#canAssignToThisQueue still returns false:
```java
Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
  usedExceptKillable, currentLimitResource)
```
clusterResource = <memory:175117312, vCores:40222>
usedExceptKillable = <memory:3384320, vCores:688>
currentLimitResource = <memory:3381248, vCores:687>

currentLimitResource:
memory : 3381248/175117312 = 0.01930847362
vCores : 687/40222 = 0.01708020486

usedExceptKillable:
memory : 3384320/175117312 = 0.01932601615
vCores : 688/40222 = 0.01710506687

DRF will treat memory as the dominant resource and return false in this scenario.
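To illustrate the proposal in the title, a hedged sketch with invented partition totals (the actual resource of the "xx" partition is not given in the report): if the labeled partition's own resource were the denominator, vcores rather than memory would be dominant, matching the fact that the queue has exhausted its vcores.

{code:java}
// Hypothetical illustration of the title's proposal. The partition totals
// below are invented for the example; only the used figures come from the
// report above.
public class NodeLabelDenominatorDemo {
    public static void main(String[] args) {
        // Invented totals for the labeled partition: vcore-tight nodes.
        final double partitionMem = 10000000.0, partitionVcores = 800.0;

        double memShare   = 3384320.0 / partitionMem;  // ~0.34
        double vcoreShare = 688.0 / partitionVcores;   // 0.86

        // Against the partition's own capacity, vcores clearly dominate.
        System.out.println("dominant resource = "
                + (vcoreShare > memShare ? "vCores" : "memory"));
    }
}
{code}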


