[jira] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly

2022-04-06 Thread Xiping Zhang (Jira)


[ https://issues.apache.org/jira/browse/YARN-11107 ]


Xiping Zhang deleted comment on YARN-11107:
-

was (Author: zhangxiping):
cc [~BilwaST]  [~tangzhankun]

> When NodeLabel is enabled for a YARN cluster, AM blacklist program does not 
> work properly
> -
>
> Key: YARN-11107
> URL: https://issues.apache.org/jira/browse/YARN-11107
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
> Attachments: YARN-11107-branch-2.9.2.001.patch, 
> YARN-11107-branch-3.3.0.001.patch
>
>
> Yarn NodeLabel is enabled in our production environment. We encountered an 
> application whose AM blacklisted all NMs corresponding to the label in the 
> queue, so other applications in the queue could not apply for computing 
> resources. We also found that the RM printed a lot of logs saying "Trying to 
> fulfill reservation for application..."



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly

2022-04-06 Thread Xiping Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518555#comment-17518555
 ] 

Xiping Zhang commented on YARN-11107:
-

cc [~leosun08] [~linyiqun]  [~weichiu] [~hexiaoqiao] 

Could you help review this?

Thanks.

> When NodeLabel is enabled for a YARN cluster, AM blacklist program does not 
> work properly
> -
>
> Key: YARN-11107
> URL: https://issues.apache.org/jira/browse/YARN-11107
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
> Attachments: YARN-11107-branch-2.9.2.001.patch, 
> YARN-11107-branch-3.3.0.001.patch
>
>
> Yarn NodeLabel is enabled in our production environment. We encountered an 
> application whose AM blacklisted all NMs corresponding to the label in the 
> queue, so other applications in the queue could not apply for computing 
> resources. We also found that the RM printed a lot of logs saying "Trying to 
> fulfill reservation for application..."



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly

2022-04-06 Thread Xiping Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17518004#comment-17518004
 ] 

Xiping Zhang commented on YARN-11107:
-

cc [~BilwaST]  [~tangzhankun]

> When NodeLabel is enabled for a YARN cluster, AM blacklist program does not 
> work properly
> -
>
> Key: YARN-11107
> URL: https://issues.apache.org/jira/browse/YARN-11107
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
> Attachments: YARN-11107-branch-2.9.2.001.patch, 
> YARN-11107-branch-3.3.0.001.patch
>
>
> Yarn NodeLabel is enabled in our production environment. We encountered an 
> application whose AM blacklisted all NMs corresponding to the label in the 
> queue, so other applications in the queue could not apply for computing 
> resources. We also found that the RM printed a lot of logs saying "Trying to 
> fulfill reservation for application..."



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly

2022-04-06 Thread Xiping Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517921#comment-17517921
 ] 

Xiping Zhang edited comment on YARN-11107 at 4/6/22 9:56 AM:
-

I think that when NodeLabel is enabled, the RM should consider the label of the 
application when passing the number of NMs to the AM: when the number of 
blacklisted nodes exceeds 33% of the total number of nodes carrying that label, 
the AM releases the NMs in the blacklist. In DefaultAMSProcessor.java:
{code:java}
final class DefaultAMSProcessor implements ApplicationMasterServiceProcessor {
  ...
  public void allocate(ApplicationAttemptId appAttemptId,
      AllocateRequest request, AllocateResponse response) throws YarnException {
    ...
    // Consider whether NodeLabel is enabled: this always reports the number of
    // nodes in the whole cluster, not the number of nodes carrying the
    // application's node label.
    response.setNumClusterNodes(getScheduler().getNumClusterNodes());
    ...
  }
}
{code}
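
A minimal, self-contained sketch of the proposed behaviour (not taken from the 
attached patches): when the AM is pinned to a node label, report only the nodes 
carrying that label, so the AM's 33% blacklist-ignore threshold is computed 
against the nodes the application can actually use. The Map below stands in for 
whatever label-to-node bookkeeping the RM already keeps; it is illustrative only.
{code:java}
import java.util.Map;
import java.util.Set;

public final class LabelAwareNodeCount {

  // Returns the node count the RM would report to this AM under the proposal.
  static int numNodesForAm(String amNodeLabel, int numClusterNodes,
      Map<String, Set<String>> labelToNodes) {
    if (amNodeLabel == null || amNodeLabel.isEmpty()) {
      // Default partition: the cluster-wide count is already correct.
      return numClusterNodes;
    }
    // Labelled AM: only the nodes carrying its label are usable.
    Set<String> nodes = labelToNodes.get(amNodeLabel);
    return nodes == null ? 0 : nodes.size();
  }

  public static void main(String[] args) {
    Map<String, Set<String>> labelToNodes =
        Map.of("gpu", Set.of("nm1", "nm2", "nm3"));
    // The cluster has 100 NMs, but only 3 of them carry the "gpu" label.
    System.out.println(numNodesForAm("gpu", 100, labelToNodes)); // 3
    System.out.println(numNodesForAm("", 100, labelToNodes));    // 100
  }
}
{code}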


was (Author: zhangxiping):
I think that when NodeLabel is enabled, the RM should consider the label of the 
application when passing the number of NMs to the AM: when the number of 
blacklisted nodes exceeds 33% of the total number of nodes carrying that label, 
the AM releases the NMs in the blacklist. In DefaultAMSProcessor.java:
{code:java}
final class DefaultAMSProcessor implements ApplicationMasterServiceProcessor {
  ...
  public void allocate(ApplicationAttemptId appAttemptId,
      AllocateRequest request, AllocateResponse response) throws YarnException {
    ...
    // Consider whether NodeLabel is enabled: this always reports the number of
    // nodes in the whole cluster, not the number of nodes carrying the
    // application's node label.
    response.setNumClusterNodes(getScheduler().getNumClusterNodes());
    ...
  }
}
{code}

> When NodeLabel is enabled for a YARN cluster, AM blacklist program does not 
> work properly
> -
>
> Key: YARN-11107
> URL: https://issues.apache.org/jira/browse/YARN-11107
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
> Attachments: YARN-11107-branch-2.9.2.001.patch, 
> YARN-11107-branch-3.3.0.001.patch
>
>
> Yarn NodeLabel is enabled in our production environment. We encountered an 
> application whose AM blacklisted all NMs corresponding to the label in the 
> queue, so other applications in the queue could not apply for computing 
> resources. We also found that the RM printed a lot of logs saying "Trying to 
> fulfill reservation for application..."



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11107) When NodeLabel is enabled for a YARN cluster, AM blacklist program does not work properly

2022-04-06 Thread Xiping Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiping Zhang updated YARN-11107:

Summary: When NodeLabel is enabled for a YARN cluster, AM blacklist program 
does not work properly  (was: When NodeLabel is enabled for a YARN cluster, the 
blacklist feature is abnormal)

> When NodeLabel is enabled for a YARN cluster, AM blacklist program does not 
> work properly
> -
>
> Key: YARN-11107
> URL: https://issues.apache.org/jira/browse/YARN-11107
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
> Attachments: YARN-11107-branch-2.9.2.001.patch, 
> YARN-11107-branch-3.3.0.001.patch
>
>
> Yarn NodeLabel is enabled in our production environment. We encountered an 
> application whose AM blacklisted all NMs corresponding to the label in the 
> queue, so other applications in the queue could not apply for computing 
> resources. We also found that the RM printed a lot of logs saying "Trying to 
> fulfill reservation for application..."



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11107) When NodeLabel is enabled for a YARN cluster, the blacklist feature is abnormal

2022-04-06 Thread Xiping Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiping Zhang updated YARN-11107:

Attachment: YARN-11107-branch-3.3.0.001.patch

> When NodeLabel is enabled for a YARN cluster, the blacklist feature is 
> abnormal
> ---
>
> Key: YARN-11107
> URL: https://issues.apache.org/jira/browse/YARN-11107
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
> Attachments: YARN-11107-branch-2.9.2.001.patch, 
> YARN-11107-branch-3.3.0.001.patch
>
>
> Yarn NodeLabel is enabled in our production environment. We encountered an 
> application whose AM blacklisted all NMs corresponding to the label in the 
> queue, so other applications in the queue could not apply for computing 
> resources. We also found that the RM printed a lot of logs saying "Trying to 
> fulfill reservation for application..."



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11108) Unexpected preemptions happen when hierarchy queues case

2022-04-06 Thread Juanjuan Tian (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517945#comment-17517945
 ] 

Juanjuan Tian  commented on YARN-11108:
---

[~wangda]  could you help take a look?

> Unexpected preemptions happen when hierarchy queues case
> 
>
> Key: YARN-11108
> URL: https://issues.apache.org/jira/browse/YARN-11108
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.2
>Reporter: Juanjuan Tian 
>Assignee: Juanjuan Tian 
>Priority: Major
> Attachments: image-2022-04-06-16-29-57-871.png
>
>
> Unexpected preemptions happen in the hierarchical-queues case. The issue is 
> that a sub-queue can accept more resource than used + pending, leading to 
> other queues' IDEAL_ASSIGNED being smaller than used + pending, so preemption 
> happens unexpectedly.
>  
> 2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor 
> (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator:
>   NAME: MSANRPAB PARTITION: persistent{color:#de350b} CUR:  vCores:8285, ports:null{color}, [ reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}>]> PEN: 
>  TOTAL_PEN:  
> RESERVED:  GAR:  vCores:9571, ports:null> NORM: 0.3424696922302246{color:#de350b} 
> IDEAL_ASSIGNED: {color} 
> IDEAL_PREEMPT:  ACTUAL_PREEMPT:  vCores:0, ports:null> UNTOUCHABLE:  
> PREEMPTABLE:  availableCpuCount:-36467, reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 
> 24, 28}>]> BONUS_WEIGHT: -1.0
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11108) Unexpected preemptions happen when hierarchy queues case

2022-04-06 Thread Juanjuan Tian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juanjuan Tian  updated YARN-11108:
--
Affects Version/s: 2.9.2

> Unexpected preemptions happen when hierarchy queues case
> 
>
> Key: YARN-11108
> URL: https://issues.apache.org/jira/browse/YARN-11108
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.2
>Reporter: Juanjuan Tian 
>Assignee: Juanjuan Tian 
>Priority: Major
> Attachments: image-2022-04-06-16-29-57-871.png
>
>
> Unexpected preemptions happen in the hierarchical-queues case. The issue is 
> that a sub-queue can accept more resource than used + pending, leading to 
> other queues' IDEAL_ASSIGNED being smaller than used + pending, so preemption 
> happens unexpectedly.
>  
> 2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor 
> (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator:
>   NAME: MSANRPAB PARTITION: persistent{color:#de350b} CUR:  vCores:8285, ports:null{color}, [ reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}>]> PEN: 
>  TOTAL_PEN:  
> RESERVED:  GAR:  vCores:9571, ports:null> NORM: 0.3424696922302246{color:#de350b} 
> IDEAL_ASSIGNED: {color} 
> IDEAL_PREEMPT:  ACTUAL_PREEMPT:  vCores:0, ports:null> UNTOUCHABLE:  
> PREEMPTABLE:  availableCpuCount:-36467, reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 
> 24, 28}>]> BONUS_WEIGHT: -1.0
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-11108) Unexpected preemptions happen when hierarchy queues case

2022-04-06 Thread Juanjuan Tian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juanjuan Tian  reassigned YARN-11108:
-

Assignee: Juanjuan Tian 

> Unexpected preemptions happen when hierarchy queues case
> 
>
> Key: YARN-11108
> URL: https://issues.apache.org/jira/browse/YARN-11108
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Juanjuan Tian 
>Assignee: Juanjuan Tian 
>Priority: Major
> Attachments: image-2022-04-06-16-29-57-871.png
>
>
> Unexpected preemptions happen in the hierarchical-queues case. The issue is 
> that a sub-queue can accept more resource than used + pending, leading to 
> other queues' IDEAL_ASSIGNED being smaller than used + pending, so preemption 
> happens unexpectedly.
>  
> 2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor 
> (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator:
>   NAME: MSANRPAB PARTITION: persistent{color:#de350b} CUR:  vCores:8285, ports:null{color}, [ reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}>]> PEN: 
>  TOTAL_PEN:  
> RESERVED:  GAR:  vCores:9571, ports:null> NORM: 0.3424696922302246{color:#de350b} 
> IDEAL_ASSIGNED: {color} 
> IDEAL_PREEMPT:  ACTUAL_PREEMPT:  vCores:0, ports:null> UNTOUCHABLE:  
> PREEMPTABLE:  availableCpuCount:-36467, reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 
> 24, 28}>]> BONUS_WEIGHT: -1.0
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case

2022-04-06 Thread Juanjuan Tian (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935
 ] 

Juanjuan Tian  edited comment on YARN-11108 at 4/6/22 8:44 AM:
---

This issue is caused by the following: when calculating the accepted resource,

Resources.min(rc, clusterResource, avail,
Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending
: pendingDeductReserved)), idealAssigned)) is used,

but Resources.componentwiseMin(avail,
Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending
: pendingDeductReserved)), idealAssigned)) should be used.

For example, if the cluster resource is (32GB, 16 cores), avail is (2GB, 3 cores),
and Resources.subtract(Resources.add(getUsed(), (considersReservedResource ?
pending : pendingDeductReserved)), idealAssigned) is (8GB, 2 cores), then the
accepted resource comes out as (2GB, 3 cores): the assigned CPU exceeds the
pending CPU.

 

!image-2022-04-06-16-29-57-871.png!
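
For illustration, here is a small standalone sketch (not part of any patch) that 
reproduces the example above with the two helpers from 
org.apache.hadoop.yarn.util.resource.Resources; the concrete memory and vCore 
values simply mirror the (32GB, 16 cores), (2GB, 3 cores) and (8GB, 2 cores) 
figures:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class AcceptedResourceSketch {
  public static void main(String[] args) {
    ResourceCalculator rc = new DominantResourceCalculator();
    Resource clusterResource = Resource.newInstance(32 * 1024, 16); // (32GB, 16 cores)
    Resource avail = Resource.newInstance(2 * 1024, 3);             // (2GB, 3 cores)
    Resource unmet = Resource.newInstance(8 * 1024, 2);             // used + pending - idealAssigned

    // Resources.min compares the two resources as a whole (by dominant share
    // here) and returns the smaller one, so the result can carry more vCores
    // than are actually still pending.
    Resource accepted = Resources.min(rc, clusterResource, avail, unmet);

    // componentwiseMin takes the minimum per dimension, so neither memory nor
    // vCores can exceed what is still needed.
    Resource capped = Resources.componentwiseMin(avail, unmet);

    System.out.println("Resources.min:    " + accepted); // <memory:2048, vCores:3>
    System.out.println("componentwiseMin: " + capped);   // <memory:2048, vCores:2>
  }
}
{code}
Under DominantResourceCalculator the dominant share of (2GB, 3 cores) is 3/16 = 
0.1875 versus 8/32 = 0.25 for (8GB, 2 cores), so Resources.min returns the whole 
(2GB, 3 cores) resource and the accepted vCores exceed the pending vCores, while 
componentwiseMin caps each dimension separately at (2GB, 2 cores).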


was (Author: jutia):
This issue is caused by the following: when calculating the accepted resource,

Resources.min(rc, clusterResource, avail,
Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending
: pendingDeductReserved)), idealAssigned)) is used,

but Resources.componentwiseMin(avail,
Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending
: pendingDeductReserved)), idealAssigned)) should be used.

For example, if the cluster resource is (32GB, 16 cores), avail is (2GB, 3 cores),
and Resources.subtract(Resources.add(getUsed(), (considersReservedResource ?
pending : pendingDeductReserved)), idealAssigned) is (8GB, 2 cores), then the
accepted resource comes out as (2GB, 3 cores): the assigned CPU exceeds the
pending CPU.

 

!image-2022-04-06-16-29-57-871.png!

> Unexpected preemptions happen when hierarchy queues case
> 
>
> Key: YARN-11108
> URL: https://issues.apache.org/jira/browse/YARN-11108
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Juanjuan Tian 
>Priority: Major
> Attachments: image-2022-04-06-16-29-57-871.png
>
>
> Unexpected preemptions happen in the hierarchical-queues case. The issue is 
> that a sub-queue can accept more resource than used + pending, leading to 
> other queues' IDEAL_ASSIGNED being smaller than used + pending, so preemption 
> happens unexpectedly.
>  
> 2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor 
> (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator:
>   NAME: MSANRPAB PARTITION: persistent{color:#de350b} CUR:  vCores:8285, ports:null{color}, [ reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}>]> PEN: 
>  TOTAL_PEN:  
> RESERVED:  GAR:  vCores:9571, ports:null> NORM: 0.3424696922302246{color:#de350b} 
> IDEAL_ASSIGNED: {color} 
> IDEAL_PREEMPT:  ACTUAL_PREEMPT:  vCores:0, ports:null> UNTOUCHABLE:  
> PREEMPTABLE:  availableCpuCount:-36467, reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 
> 24, 28}>]> BONUS_WEIGHT: -1.0
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case

2022-04-06 Thread Juanjuan Tian (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935
 ] 

Juanjuan Tian  edited comment on YARN-11108 at 4/6/22 8:43 AM:
---

This issue is caused by the following: when calculating the accepted resource,

Resources.min(rc, clusterResource, avail,
Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending
: pendingDeductReserved)), idealAssigned)) is used,

but Resources.componentwiseMin(avail,
Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending
: pendingDeductReserved)), idealAssigned)) should be used.

For example, if the cluster resource is (32GB, 16 cores), avail is (2GB, 3 cores),
and Resources.subtract(Resources.add(getUsed(), (considersReservedResource ?
pending : pendingDeductReserved)), idealAssigned) is (8GB, 2 cores), then the
accepted resource comes out as (2GB, 3 cores): the assigned CPU exceeds the
pending CPU.

 

!image-2022-04-06-16-29-57-871.png!


was (Author: jutia):
This issue is caused by the following: when calculating the accepted resource,

Resources.min(rc, clusterResource, avail,
Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending
: pendingDeductReserved)), idealAssigned)) is used, but

Resources.componentwiseMin(avail,
Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending
: pendingDeductReserved)), idealAssigned)) should be used.

 

!image-2022-04-06-16-29-57-871.png!

> Unexpected preemptions happen when hierarchy queues case
> 
>
> Key: YARN-11108
> URL: https://issues.apache.org/jira/browse/YARN-11108
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Juanjuan Tian 
>Priority: Major
> Attachments: image-2022-04-06-16-29-57-871.png
>
>
> Unexpected preemptions happen in the hierarchical-queues case. The issue is 
> that a sub-queue can accept more resource than used + pending, leading to 
> other queues' IDEAL_ASSIGNED being smaller than used + pending, so preemption 
> happens unexpectedly.
>  
> 2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor 
> (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator:
>   NAME: MSANRPAB PARTITION: persistent{color:#de350b} CUR:  vCores:8285, ports:null{color}, [ reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}>]> PEN: 
>  TOTAL_PEN:  
> RESERVED:  GAR:  vCores:9571, ports:null> NORM: 0.3424696922302246{color:#de350b} 
> IDEAL_ASSIGNED: {color} 
> IDEAL_PREEMPT:  ACTUAL_PREEMPT:  vCores:0, ports:null> UNTOUCHABLE:  
> PREEMPTABLE:  availableCpuCount:-36467, reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 
> 24, 28}>]> BONUS_WEIGHT: -1.0
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11108) Unexpected preemptions happen when hierarchy queues case

2022-04-06 Thread Juanjuan Tian (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935
 ] 

Juanjuan Tian  commented on YARN-11108:
---

This issue is caused by the following: when calculating the accepted resource,

Resources.min(rc, clusterResource, avail,
Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending
: pendingDeductReserved)), idealAssigned)) is used, but

Resources.componentwiseMin(avail,
Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending
: pendingDeductReserved)), idealAssigned)) should be used.

 

!image-2022-04-06-16-29-57-871.png!

> Unexpected preemptions happen when hierarchy queues case
> 
>
> Key: YARN-11108
> URL: https://issues.apache.org/jira/browse/YARN-11108
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Juanjuan Tian 
>Priority: Major
> Attachments: image-2022-04-06-16-29-57-871.png
>
>
> Unexpected preemptions happen in the hierarchical-queues case. The issue is 
> that a sub-queue can accept more resource than used + pending, leading to 
> other queues' IDEAL_ASSIGNED being smaller than used + pending, so preemption 
> happens unexpectedly.
>  
> 2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor 
> (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator:
>   NAME: MSANRPAB PARTITION: persistent{color:#de350b} CUR:  vCores:8285, ports:null{color}, [ reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}>]> PEN: 
>  TOTAL_PEN:  
> RESERVED:  GAR:  vCores:9571, ports:null> NORM: 0.3424696922302246{color:#de350b} 
> IDEAL_ASSIGNED: {color} 
> IDEAL_PREEMPT:  ACTUAL_PREEMPT:  vCores:0, ports:null> UNTOUCHABLE:  
> PREEMPTABLE:  availableCpuCount:-36467, reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 
> 24, 28}>]> BONUS_WEIGHT: -1.0
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11108) Unexpected preemptions happen when hierarchy queues case

2022-04-06 Thread Juanjuan Tian (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juanjuan Tian  updated YARN-11108:
--
Attachment: image-2022-04-06-16-29-57-871.png

> Unexpected preemptions happen when hierarchy queues case
> 
>
> Key: YARN-11108
> URL: https://issues.apache.org/jira/browse/YARN-11108
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Juanjuan Tian 
>Priority: Major
> Attachments: image-2022-04-06-16-29-57-871.png
>
>
> Unexpected preemptions happen in the hierarchical-queues case. The issue is 
> that a sub-queue can accept more resource than used + pending, leading to 
> other queues' IDEAL_ASSIGNED being smaller than used + pending, so preemption 
> happens unexpectedly.
>  
> 2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor 
> (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator:
>   NAME: MSANRPAB PARTITION: persistent{color:#de350b} CUR:  vCores:8285, ports:null{color}, [ reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}>]> PEN: 
>  TOTAL_PEN:  
> RESERVED:  GAR:  vCores:9571, ports:null> NORM: 0.3424696922302246{color:#de350b} 
> IDEAL_ASSIGNED: {color} 
> IDEAL_PREEMPT:  ACTUAL_PREEMPT:  vCores:0, ports:null> UNTOUCHABLE:  
> PREEMPTABLE:  availableCpuCount:-36467, reservedAffinity:\{6, 8, 9, 10, 11, 15, 19, 20, 22, 
> 24, 28}>]> BONUS_WEIGHT: -1.0
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-11108) Unexpected preemptions happen when hierarchy queues case

2022-04-06 Thread Juanjuan Tian (Jira)
Juanjuan Tian  created YARN-11108:
-

 Summary: Unexpected preemptions happen when hierarchy queues case
 Key: YARN-11108
 URL: https://issues.apache.org/jira/browse/YARN-11108
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Juanjuan Tian 


Unexpected preemptions happen in the hierarchical-queues case. The issue is 
that a sub-queue can accept more resource than used + pending, leading to 
other queues' IDEAL_ASSIGNED being smaller than used + pending, so preemption 
happens unexpectedly.
 
2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor 
(ProportionalCapacityPreemptionPolicy)] 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator:
  NAME: MSANRPAB PARTITION: persistent{color:#de350b} CUR: ]> PEN:  TOTAL_PEN:  RESERVED: 
 GAR:  NORM: 0.3424696922302246{color:#de350b} IDEAL_ASSIGNED: 
{color} IDEAL_PREEMPT:  ACTUAL_PREEMPT:  
UNTOUCHABLE:  PREEMPTABLE: ]> BONUS_WEIGHT: 
-1.0
 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-11101) Fix TestYarnConfigurationFields

2022-04-06 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved YARN-11101.
--
Resolution: Duplicate

> Fix TestYarnConfigurationFields
> ---
>
> Key: YARN-11101
> URL: https://issues.apache.org/jira/browse/YARN-11101
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, newbie
>Reporter: Akira Ajisaka
>Priority: Major
>
> yarn.resourcemanager.node-labels.am.default-node-label-expression is missing 
> in yarn-default.xml.
> {noformat}
> [INFO] Running org.apache.hadoop.yarn.conf.TestYarnConfigurationFields
> [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.533 
> s <<< FAILURE! - in org.apache.hadoop.yarn.conf.TestYarnConfigurationFields
> [ERROR] testCompareConfigurationClassAgainstXml  Time elapsed: 0.082 s  <<< 
> FAILURE!
> java.lang.AssertionError: class org.apache.hadoop.yarn.conf.YarnConfiguration 
> has 1 variables missing in yarn-default.xml Entries:   
> yarn.resourcemanager.node-labels.am.default-node-label-expression 
> expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml(TestConfigurationFieldsBase.java:493)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11101) Fix TestYarnConfigurationFields

2022-04-06 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517924#comment-17517924
 ] 

Akira Ajisaka commented on YARN-11101:
--

Thank you [~zuston] for the information. I'll close this as a duplicate.

> Fix TestYarnConfigurationFields
> ---
>
> Key: YARN-11101
> URL: https://issues.apache.org/jira/browse/YARN-11101
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, newbie
>Reporter: Akira Ajisaka
>Priority: Major
>
> yarn.resourcemanager.node-labels.am.default-node-label-expression is missing 
> in yarn-default.xml.
> {noformat}
> [INFO] Running org.apache.hadoop.yarn.conf.TestYarnConfigurationFields
> [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.533 
> s <<< FAILURE! - in org.apache.hadoop.yarn.conf.TestYarnConfigurationFields
> [ERROR] testCompareConfigurationClassAgainstXml  Time elapsed: 0.082 s  <<< 
> FAILURE!
> java.lang.AssertionError: class org.apache.hadoop.yarn.conf.YarnConfiguration 
> has 1 variables missing in yarn-default.xml Entries:   
> yarn.resourcemanager.node-labels.am.default-node-label-expression 
> expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml(TestConfigurationFieldsBase.java:493)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11107) When NodeLabel is enabled for a YARN cluster, the blacklist feature is abnormal

2022-04-06 Thread Xiping Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517921#comment-17517921
 ] 

Xiping Zhang commented on YARN-11107:
-

I think that when NodeLabel is enabled, the RM should consider the label of the 
application when passing the number of NMs to the AM: when the number of 
blacklisted nodes exceeds 33% of the total number of nodes carrying that label, 
the AM releases the NMs in the blacklist. In DefaultAMSProcessor.java:
{code:java}
final class DefaultAMSProcessor implements ApplicationMasterServiceProcessor {
  ...
  public void allocate(ApplicationAttemptId appAttemptId,
      AllocateRequest request, AllocateResponse response) throws YarnException {
    ...
    // Consider whether NodeLabel is enabled: this always reports the number of
    // nodes in the whole cluster, not the number of nodes carrying the
    // application's node label.
    response.setNumClusterNodes(getScheduler().getNumClusterNodes());
    ...
  }
}
{code}

> When NodeLabel is enabled for a YARN cluster, the blacklist feature is 
> abnormal
> ---
>
> Key: YARN-11107
> URL: https://issues.apache.org/jira/browse/YARN-11107
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
>
> Yarn NodeLabel is enabled in our production environment. We encountered an 
> application whose AM blacklisted all NMs corresponding to the label in the 
> queue, so other applications in the queue could not apply for computing 
> resources. We also found that the RM printed a lot of logs saying "Trying to 
> fulfill reservation for application..."



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11107) When NodeLabel is enabled for a YARN cluster, the blacklist feature is abnormal

2022-04-06 Thread Xiping Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiping Zhang updated YARN-11107:

Description: Yarn NodeLabel is enabled in our production environment. We 
encountered an application whose AM blacklisted all NMs corresponding to the 
label in the queue, so other applications in the queue could not apply for 
computing resources. We also found that the RM printed a lot of logs saying 
"Trying to fulfill reservation for application..."  (was: Yarn NodeLabel is 
enabled in our production environment. While an application was running, its AM 
blacklisted all NMs corresponding to the label in the queue, and other 
applications in the queue could not apply for computing resources. We found 
that the RM printed a lot of logs saying "Trying to fulfill reservation for 
application...")

> When NodeLabel is enabled for a YARN cluster, the blacklist feature is 
> abnormal
> ---
>
> Key: YARN-11107
> URL: https://issues.apache.org/jira/browse/YARN-11107
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.2, 3.3.0
>Reporter: Xiping Zhang
>Priority: Major
>
> Yarn NodeLabel is enabled in our production environment. We encountered an 
> application whose AM blacklisted all NMs corresponding to the label in the 
> queue, so other applications in the queue could not apply for computing 
> resources. We also found that the RM printed a lot of logs saying "Trying to 
> fulfill reservation for application..."



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-11107) When NodeLabel is enabled for a YARN cluster, the blacklist feature is abnormal

2022-04-06 Thread Xiping Zhang (Jira)
Xiping Zhang created YARN-11107:
---

 Summary: When NodeLabel is enabled for a YARN cluster, the 
blacklist feature is abnormal
 Key: YARN-11107
 URL: https://issues.apache.org/jira/browse/YARN-11107
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.3.0, 2.9.2
Reporter: Xiping Zhang


Yarn NodeLabel is enabled in our production environment. While an application 
was running, its AM blacklisted all NMs corresponding to the label in the 
queue, and other applications in the queue could not apply for computing 
resources. We found that the RM printed a lot of logs saying "Trying to fulfill 
reservation for application..."



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11101) Fix TestYarnConfigurationFields

2022-04-06 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517854#comment-17517854
 ] 

Junfan Zhang commented on YARN-11101:
-

Sorry, this has been fixed in [https://github.com/apache/hadoop/pull/4121] 
[~aajisaka].
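
For reference, the fix amounts to adding an entry for the property to 
yarn-default.xml. A sketch of such an entry is below; the description wording 
and the empty default value are illustrative assumptions, see the pull request 
above for the actual entry:
{code:xml}
<property>
  <description>
    Default node label expression applied to ApplicationMaster containers
    when an application does not specify one. (Illustrative wording only.)
  </description>
  <name>yarn.resourcemanager.node-labels.am.default-node-label-expression</name>
  <value></value>
</property>
{code}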

> Fix TestYarnConfigurationFields
> ---
>
> Key: YARN-11101
> URL: https://issues.apache.org/jira/browse/YARN-11101
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, newbie
>Reporter: Akira Ajisaka
>Priority: Major
>
> yarn.resourcemanager.node-labels.am.default-node-label-expression is missing 
> in yarn-default.xml.
> {noformat}
> [INFO] Running org.apache.hadoop.yarn.conf.TestYarnConfigurationFields
> [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.533 
> s <<< FAILURE! - in org.apache.hadoop.yarn.conf.TestYarnConfigurationFields
> [ERROR] testCompareConfigurationClassAgainstXml  Time elapsed: 0.082 s  <<< 
> FAILURE!
> java.lang.AssertionError: class org.apache.hadoop.yarn.conf.YarnConfiguration 
> has 1 variables missing in yarn-default.xml Entries:   
> yarn.resourcemanager.node-labels.am.default-node-label-expression 
> expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml(TestConfigurationFieldsBase.java:493)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org