[jira] [Updated] (YARN-9936) Support vector of capacity percentages in Capacity Scheduler configuration

2021-03-22 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9936:
-
Description: 
Currently, the Capacity Scheduler queue configuration supports two ways to set 
queue capacity.
 * As a percentage of all available resources, given as a float (e.g. 25.0 
means 25% of the resources of the parent queue for all resource types equally: 
25% of all memory, 25% of all CPU cores, and 25% of all available GPUs in the 
cluster). The percentages of all queues have to add up to 100%.
 * As an absolute amount of resources (e.g. memory=4GB,vcores=20,yarn.io/gpu=4). 
The amount of all resources in the queues has to be less than or equal to 
all resources in the cluster. Actually, the example above is not supported yet: 
absolute mode currently supports only memory and vcores, and it should be 
extended in YARN-10503.

Apart from these two existing ways, there is a demand to specify the capacity 
percentage of each available resource type separately (e.g. 
{{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
 At the same time, a similar concept should be supported for a queue's 
maximum-capacity as well.
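
To make the proposed per-resource format concrete, here is a minimal sketch of how a string such as {{memory=20%,vcores=40%,yarn.io/gpu=100%}} could be split into resource names and percentages. The class name CapacityVectorSketch and its parse method are hypothetical and are not part of the Capacity Scheduler; the validation rules (each value between 0 and 100) are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Capacity Scheduler API. */
public class CapacityVectorSketch {

  /** Parses "name=NN%,name=NN%,..." into an ordered map of resource name to percent. */
  public static Map<String, Double> parse(String vector) {
    Map<String, Double> result = new LinkedHashMap<>();
    for (String entry : vector.split(",")) {
      String[] kv = entry.trim().split("=", 2);
      if (kv.length != 2 || !kv[1].trim().endsWith("%")) {
        throw new IllegalArgumentException("Expected name=NN%, got: " + entry);
      }
      String value = kv[1].trim();
      // Drop the trailing '%' before parsing the number.
      double percent = Double.parseDouble(value.substring(0, value.length() - 1));
      if (percent < 0.0 || percent > 100.0) {
        // Assumed rule: a single queue cannot claim less than 0% or more than 100%.
        throw new IllegalArgumentException("Percentage out of range: " + entry);
      }
      result.put(kv[0].trim(), percent);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(parse("memory=20%,vcores=40%,yarn.io/gpu=100%"));
  }
}
```

How sibling queues' vectors would be validated against each other (for example, whether each resource type must sum to 100% across siblings) is not stated in the description, so the sketch does not attempt it.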

  was:
Currently, the Capacity Scheduler queue configuration supports two ways to set 
queue capacity.
 * In percentage of all available resources as a float ( eg. 25.0 ) means 25% 
of the resources of its parent queue for all resource types equally (eg. 25% of 
all memory, 25% of all CPU cores, and 25% of all available GPU in the cluster) 
The percentages of all queues has to add up to 100%.
 * In an absolute amount of resources ( e.g. memory=4GB,vcores=20,yarn.io/gpu=4 
). The amount of all resources in the queues has to be less than or equal to 
all resources in the cluster.

{color:#de350b}Actually, the above is not supported, we only support memory and 
vcores now in absolute mode, we should extend in {color}YARN-10503.

Apart from these two already existing ways, there is a demand to add capacity 
percentage of each available resource type separately. (eg. 
{{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
 At the same time, a similar concept should be included with queues 
maximum-capacity as well.


> Support vector of capacity percentages in Capacity Scheduler configuration
> --
>
> Key: YARN-9936
> URL: https://issues.apache.org/jira/browse/YARN-9936
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Zoltan Siegl
>Assignee: Andras Gyori
>Priority: Major
> Attachments: Capacity Scheduler support of “vector of resources 
> percentage”.pdf
>
>
> Currently, the Capacity Scheduler queue configuration supports two ways to 
> set queue capacity.
>  * As a percentage of all available resources, given as a float (e.g. 25.0 
> means 25% of the resources of the parent queue for all resource types 
> equally: 25% of all memory, 25% of all CPU cores, and 25% of all available 
> GPUs in the cluster). The percentages of all queues have to add up to 100%.
>  * As an absolute amount of resources (e.g. 
> memory=4GB,vcores=20,yarn.io/gpu=4). The amount of all resources in the 
> queues has to be less than or equal to all resources in the cluster. 
> Actually, the example above is not supported yet: absolute mode currently 
> supports only memory and vcores, and it should be extended in YARN-10503.
> Apart from these two existing ways, there is a demand to specify the capacity 
> percentage of each available resource type separately (e.g. 
> {{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
>  At the same time, a similar concept should be supported for a queue's 
> maximum-capacity as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9936) Support vector of capacity percentages in Capacity Scheduler configuration

2021-03-22 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9936:
-
Description: 
Currently, the Capacity Scheduler queue configuration supports two ways to set 
queue capacity.
 * As a percentage of all available resources, given as a float (e.g. 25.0 
means 25% of the resources of the parent queue for all resource types equally: 
25% of all memory, 25% of all CPU cores, and 25% of all available GPUs in the 
cluster). The percentages of all queues have to add up to 100%.
 * As an absolute amount of resources (e.g. memory=4GB,vcores=20,yarn.io/gpu=4). 
The amount of all resources in the queues has to be less than or equal to 
all resources in the cluster.

Actually, the example above is not supported yet: absolute mode currently 
supports only memory and vcores, and it should be extended in YARN-10503.

Apart from these two existing ways, there is a demand to specify the capacity 
percentage of each available resource type separately (e.g. 
{{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
 At the same time, a similar concept should be supported for a queue's 
maximum-capacity as well.

  was:
Currently, the Capacity Scheduler queue configuration supports two ways to set 
queue capacity.
 * In percentage of all available resources as a float ( eg. 25.0 ) means 25% 
of the resources of its parent queue for all resource types equally (eg. 25% of 
all memory, 25% of all CPU cores, and 25% of all available GPU in the cluster) 
The percentages of all queues has to add up to 100%.
 * In an absolute amount of resources ( e.g. memory=4GB,vcores=20,yarn.io/gpu=4 
). The amount of all resources in the queues has to be less than or equal to 
all resources in the cluster.

Apart from these two already existing ways, there is a demand to add capacity 
percentage of each available resource type separately. (eg. 
{{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
 At the same time, a similar concept should be included with queues 
maximum-capacity as well.


> Support vector of capacity percentages in Capacity Scheduler configuration
> --
>
> Key: YARN-9936
> URL: https://issues.apache.org/jira/browse/YARN-9936
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Zoltan Siegl
>Assignee: Andras Gyori
>Priority: Major
> Attachments: Capacity Scheduler support of “vector of resources 
> percentage”.pdf
>
>
> Currently, the Capacity Scheduler queue configuration supports two ways to 
> set queue capacity.
>  * As a percentage of all available resources, given as a float (e.g. 25.0 
> means 25% of the resources of the parent queue for all resource types 
> equally: 25% of all memory, 25% of all CPU cores, and 25% of all available 
> GPUs in the cluster). The percentages of all queues have to add up to 100%.
>  * As an absolute amount of resources (e.g. 
> memory=4GB,vcores=20,yarn.io/gpu=4). The amount of all resources in the 
> queues has to be less than or equal to all resources in the cluster.
> Actually, the example above is not supported yet: absolute mode currently 
> supports only memory and vcores, and it should be extended in YARN-10503.
> Apart from these two existing ways, there is a demand to specify the capacity 
> percentage of each available resource type separately (e.g. 
> {{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
>  At the same time, a similar concept should be supported for a queue's 
> maximum-capacity as well.






[jira] [Updated] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.

2021-03-22 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10503:
--
Summary: Support queue capacity in terms of absolute resources with custom 
resourceType.  (was: Support queue capacity in terms of absolute resources with 
gpu resourceType.)

> Support queue capacity in terms of absolute resources with custom 
> resourceType.
> ---
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, 
> YARN-10503.003.patch
>
>
> Currently, absolute resource mode supports only memory and vcores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resource types.
> This is very important for cluster scaling when queues have absolute demands 
> for different resource types.
>  
> This Jira will handle GPU first.
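
The limitation described above can be illustrated with a small sketch. It assumes, purely for illustration, that a configured resource name is matched against the enum by upper-casing it; the isSupported helper and the class name are hypothetical and do not reproduce the scheduler's actual parsing code. Only the enum itself is taken from the description.

```java
import java.util.Locale;

/** Illustration only; the lookup below is an assumption, not the scheduler's code. */
public class AbsoluteResourceTypeSketch {

  /** Enum as quoted in the issue description. */
  public enum AbsoluteResourceType {
    MEMORY, VCORES;
  }

  /** Returns true if the configured name maps onto one of the enum constants. */
  public static boolean isSupported(String resourceName) {
    try {
      AbsoluteResourceType.valueOf(resourceName.toUpperCase(Locale.ROOT));
      return true;
    } catch (IllegalArgumentException e) {
      // Enum.valueOf throws for any name that is not a declared constant.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("memory supported: " + isSupported("memory"));
    System.out.println("yarn.io/gpu supported: " + isSupported("yarn.io/gpu"));
  }
}
```

A fixed enum cannot represent a name such as yarn.io/gpu at all, which is why supporting custom resource types likely needs a lookup keyed by the resource name rather than by enum constant.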






[jira] [Updated] (YARN-10708) Remove NULL check before instanceof

2021-03-22 Thread Jiajun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiajun Jiang updated YARN-10708:

Attachment: YARN-10708.patch

> Remove NULL check before instanceof
> ---
>
> Key: YARN-10708
> URL: https://issues.apache.org/jira/browse/YARN-10708
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jiajun Jiang
>Priority: Minor
> Attachments: YARN-10708.patch
>
>
> Submitted a patch to remove the redundant null check before the instanceof 
> check in several classes. Same issue as YARN-9340.
> Classes involved:
> *  M  
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceProfilesResponse.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceTypeInfoResponse.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileRequest.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileResponse.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/volume/csi/VolumeId.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/AssignedDevice.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntimeContext.java
>  






[jira] [Resolved] (YARN-10710) [Clean-up] Remove NULL check before instanceof

2021-03-22 Thread Jiajun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiajun Jiang resolved YARN-10710.
-
Resolution: Duplicate

> [Clean-up] Remove NULL check before instanceof 
> ---
>
> Key: YARN-10710
> URL: https://issues.apache.org/jira/browse/YARN-10710
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jiajun Jiang
>Priority: Minor
>
> The null check before the instanceof check is redundant and should be 
> removed. Same issue as YARN-9340.






[jira] [Resolved] (YARN-10709) Remove NULL check before instanceof

2021-03-22 Thread Jiajun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiajun Jiang resolved YARN-10709.
-
Resolution: Duplicate

> Remove NULL check before instanceof
> ---
>
> Key: YARN-10709
> URL: https://issues.apache.org/jira/browse/YARN-10709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jiajun Jiang
>Priority: Minor
>
> Submitted a patch to remove the redundant null check before the instanceof 
> check in several classes. Same issue as YARN-9340.
> Classes involved:
> *  M  
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceProfilesResponse.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceTypeInfoResponse.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileRequest.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileResponse.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/volume/csi/VolumeId.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/AssignedDevice.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
> * M   
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntimeContext.java
>  






[jira] [Created] (YARN-10710) [Clean-up] Remove NULL check before instanceof

2021-03-22 Thread Jiajun Jiang (Jira)
Jiajun Jiang created YARN-10710:
---

 Summary: [Clean-up] Remove NULL check before instanceof 
 Key: YARN-10710
 URL: https://issues.apache.org/jira/browse/YARN-10710
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jiajun Jiang


The null check before the instanceof check is redundant and should be removed. 
Same issue as YARN-9340.
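
The reason the check is redundant is that the Java instanceof operator already evaluates to false when its left operand is null. A minimal sketch follows; the class and method names are hypothetical and are not taken from the patch.

```java
/** Illustration only; names are hypothetical and not taken from the patch. */
public class InstanceofNullSketch {

  /** Pattern the cleanup removes: explicit null check before instanceof. */
  public static boolean withNullCheck(Object obj) {
    return obj != null && obj instanceof String;
  }

  /** Equivalent form: instanceof is already false for a null reference. */
  public static boolean withoutNullCheck(Object obj) {
    return obj instanceof String;
  }

  public static void main(String[] args) {
    Object[] samples = {null, "text", 42};
    for (Object sample : samples) {
      System.out.println(sample + ": " + withNullCheck(sample)
          + " / " + withoutNullCheck(sample));
    }
  }
}
```

Both methods return the same result for every input, so dropping the null check does not change behavior.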







[jira] [Created] (YARN-10709) Remove NULL check before instanceof

2021-03-22 Thread Jiajun Jiang (Jira)
Jiajun Jiang created YARN-10709:
---

 Summary: Remove NULL check before instanceof
 Key: YARN-10709
 URL: https://issues.apache.org/jira/browse/YARN-10709
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jiajun Jiang


Submitted a patch to remove the redundant null check before the instanceof 
check in several classes. Same issue as YARN-9340.

Classes involved:

*  M
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceProfilesResponse.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceTypeInfoResponse.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileRequest.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileResponse.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/volume/csi/VolumeId.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/AssignedDevice.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntimeContext.java


 






[jira] [Created] (YARN-10708) Remove NULL check before instanceof

2021-03-22 Thread Jiajun Jiang (Jira)
Jiajun Jiang created YARN-10708:
---

 Summary: Remove NULL check before instanceof
 Key: YARN-10708
 URL: https://issues.apache.org/jira/browse/YARN-10708
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jiajun Jiang


Submitted a patch to remove the redundant null check before the instanceof 
check in several classes. Same issue as YARN-9340.

Classes involved:

*  M
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceProfilesResponse.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetAllResourceTypeInfoResponse.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileRequest.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetResourceProfileResponse.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/LightWeightResource.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/volume/csi/VolumeId.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/deviceframework/AssignedDevice.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/AssignedGpuDevice.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/gpu/GpuDevice.java
* M 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntimeContext.java


 






[jira] [Comment Edited] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity

2021-03-22 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306489#comment-17306489
 ] 

Jim Brennan edited comment on YARN-10697 at 3/22/21, 7:02 PM:
--

Thanks for the update [~BilwaST]!
(edited) patch 002 looks mostly good, but can you please rename getResources()? 
 There is already a public Resource.getResources(), and the two functions are 
completely different.
Maybe the private one should be called getFormattedString()?   The new public 
one could also be getFormattedString().




was (Author: jim_brennan):
Thanks for the update [~BilwaST]!  +1 patch 002 looks good to me.


> Resources are displayed in bytes in UI for schedulers other than capacity
> -
>
> Key: YARN-10697
> URL: https://issues.apache.org/jira/browse/YARN-10697
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10697.001.patch, YARN-10697.002.patch, 
> image-2021-03-17-11-30-57-216.png
>
>
> Resources.newInstance expects memory in MB, whereas MetricsOverviewTable 
> passes resources in bytes. Also, we should display memory in GB for better 
> readability for the user.
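
The unit mismatch can be sketched as follows. The helper names bytesToMb and formatGb are hypothetical and are not taken from the patch; the sketch assumes binary units (1 MB = 1024 * 1024 bytes, 1 GB = 1024 MB) and does not call any Hadoop API.

```java
import java.util.Locale;

/** Illustration only; helper names are hypothetical and not taken from the patch. */
public class MemoryUnitsSketch {

  private static final long BYTES_PER_MB = 1024L * 1024L;

  /** Converts a byte count to whole megabytes before it is passed on as MB. */
  public static long bytesToMb(long bytes) {
    return bytes / BYTES_PER_MB;
  }

  /** Formats a megabyte value as gigabytes with two decimals for display. */
  public static String formatGb(long megabytes) {
    return String.format(Locale.ROOT, "%.2f GB", megabytes / 1024.0);
  }

  public static void main(String[] args) {
    long bytes = 8L * 1024L * 1024L * 1024L;
    long mb = bytesToMb(bytes);
    System.out.println(mb + " MB is shown as " + formatGb(mb));
  }
}
```

Passing the raw byte count where MB is expected would overstate memory by a factor of 1024 * 1024, which is the symptom the issue title describes.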






[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity

2021-03-22 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306489#comment-17306489
 ] 

Jim Brennan commented on YARN-10697:


Thanks for the update [~BilwaST]!  +1 patch 002 looks good to me.


> Resources are displayed in bytes in UI for schedulers other than capacity
> -
>
> Key: YARN-10697
> URL: https://issues.apache.org/jira/browse/YARN-10697
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10697.001.patch, YARN-10697.002.patch, 
> image-2021-03-17-11-30-57-216.png
>
>
> Resources.newInstance expects memory in MB, whereas MetricsOverviewTable 
> passes resources in bytes. Also, we should display memory in GB for better 
> readability for the user.






[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.

2021-03-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306292#comment-17306292
 ] 

Hadoop QA commented on YARN-10704:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 24s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 23m 22s | | trunk passed |
| +1 | compile | 1m 5s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 53s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 48s | | trunk passed |
| +1 | mvnsite | 0m 54s | | trunk passed |
| +1 | shadedclient | 17m 16s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 41s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 37s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 20m 23s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 49s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 48s | | the patch passed |
| +1 | compile | 0m 54s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 54s | | the patch passed |
| +1 | compile | 0m 44s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 44s | | the patch passed |
| -0 | checkstyle | 0m 41s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/835/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 82 unchanged - 0 fixed = 83 total (was 82) |
| +1 | mvnsite | 0m 47s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 4s | 

[jira] [Comment Edited] (YARN-10517) QueueMetrics has incorrect Allocated Resource when labelled partitions updated

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305469#comment-17305469
 ] 

Qi Zhu edited comment on YARN-10517 at 3/22/21, 2:50 PM:
-

I ran into this problem too.

I fixed it and added a corresponding test in YARN-10517.001.patch.

 [~epayne] [~pbacsko] [~gandras]  [~ebadger]  [~jianliang.wu] Could you help 
review this?

Thanks.:D


was (Author: zhuqi):
I meet the problem too.

I fixed it and add corresponding test in  YARN-10517.001.patch.

 [~epayne] [~pbacsko] [~gandras]  [~ebadger]  Could you help review this?

Thanks.:D

> QueueMetrics has incorrect Allocated Resource when labelled partitions updated
> --
>
> Key: YARN-10517
> URL: https://issues.apache.org/jira/browse/YARN-10517
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0, 3.3.0
>Reporter: sibyl.lv
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10517-branch-3.2.001.patch, YARN-10517.001.patch, 
> wrong metrics.png
>
>
> After https://issues.apache.org/jira/browse/YARN-9596, QueueMetrics still 
> reports incorrect allocated JMX values, such as allocatedMB, allocatedVCores 
> and allocatedContainers, when the node partition is updated from "DEFAULT" 
> to another label while there are running applications.
> Steps to reproduce
> ==
>  # Configure capacity-scheduler.xml with label configuration
>  # Submit one application to default partition and run
>  # Add label "tpcds" to cluster and replace label on node1 and node2 to be 
> "tpcds" when the above application is running
>  # Note down "VCores Used" at Web UI
>  # When the application has finished, the metrics are wrong (screenshots 
> attached).
> ==
>  
> FiCaSchedulerApp doesn't update queue metrics when CapacityScheduler handles 
> the NODE_LABELS_UPDATE event.
> So we should release the container resource from the old partition and add 
> the used resource to the new partition, just as is done when updating 
> queueUsage.
> {code:java}
> public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition,
>     String newPartition) {
>   Resource containerResource = rmContainer.getAllocatedResource();
>   this.attemptResourceUsage.decUsed(oldPartition, containerResource);
>   this.attemptResourceUsage.incUsed(newPartition, containerResource);
>   getCSLeafQueue().decUsedResource(oldPartition, containerResource, this);
>   getCSLeafQueue().incUsedResource(newPartition, containerResource, this);
>   // Update new partition name if container is AM and also update AM resource
>   if (rmContainer.isAMContainer()) {
>     setAppAMNodePartitionName(newPartition);
>     this.attemptResourceUsage.decAMUsed(oldPartition, containerResource);
>     this.attemptResourceUsage.incAMUsed(newPartition, containerResource);
>     getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this);
>     getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this);
>   }
> }
> {code}
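
The bookkeeping proposed in the description can be illustrated with a minimal, self-contained sketch. It does not use the real QueueMetrics or FiCaSchedulerApp classes: PartitionMetricsSketch and its methods are hypothetical names, and only allocated vcores are tracked. The point is the invariant that a label change moves a running container's allocation from the old partition to the new one, so that the later release decrements the partition that actually holds it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for per-partition queue metrics; not the YARN QueueMetrics API. */
final class PartitionMetricsSketch {
  // Allocated vcores keyed by partition name ("" stands for the DEFAULT partition).
  private final Map<String, Integer> allocatedVCores = new HashMap<>();

  void allocate(String partition, int vcores) {
    allocatedVCores.merge(partition, vcores, Integer::sum);
  }

  void release(String partition, int vcores) {
    allocatedVCores.merge(partition, -vcores, Integer::sum);
  }

  /** Moves a running container's allocation when its node changes label. */
  void movePartition(String oldPartition, String newPartition, int vcores) {
    release(oldPartition, vcores);
    allocate(newPartition, vcores);
  }

  int allocated(String partition) {
    return allocatedVCores.getOrDefault(partition, 0);
  }

  public static void main(String[] args) {
    PartitionMetricsSketch metrics = new PartitionMetricsSketch();
    metrics.allocate("", 4);                // container starts on the default partition
    metrics.movePartition("", "tpcds", 4);  // node relabelled while the app is running
    metrics.release("tpcds", 4);            // app finishes on the new partition
    System.out.println(metrics.allocated("") + " " + metrics.allocated("tpcds"));
  }
}
```

Without the movePartition step, the final release would be charged to "tpcds" while the allocation still sat on the default partition, leaving one partition at +4 and the other at -4. That is the kind of drift the report describes.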



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306240#comment-17306240
 ] 

Peter Bacsko commented on YARN-10674:
-

[~zhuqi] I had a discussion with [~gandras], he will post an update soon.

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch, YARN-10674.016.patch
>
>
> In the Fair Scheduler (FS), the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;
> {code}
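
The quoted fragments come from different places and are not runnable on their own. As a rough illustration of the pattern they describe (a listener callback invoked on a fixed interval), here is a minimal sketch. ReloadLoopSketch, Listener and runChecks are hypothetical names, the loop runs a fixed number of passes instead of checking a running flag, and none of this is the Fair Scheduler source.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a periodic check loop; not the Fair Scheduler source. */
final class ReloadLoopSketch {
  /** Callback invoked on every pass of the loop. */
  interface Listener {
    void onCheck();
  }

  /** Same value as the constant quoted above: 10 seconds. */
  static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;

  private static final Object LOCK = new Object();

  /** Runs onCheck() a fixed number of times, sleeping intervalMs after each pass. */
  static void runChecks(Listener listener, long intervalMs, int passes) {
    for (int i = 0; i < passes; i++) {
      synchronized (LOCK) {
        listener.onCheck();
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop
        return;
      }
    }
  }

  public static void main(String[] args) {
    AtomicInteger checks = new AtomicInteger();
    // A 10 ms interval keeps the demo short; the quoted default is 10 000 ms.
    runChecks(checks::incrementAndGet, 10, 3);
    System.out.println("checks run: " + checks.get());
  }
}
```

With the quoted default, an empty dynamic queue would be noticed at most roughly one interval (10 s) after it becomes removable, which is the behaviour the description refers to.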






[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306215#comment-17306215
 ] 

Qi Zhu commented on YARN-10674:
---

[~pbacsko]

Do you have any advice about this?

Thanks.:D

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch, YARN-10674.016.patch
>
>
> In the Fair Scheduler (FS), the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;
> {code}






[jira] [Commented] (YARN-9936) Support vector of capacity percentages in Capacity Scheduler configuration

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306213#comment-17306213
 ] 

Qi Zhu commented on YARN-9936:
--

Thanks [~gandras] for taking this over.

It's an important feature for CS.

> Support vector of capacity percentages in Capacity Scheduler configuration
> --
>
> Key: YARN-9936
> URL: https://issues.apache.org/jira/browse/YARN-9936
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Zoltan Siegl
>Assignee: Andras Gyori
>Priority: Major
> Attachments: Capacity Scheduler support of “vector of resources 
> percentage”.pdf
>
>
> Currently, the Capacity Scheduler queue configuration supports two ways to 
> set queue capacity.
>  * As a percentage of all available resources, given as a float (e.g. 25.0), 
> meaning 25% of the resources of the parent queue for all resource types 
> equally (e.g. 25% of all memory, 25% of all CPU cores, and 25% of all 
> available GPUs in the cluster). The percentages of all queues have to add up 
> to 100%.
>  * As an absolute amount of resources (e.g. 
> memory=4GB,vcores=20,yarn.io/gpu=4). The amount of all resources in the 
> queues has to be less than or equal to all resources in the cluster.
> Apart from these two existing ways, there is demand for specifying the 
> capacity percentage of each available resource type separately (e.g. 
> {{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
>  At the same time, a similar concept should be supported for a queue's 
> maximum-capacity as well.
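
To make the proposed syntax concrete, here is a minimal sketch of how a string such as memory=20%,vcores=40%,yarn.io/gpu=100% could be split into per-resource percentages. CapacityVectorSketch and parse are hypothetical names, and the validation rules (every entry must end in %, values must lie between 0 and 100) are assumptions made for illustration. This is not Capacity Scheduler code and not the design attached to the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the proposed syntax; not Capacity Scheduler code. */
final class CapacityVectorSketch {
  /** Parses "name=NN%,name=NN%" into a map of resource name to percentage. */
  static Map<String, Double> parse(String vector) {
    Map<String, Double> result = new LinkedHashMap<>();
    for (String entry : vector.split(",")) {
      String[] kv = entry.trim().split("=", 2);
      if (kv.length != 2 || !kv[1].trim().endsWith("%")) {
        throw new IllegalArgumentException("Expected name=NN%, got: " + entry);
      }
      String value = kv[1].trim();
      // Strip the trailing '%' before parsing the number.
      double percent = Double.parseDouble(value.substring(0, value.length() - 1));
      if (percent < 0 || percent > 100) {
        throw new IllegalArgumentException("Percentage out of range: " + entry);
      }
      result.put(kv[0].trim(), percent);
    }
    return result;
  }

  public static void main(String[] args) {
    // prints {memory=20.0, vcores=40.0, yarn.io/gpu=100.0}
    System.out.println(parse("memory=20%,vcores=40%,yarn.io/gpu=100%"));
  }
}
```

A real implementation would also have to decide how the per-resource percentages of sibling queues are validated against each other, which the description leaves open.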






[jira] [Assigned] (YARN-9936) Support vector of capacity percentages in Capacity Scheduler configuration

2021-03-22 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori reassigned YARN-9936:
--

Assignee: Andras Gyori  (was: Zoltan Siegl)

> Support vector of capacity percentages in Capacity Scheduler configuration
> --
>
> Key: YARN-9936
> URL: https://issues.apache.org/jira/browse/YARN-9936
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Zoltan Siegl
>Assignee: Andras Gyori
>Priority: Major
> Attachments: Capacity Scheduler support of “vector of resources 
> percentage”.pdf
>
>
> Currently, the Capacity Scheduler queue configuration supports two ways to 
> set queue capacity.
>  * As a percentage of all available resources, given as a float (e.g. 25.0), 
> meaning 25% of the resources of the parent queue for all resource types 
> equally (e.g. 25% of all memory, 25% of all CPU cores, and 25% of all 
> available GPUs in the cluster). The percentages of all queues have to add up 
> to 100%.
>  * As an absolute amount of resources (e.g. 
> memory=4GB,vcores=20,yarn.io/gpu=4). The amount of all resources in the 
> queues has to be less than or equal to all resources in the cluster.
> Apart from these two existing ways, there is demand for specifying the 
> capacity percentage of each available resource type separately (e.g. 
> {{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
>  At the same time, a similar concept should be supported for a queue's 
> maximum-capacity as well.






[jira] [Commented] (YARN-9936) Support vector of capacity percentages in Capacity Scheduler configuration

2021-03-22 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306212#comment-17306212
 ] 

Andras Gyori commented on YARN-9936:


I can see that this is not being actively developed, so I am taking it over. 
Please feel free to take it back if you want to start working on it.

> Support vector of capacity percentages in Capacity Scheduler configuration
> --
>
> Key: YARN-9936
> URL: https://issues.apache.org/jira/browse/YARN-9936
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: Capacity Scheduler support of “vector of resources 
> percentage”.pdf
>
>
> Currently, the Capacity Scheduler queue configuration supports two ways to 
> set queue capacity.
>  * As a percentage of all available resources, given as a float (e.g. 25.0), 
> meaning 25% of the resources of the parent queue for all resource types 
> equally (e.g. 25% of all memory, 25% of all CPU cores, and 25% of all 
> available GPUs in the cluster). The percentages of all queues have to add up 
> to 100%.
>  * As an absolute amount of resources (e.g. 
> memory=4GB,vcores=20,yarn.io/gpu=4). The amount of all resources in the 
> queues has to be less than or equal to all resources in the cluster.
> Apart from these two existing ways, there is demand for specifying the 
> capacity percentage of each available resource type separately (e.g. 
> {{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
>  At the same time, a similar concept should be supported for a queue's 
> maximum-capacity as well.






[jira] [Commented] (YARN-10645) Fix queue state related update for auto created queue.

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306207#comment-17306207
 ] 

Qi Zhu commented on YARN-10645:
---

Thanks [~pbacsko] for the reply.

It is included in YARN-10564 now, and I have already added a related test in 
YARN-10564 to confirm it.

This can be closed now.

Thanks.

> Fix queue state related update for auto created queue.
> --
>
> Key: YARN-10645
> URL: https://issues.apache.org/jira/browse/YARN-10645
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10645.001.patch
>
>
> The queue state of an auto created queue can no longer be updated after the 
> refactor in YARN-10504.
> We should fix the queue state related logic.






[jira] [Commented] (YARN-10645) Fix queue state related update for auto created queue.

2021-03-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306203#comment-17306203
 ] 

Peter Bacsko commented on YARN-10645:
-

[~zhuqi] [~gandras] is this patch still needed? Looking at Andras' comment, it 
is telling me that this ticket is a duplicate. Is it a dup? 

> Fix queue state related update for auto created queue.
> --
>
> Key: YARN-10645
> URL: https://issues.apache.org/jira/browse/YARN-10645
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10645.001.patch
>
>
> The queue state of an auto created queue can no longer be updated after the 
> refactor in YARN-10504.
> We should fix the queue state related logic.






[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.

2021-03-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306183#comment-17306183
 ] 

Hadoop QA commented on YARN-10503:
--

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 28s |  | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s |  | No case conflicting files found. |
| +1 | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 |  | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 22m 40s |  | trunk passed |
| +1 | compile | 1m 0s |  | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 50s |  | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 46s |  | trunk passed |
| +1 | mvnsite | 0m 54s |  | trunk passed |
| +1 | shadedclient | 17m 2s |  | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 41s |  | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 37s |  | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 20m 11s |  | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 51s |  | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 49s |  | the patch passed |
| +1 | compile | 0m 54s |  | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 54s |  | the patch passed |
| +1 | compile | 0m 44s |  | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 44s |  | the patch passed |
| -0 | checkstyle | 0m 41s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/834/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 9 new + 54 unchanged - 0 fixed = 63 total (was 54) |
| +1 | mvnsite | 0m 48s |  | the patch passed |
| +1 | whitespace | 0m 0s |  | The patch has no whitespace issues. |
| +1 | xml | 0m 1s |  | The patch has no ill-formed XML file. |
| {color:green}+1{color} | {color:green} shadedclient {color} | 

[jira] [Commented] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups

2021-03-22 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306175#comment-17306175
 ] 

Gergely Pollak commented on YARN-10597:
---

[~ahussein] thank you for the feedback. I'll take a look at the config 
argument passing and double-check whether we need to add it.

[~pbacsko] I think we may have fixed the failing unit tests as part of other 
patches. The last time I checked there were some failures, but since then we 
have changed the tests, and I think we added a test helper method to set the 
Groups externally. However, I wasn't sure it would cover all the failing tests, 
so it was a surprise to me as well; I expected some failures. 

Anyway, I'll look into [~ahussein]'s suggestion and get back with the results.

> CSMappingPlacementRule should not create new instance of Groups
> ---
>
> Key: YARN-10597
> URL: https://issues.apache.org/jira/browse/YARN-10597
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10597.001.patch
>
>
> As [~ahussein] pointed out in YARN-10425, no new Groups instance should be 
> created.






[jira] [Comment Edited] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306170#comment-17306170
 ] 

Qi Zhu edited comment on YARN-10503 at 3/22/21, 12:47 PM:
--

Thanks [~pbacsko] for the review.

I added this so that more string types are supported, covering GPUs and FPGAs:
{code:java}
public enum AbsoluteResourceType {
  MEMORY, VCORES, GPUS, FPGAS
}
{code}
Actually, we can also use the original resource names in the conf: 
{code:java}
public static final String MEMORY_URI = "memory-mb";
public static final String VCORES_URI = "vcores";
public static final String GPU_URI = "yarn.io/gpu";
public static final String FPGA_URI = "yarn.io/fpga";
{code}
such as :
{code:java}
1. 
2. {code}
Both forms take effect in the latest patch. The corresponding logic:
{code:java}
// 2.
// Custom resource type defined by user.
if (!resourceTypes.contains(splits[0])) {
  resource.setResourceInformation(splits[0].trim(), ResourceInformation
      .newInstance(splits[0].trim(), units, resourceValue));
  return;
}

// 1.
// Map it based on key.
AbsoluteResourceType resType = AbsoluteResourceType
    .valueOf(StringUtils.toUpperCase(splits[0].trim()));
switch (resType) {
case MEMORY:
  resource.setMemorySize(resourceValue);
  break;
case VCORES:
  resource.setVirtualCores(resourceValue.intValue());
  break;
case GPUS:
  Integer gpuIndex = ResourceUtils.getResourceTypeIndex()
      .get(ResourceInformation.GPU_URI);
  if (gpuIndex != null) {
    resource.setResourceValue(ResourceInformation.GPU_URI,
        resourceValue.intValue());
  } else {
    LOG.error("GPU is not supported in conf.");
  }
  break;
case FPGAS:
  Integer fpgaIndex = ResourceUtils.getResourceTypeIndex()
      .get(ResourceInformation.FPGA_URI);
  if (fpgaIndex != null) {
    resource.setResourceValue(ResourceInformation.FPGA_URI,
        resourceValue.intValue());
  } else {
    LOG.error("FPGA is not supported in conf.");
  }
  break;
default:
  resource.setResourceInformation(splits[0].trim(), ResourceInformation
      .newInstance(splits[0].trim(), units, resourceValue));
  break;
}
{code}
We could also change the enum to:
{code:java}
public enum AbsoluteResourceType { MEMORY, VCORES }
{code}
and restrict GPU and FPGA to their URI names:
{code:java}
public static final String GPU_URI = "yarn.io/gpu";
public static final String FPGA_URI = "yarn.io/fpga";
{code}
What is your opinion about this?

Thanks.
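
As a side note on the two spellings discussed above, here is a minimal sketch of one way to accept both the short aliases and the URI names. It uses a lookup table instead of Enum.valueOf, which throws IllegalArgumentException for any name that is not an enum constant. ResourceKeySketch and canonicalName are hypothetical names, the alias table (including mapping "memory" to "memory-mb") is an assumption for illustration, and this is not the code in the attached patch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical key normalizer; the constants mirror the names quoted above. */
final class ResourceKeySketch {
  static final String MEMORY_URI = "memory-mb";
  static final String VCORES_URI = "vcores";
  static final String GPU_URI = "yarn.io/gpu";
  static final String FPGA_URI = "yarn.io/fpga";

  // Short aliases (lower case) mapped to the canonical resource names.
  private static final Map<String, String> ALIASES = new HashMap<>();
  static {
    ALIASES.put("memory", MEMORY_URI);
    ALIASES.put("vcores", VCORES_URI);
    ALIASES.put("gpus", GPU_URI);
    ALIASES.put("fpgas", FPGA_URI);
  }

  /** Returns the canonical name for an alias, or the trimmed key unchanged. */
  static String canonicalName(String key) {
    String trimmed = key.trim();
    return ALIASES.getOrDefault(trimmed.toLowerCase(Locale.ROOT), trimmed);
  }

  public static void main(String[] args) {
    System.out.println(canonicalName(" GPUS "));          // alias form
    System.out.println(canonicalName("yarn.io/gpu"));     // URI form passes through
    System.out.println(canonicalName("yarn.io/custom"));  // unknown keys pass through
  }
}
```

With a table like this, unknown keys fall through as custom resource names instead of raising an exception, which is roughly the role the default branch is meant to play in the quoted switch.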

 


was (Author: zhuqi):
Thanks [~pbacsko] for review.

I add this to support more string types to support gpus and fpgas:
{code:java}
public enum AbsoluteResourceType {
MEMORY, VCORES, GPUS, FPGAS
}
{code}
Actually, we can also use the original case in the conf: 
{code:java}
public static final String MEMORY_URI = "memory-mb";
public static final String VCORES_URI = "vcores";
public static final String GPU_URI = "yarn.io/gpu";
public static final String FPGA_URI = "yarn.io/fpga";
{code}
such as :
{code:java}
1. 
2. {code}
It will also take effect both two situation in latest patch, the corresponding 
logic:
{code:java}
2. 
// Custom resource type defined by user.
if (!resourceTypes.contains(splits[0])) {
  resource.setResourceInformation(splits[0].trim(), ResourceInformation
  .newInstance(splits[0].trim(), units, resourceValue));
  return;
}

1. 

// map it based on key.
AbsoluteResourceType resType = AbsoluteResourceType
.valueOf(StringUtils.toUpperCase(splits[0].trim()));
switch (resType) {
case MEMORY :
  resource.setMemorySize(resourceValue);
  break;
case VCORES :
  resource.setVirtualCores(resourceValue.intValue());
  break;
case GPUS :
  Integer gpuIndex = ResourceUtils.getResourceTypeIndex()
  .get(ResourceInformation.GPU_URI);
  if (gpuIndex != null) {
resource.setResourceValue(ResourceInformation.GPU_URI,
resourceValue.intValue());
  } else {
LOG.error("GPU is not supported in conf.");
  }
  break;
case FPGAS :
  Integer fpgaIndex = ResourceUtils.getResourceTypeIndex()
  .get(ResourceInformation.FPGA_URI);
  if (fpgaIndex != null) {
resource.setResourceValue(ResourceInformation.FPGA_URI,
resourceValue.intValue());
  } else {
LOG.error("FPGA is not supported in conf.");
  }
  break;
default :
  resource.setResourceInformation(splits[0].trim(), ResourceInformation
  .newInstance(splits[0].trim(), units, resourceValue));
  break;
}
{code}
We can also change to 

public enum AbsoluteResourceType \{ MEMORY, VCORES }

And restrict to 
{code:java}
public static final String GPU_URI = "yarn.io/gpu";
public static final String FPGA_URI = "yarn.io/fpga";
{code}
What you opinion about this?

Thanks.

 

> Support queue capacity in terms of absolute resources with gpu resourceType.
> 
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task

[jira] [Comment Edited] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306170#comment-17306170
 ] 

Qi Zhu edited comment on YARN-10503 at 3/22/21, 12:47 PM:
--

Thanks [~pbacsko] for the review.

I added this so that more string types are supported, covering GPUs and FPGAs:
{code:java}
public enum AbsoluteResourceType {
  MEMORY, VCORES, GPUS, FPGAS
}
{code}
Actually, we can also use the original resource names in the conf: 
{code:java}
public static final String MEMORY_URI = "memory-mb";
public static final String VCORES_URI = "vcores";
public static final String GPU_URI = "yarn.io/gpu";
public static final String FPGA_URI = "yarn.io/fpga";
{code}
such as :
{code:java}
1. 
2. {code}
Both forms take effect in the latest patch. The corresponding logic:
{code:java}
// 2.
// Custom resource type defined by user.
if (!resourceTypes.contains(splits[0])) {
  resource.setResourceInformation(splits[0].trim(), ResourceInformation
      .newInstance(splits[0].trim(), units, resourceValue));
  return;
}

// 1.
// Map it based on key.
AbsoluteResourceType resType = AbsoluteResourceType
    .valueOf(StringUtils.toUpperCase(splits[0].trim()));
switch (resType) {
case MEMORY:
  resource.setMemorySize(resourceValue);
  break;
case VCORES:
  resource.setVirtualCores(resourceValue.intValue());
  break;
case GPUS:
  Integer gpuIndex = ResourceUtils.getResourceTypeIndex()
      .get(ResourceInformation.GPU_URI);
  if (gpuIndex != null) {
    resource.setResourceValue(ResourceInformation.GPU_URI,
        resourceValue.intValue());
  } else {
    LOG.error("GPU is not supported in conf.");
  }
  break;
case FPGAS:
  Integer fpgaIndex = ResourceUtils.getResourceTypeIndex()
      .get(ResourceInformation.FPGA_URI);
  if (fpgaIndex != null) {
    resource.setResourceValue(ResourceInformation.FPGA_URI,
        resourceValue.intValue());
  } else {
    LOG.error("FPGA is not supported in conf.");
  }
  break;
default:
  resource.setResourceInformation(splits[0].trim(), ResourceInformation
      .newInstance(splits[0].trim(), units, resourceValue));
  break;
}
{code}
We could also change the enum to 

public enum AbsoluteResourceType \{ MEMORY, VCORES }

and restrict GPU and FPGA to their URI names:
{code:java}
public static final String GPU_URI = "yarn.io/gpu";
public static final String FPGA_URI = "yarn.io/fpga";
{code}
What is your opinion about this?

Thanks.

 


was (Author: zhuqi):
Thanks [~pbacsko] for review.

I add this to support more string types to support gpus and fpgas:
{code:java}
public enum AbsoluteResourceType {
MEMORY, VCORES, GPUS, FPGAS
}
{code}
Actually, we can also use the original case in the conf: 
{code:java}
public static final String MEMORY_URI = "memory-mb";
public static final String VCORES_URI = "vcores";
public static final String GPU_URI = "yarn.io/gpu";
public static final String FPGA_URI = "yarn.io/fpga";
{code}
such as :
{code:java}
1. 
2. {code}
It will also take effect both two situation in latest patch, the corresponding 
logic:
{code:java}
2. 
// Custom resource type defined by user.
if (!resourceTypes.contains(splits[0])) {
  resource.setResourceInformation(splits[0].trim(), ResourceInformation
  .newInstance(splits[0].trim(), units, resourceValue));
  return;
}

1. 

// map it based on key.
AbsoluteResourceType resType = AbsoluteResourceType
.valueOf(StringUtils.toUpperCase(splits[0].trim()));
switch (resType) {
case MEMORY :
  resource.setMemorySize(resourceValue);
  break;
case VCORES :
  resource.setVirtualCores(resourceValue.intValue());
  break;
case GPUS :
  Integer gpuIndex = ResourceUtils.getResourceTypeIndex()
  .get(ResourceInformation.GPU_URI);
  if (gpuIndex != null) {
resource.setResourceValue(ResourceInformation.GPU_URI,
resourceValue.intValue());
  } else {
LOG.error("GPU is not supported in conf.");
  }
  break;
case FPGAS :
  Integer fpgaIndex = ResourceUtils.getResourceTypeIndex()
  .get(ResourceInformation.FPGA_URI);
  if (fpgaIndex != null) {
resource.setResourceValue(ResourceInformation.FPGA_URI,
resourceValue.intValue());
  } else {
LOG.error("FPGA is not supported in conf.");
  }
  break;
default :
  resource.setResourceInformation(splits[0].trim(), ResourceInformation
  .newInstance(splits[0].trim(), units, resourceValue));
  break;
}
{code}
 

> Support queue capacity in terms of absolute resources with gpu resourceType.
> 
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, 
> YARN-10503.003.patch
>
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different 

[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306170#comment-17306170
 ] 

Qi Zhu commented on YARN-10503:
---

Thanks [~pbacsko] for the review.

I added this so that more string types are supported, covering GPUs and FPGAs:
{code:java}
public enum AbsoluteResourceType {
  MEMORY, VCORES, GPUS, FPGAS
}
{code}
Actually, we can also use the original resource names in the conf: 
{code:java}
public static final String MEMORY_URI = "memory-mb";
public static final String VCORES_URI = "vcores";
public static final String GPU_URI = "yarn.io/gpu";
public static final String FPGA_URI = "yarn.io/fpga";
{code}
such as :
{code:java}
1. 
2. {code}
Both forms take effect in the latest patch. The corresponding logic:
{code:java}
// 2.
// Custom resource type defined by user.
if (!resourceTypes.contains(splits[0])) {
  resource.setResourceInformation(splits[0].trim(), ResourceInformation
      .newInstance(splits[0].trim(), units, resourceValue));
  return;
}

// 1.
// Map it based on key.
AbsoluteResourceType resType = AbsoluteResourceType
    .valueOf(StringUtils.toUpperCase(splits[0].trim()));
switch (resType) {
case MEMORY:
  resource.setMemorySize(resourceValue);
  break;
case VCORES:
  resource.setVirtualCores(resourceValue.intValue());
  break;
case GPUS:
  Integer gpuIndex = ResourceUtils.getResourceTypeIndex()
      .get(ResourceInformation.GPU_URI);
  if (gpuIndex != null) {
    resource.setResourceValue(ResourceInformation.GPU_URI,
        resourceValue.intValue());
  } else {
    LOG.error("GPU is not supported in conf.");
  }
  break;
case FPGAS:
  Integer fpgaIndex = ResourceUtils.getResourceTypeIndex()
      .get(ResourceInformation.FPGA_URI);
  if (fpgaIndex != null) {
    resource.setResourceValue(ResourceInformation.FPGA_URI,
        resourceValue.intValue());
  } else {
    LOG.error("FPGA is not supported in conf.");
  }
  break;
default:
  resource.setResourceInformation(splits[0].trim(), ResourceInformation
      .newInstance(splits[0].trim(), units, resourceValue));
  break;
}
{code}
 

> Support queue capacity in terms of absolute resources with gpu resourceType.
> 
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, 
> YARN-10503.003.patch
>
>
> Currently, the only absolute resources supported are memory and vcores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resource types.
> This is very important for cluster scaling when different resource types have 
> absolute demands.
>  
> This Jira will handle GPU first.






[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306163#comment-17306163
 ] 

Qi Zhu commented on YARN-10704:
---

Thanks [~pbacsko] for the review.

I have fixed the above in the latest patch.

> The CS effective capacity for absolute mode in UI should support GPU and 
> other custom resources.
> 
>
> Key: YARN-10704
> URL: https://issues.apache.org/jira/browse/YARN-10704
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10704.001.patch, YARN-10704.002.patch, 
> YARN-10704.003.patch, image-2021-03-19-12-05-28-412.png, 
> image-2021-03-19-12-08-35-273.png
>
>
> Currently there is no information about the effective capacity of GPU in 
> the UI for absolute resource mode.
> !image-2021-03-19-12-05-28-412.png|width=873,height=136!
> But we have this information in QueueMetrics:
> !image-2021-03-19-12-08-35-273.png|width=613,height=268!
>  
> Absolute mode is very important for our GPU users, but there is still no 
> way to see absolute GPU information in the CS Queue UI.






[jira] [Updated] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.

2021-03-22 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10704:
--
Attachment: YARN-10704.003.patch

> The CS effective capacity for absolute mode in UI should support GPU and 
> other custom resources.
> 
>
> Key: YARN-10704
> URL: https://issues.apache.org/jira/browse/YARN-10704
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10704.001.patch, YARN-10704.002.patch, 
> YARN-10704.003.patch, image-2021-03-19-12-05-28-412.png, 
> image-2021-03-19-12-08-35-273.png
>
>
> Currently there is no information about the effective capacity of GPU in 
> the UI for absolute resource mode.
> !image-2021-03-19-12-05-28-412.png|width=873,height=136!
> But we have this information in QueueMetrics:
> !image-2021-03-19-12-08-35-273.png|width=613,height=268!
>  
> Absolute mode is very important for our GPU users, but there is still no 
> way to see absolute GPU information in the CS Queue UI.






[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.

2021-03-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306157#comment-17306157
 ] 

Peter Bacsko commented on YARN-10503:
-

The question is this part:

{noformat}
public enum AbsoluteResourceType {
  MEMORY, VCORES, GPUS, FPGAS
}
{noformat}

Do we want to treat GPUs and FPGAs like that? In other parts of the code, we 
have memory and vcores as primary resources, followed by an array of other 
resources. For example, the factory methods in 
{{org.apache.hadoop.yarn.api.records.Resource}}:

{noformat}
  @Public
  @Stable
  public static Resource newInstance(long memory, int vCores,
      Map<String, Long> others) {
    if (others != null) {
      return new LightWeightResource(memory, vCores,
          ResourceUtils.createResourceTypesArray(others));
    } else {
      return newInstance(memory, vCores);
    }
  }

  @InterfaceAudience.Private
  @InterfaceStability.Unstable
  public static Resource newInstance(Resource resource) {
    Resource ret;
    int numberOfKnownResourceTypes = ResourceUtils
        .getNumberOfKnownResourceTypes();
    if (numberOfKnownResourceTypes > 2) {
      ret = new LightWeightResource(resource.getMemorySize(),
          resource.getVirtualCores(), resource.getResources());
    } else {
      ret = new LightWeightResource(resource.getMemorySize(),
          resource.getVirtualCores());
    }
    return ret;
  }
{noformat}

But with this modification, we sort of promote GPU and FPGA to the level of 
vcore and memory, at least from the perspective of the code, and the enum 
becomes inconsistent with the existing code.

This is just my opinion though. cc [~epayne] [~ebadger].
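
To illustrate the alternative, here is a minimal sketch of parsing an absolute capacity string by resource name, so that GPU, FPGA and any other custom resource go through the same path instead of needing their own enum constants. It is not taken from any YARN-10503 patch: the class {{AbsoluteResourceSketch}} and its methods are hypothetical, the surrounding brackets are treated as optional, and amounts are assumed to be plain integers without unit suffixes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not from any YARN-10503 patch): parses an absolute
 * capacity string by resource name instead of by enum constant.
 */
public final class AbsoluteResourceSketch {
  private AbsoluteResourceSketch() {
  }

  /**
   * Parses e.g. "[memory=4096,vcores=20,yarn.io/gpu=4]" into an ordered
   * name-to-amount map. Assumes plain integer amounts with no unit suffix.
   */
  public static Map<String, Long> parse(String spec) {
    String body = spec.trim();
    if (body.startsWith("[") && body.endsWith("]")) {
      body = body.substring(1, body.length() - 1).trim();
    }
    Map<String, Long> amounts = new LinkedHashMap<>();
    if (body.isEmpty()) {
      return amounts;
    }
    for (String pair : body.split(",")) {
      String[] splits = pair.split("=");
      if (splits.length != 2) {
        throw new IllegalArgumentException("Malformed entry: " + pair);
      }
      amounts.put(splits[0].trim(), Long.parseLong(splits[1].trim()));
    }
    return amounts;
  }

  /**
   * Returns every entry except the two primary resources, mirroring the
   * "memory and vcores first, everything else in a collection" layout.
   */
  public static Map<String, Long> others(Map<String, Long> amounts) {
    Map<String, Long> rest = new LinkedHashMap<>(amounts);
    rest.remove("memory");
    rest.remove("vcores");
    return rest;
  }
}
```

With this shape, memory and vcores stay the primary resources, and the result of {{others(...)}} is the kind of map that {{Resource.newInstance(long, int, Map)}} above accepts.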

> Support queue capacity in terms of absolute resources with gpu resourceType.
> 
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, 
> YARN-10503.003.patch
>
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resourceTypes.
> It's very important for cluster scaling when there are absolute demands for 
> different resource types.
>  
> This Jira will handle GPU first.






[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.

2021-03-22 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306154#comment-17306154
 ] 

Peter Bacsko commented on YARN-10704:
-

Thanks [~zhuqi]. I have some minor comments:

1.
{noformat}
sb.append("
{noformat}

> The CS effective capacity for absolute mode in UI should support GPU and 
> other custom resources.
> 
>
> Key: YARN-10704
> URL: https://issues.apache.org/jira/browse/YARN-10704
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10704.001.patch, YARN-10704.002.patch, 
> image-2021-03-19-12-05-28-412.png, image-2021-03-19-12-08-35-273.png
>
>
> Currently there is no information about the effective capacity of GPU in 
> the UI for absolute resource mode.
> !image-2021-03-19-12-05-28-412.png|width=873,height=136!
> But we have this information in QueueMetrics:
> !image-2021-03-19-12-08-35-273.png|width=613,height=268!
>  
> Absolute mode is very important for our GPU users, but there is still no 
> way to see absolute GPU information in the CS Queue UI.






[jira] [Comment Edited] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306074#comment-17306074
 ] 

Qi Zhu edited comment on YARN-10503 at 3/22/21, 10:47 AM:
--

As [~epayne] suggested, I have updated the absolute queue resource feature in 
a general way for custom resources in the latest patch.

[~epayne] [~ebadger] [~gandras] [~pbacsko]

Could you help review this?

Thanks.


was (Author: zhuqi):
As [~epayne] suggested.

[~epayne]  [~ebadger] [~gandras] [~pbacsko]

I have updated the absolute queue resource feature in a general way for custom 
resources in the latest patch.

Could you help review this?

Thanks.

> Support queue capacity in terms of absolute resources with gpu resourceType.
> 
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, 
> YARN-10503.003.patch
>
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resourceTypes.
> It's very important for cluster scaling when there are absolute demands for 
> different resource types.
>  
> This Jira will handle GPU first.






[jira] [Commented] (YARN-10707) Support gpu in ResourceUtilization, and update Node GPU Utilization to use.

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306098#comment-17306098
 ] 

Qi Zhu commented on YARN-10707:
---

Fixed the checkstyle issues in the latest patch.

 

> Support gpu in ResourceUtilization, and update Node GPU Utilization to use.
> ---
>
> Key: YARN-10707
> URL: https://issues.apache.org/jira/browse/YARN-10707
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10707.001.patch, YARN-10707.002.patch, 
> YARN-10707.003.patch
>
>
> Support GPU in ResourceUtilization, and update Node GPU Utilization to use 
> it first.
> It will be very helpful for other use cases that involve GPU utilization.






[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306074#comment-17306074
 ] 

Qi Zhu commented on YARN-10503:
---

As [~epayne] suggested.

[~epayne]  [~ebadger] [~gandras] [~pbacsko]

I have updated the absolute queue resource feature in a general way for custom 
resources in the latest patch.

Could you help review this?

Thanks.

> Support queue capacity in terms of absolute resources with gpu resourceType.
> 
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, 
> YARN-10503.003.patch
>
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resourceTypes.
> It's very important for cluster scaling when there are absolute demands for 
> different resource types.
>  
> This Jira will handle GPU first.






[jira] [Updated] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.

2021-03-22 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10503:
--
Attachment: YARN-10503.003.patch

> Support queue capacity in terms of absolute resources with gpu resourceType.
> 
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch, 
> YARN-10503.003.patch
>
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resourceTypes.
> It's very important for cluster scaling when there are absolute demands for 
> different resource types.
>  
> This Jira will handle GPU first.






[jira] [Commented] (YARN-10707) Support gpu in ResourceUtilization, and update Node GPU Utilization to use.

2021-03-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306068#comment-17306068
 ] 

Hadoop QA commented on YARN-10707:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 48s |  | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s |  | No case conflicting files found. |
| 0 | buf | 0m 0s |  | buf was not available. |
| +1 | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 |  | 0m 0s | test4tests | The patch appears to include 8 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 1m 45s |  | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 25s |  | trunk passed |
| +1 | compile | 20m 36s |  | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 17m 49s |  | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 3m 45s |  | trunk passed |
| +1 | mvnsite | 6m 16s |  | trunk passed |
| +1 | shadedclient | 24m 1s |  | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 5m 2s |  | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 5m 21s |  | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 44m 24s |  | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| 0 | spotbugs | 0m 38s |  | branch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests no spotbugs output file (spotbugsXml.xml) |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 23s |  | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 9s |  | the patch passed |
| +1 | compile | 21m 17s |  | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| -1 | cc | 21m 17s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/833/artifact/out/diff-compile-cc-root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt | root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 45 new + 367 unchanged - 45 fixed = 412 total (was 412) |
| +1 | javac | 21m 17s |  | the patch passed |
| +1 | compile | 18m 51s |  | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| -1 | cc | 18m 51s | 

[jira] [Comment Edited] (YARN-10370) [Umbrella] Reduce the feature gap between FS Placement Rules and CS Queue Mapping rules

2021-03-22 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306053#comment-17306053
 ] 

Szilard Nemeth edited comment on YARN-10370 at 3/22/21, 9:37 AM:
-

Hi [~pbacsko],
Good idea.
Let's wait for [~shuzirra]'s patches to be merged. I suggest creating a 
"Part 2" version, similar to this Umbrella, for minor / leftover items and 
also for items that we might find later.
Btw [~shuzirra]: We can put this to "In progress" :D


was (Author: snemeth):
Hi [~pbacsko],
Good idea.
Let's wait for [~shuzirra]'s patches to be merged. I suggest creating a 
"Part 2" version, similar to this Umbrella, for minor / leftover items and 
also for items that we might find later.

> [Umbrella] Reduce the feature gap between FS Placement Rules and CS Queue 
> Mapping rules
> ---
>
> Key: YARN-10370
> URL: https://issues.apache.org/jira/browse/YARN-10370
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: capacity-scheduler, capacityscheduler
> Attachments: MappingRuleEnhancements.pdf, Possible extensions of 
> mapping rule format in Capacity Scheduler.pdf
>
>
> To continue closing the feature gaps between Fair Scheduler and Capacity 
> Scheduler and help users migrate between the schedulers more easily, we need 
> to add some of the Fair Scheduler placement rules to the Capacity Scheduler's 
> queue mapping functionality.
> With [~snemeth] and [~pbacsko] we've created the following design docs about 
> the proposed changes.






[jira] [Commented] (YARN-10370) [Umbrella] Reduce the feature gap between FS Placement Rules and CS Queue Mapping rules

2021-03-22 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306053#comment-17306053
 ] 

Szilard Nemeth commented on YARN-10370:
---

Hi [~pbacsko],
Good idea.
Let's wait for [~shuzirra]'s patches to be merged. I suggest creating a 
"Part 2" version, similar to this Umbrella, for minor / leftover items and 
also for items that we might find later.

> [Umbrella] Reduce the feature gap between FS Placement Rules and CS Queue 
> Mapping rules
> ---
>
> Key: YARN-10370
> URL: https://issues.apache.org/jira/browse/YARN-10370
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>  Labels: capacity-scheduler, capacityscheduler
> Attachments: MappingRuleEnhancements.pdf, Possible extensions of 
> mapping rule format in Capacity Scheduler.pdf
>
>
> To continue closing the feature gaps between Fair Scheduler and Capacity 
> Scheduler and help users migrate between the schedulers more easily, we need 
> to add some of the Fair Scheduler placement rules to the Capacity Scheduler's 
> queue mapping functionality.
> With [~snemeth] and [~pbacsko] we've created the following design docs about 
> the proposed changes.






[jira] [Commented] (YARN-10707) Support gpu in ResourceUtilization, and update Node GPU Utilization to use.

2021-03-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306036#comment-17306036
 ] 

Hadoop QA commented on YARN-10707:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 26s |  | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s |  | No case conflicting files found. |
| 0 | buf | 0m 0s |  | buf was not available. |
| +1 | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 |  | 0m 0s | test4tests | The patch appears to include 8 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 0m 55s |  | Maven dependency ordering for branch |
| +1 | mvninstall | 23m 16s |  | trunk passed |
| +1 | compile | 23m 41s |  | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 20m 3s |  | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 4m 3s |  | trunk passed |
| +1 | mvnsite | 5m 43s |  | trunk passed |
| +1 | shadedclient | 26m 10s |  | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 4m 42s |  | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 4m 35s |  | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 45m 59s |  | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| 0 | spotbugs | 0m 32s |  | branch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests no spotbugs output file (spotbugsXml.xml) |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 22s |  | Maven dependency ordering for patch |
| +1 | mvninstall | 3m 54s |  | the patch passed |
| +1 | compile | 22m 48s |  | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| -1 | cc | 22m 48s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/832/artifact/out/diff-compile-cc-root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt | root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 39 new + 373 unchanged - 39 fixed = 412 total (was 412) |
| +1 | javac | 22m 48s |  | the patch passed |
| +1 | compile | 18m 49s |  | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| -1 | cc | 18m 49s | 

[jira] [Comment Edited] (YARN-9618) NodeListManager event improvement

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305978#comment-17305978
 ] 

Qi Zhu edited comment on YARN-9618 at 3/22/21, 8:25 AM:


Thanks [~gandras] for digging into this.

You are right, the main performance gain here comes from eliminating the 
unnecessary back reference to rmDispatcher on RMAppNodeUpdateEvent.

The reason we use another async dispatcher here is to keep 
rmDispatcher#eventQueue from blowing up and affecting the processing of other 
events. The backlog is moved to nodeListManagerDispatcher#eventQueue instead.

However, nodeListManagerDispatcher#eventQueue can also blow up under heavy 
load. Keeping that queue from filling up is a separate problem that this issue 
will not handle; we can discuss it in the multi-thread related issues.

What if we remove the async dispatcher here and keep only the elimination of 
the unnecessary back reference to rmDispatcher on RMAppNodeUpdateEvent?

cc [~pbacsko]  [~ebadger] 

What's your opinion?

Thanks.


was (Author: zhuqi):
Thanks [~gandras] for digging into this.

You are right, the main performance gain here comes from eliminating the 
unnecessary back reference to rmDispatcher on RMAppNodeUpdateEvent.

The reason we use another async dispatcher here is to keep 
rmDispatcher#eventQueue from blowing up and affecting the processing of other 
events.

However, nodeListManagerDispatcher#eventQueue can also blow up under heavy 
load. Keeping that queue from filling up is a separate problem that this issue 
will not handle; we can discuss it in the multi-thread related issues.

What if we remove the async dispatcher here and keep only the elimination of 
the unnecessary back reference to rmDispatcher on RMAppNodeUpdateEvent?

cc [~pbacsko]  [~ebadger] 

What's your opinion?

Thanks.

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch
>
>
> The current NodeListManager event handling blocks the async dispatcher and 
> can cause an RM crash and slow down event processing.
> # Cluster restart with 1K running apps. Each usable event will create 1K 
> events; overall there could be 5K*1K events for a 5K cluster.
> # Event processing is blocked till new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.
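
The fan-out described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class {{NodeUpdateFanOutSketch}} and its methods are hypothetical and do not appear in the RM code, "5K cluster" is read as 5,000 nodes, and the shared dispatcher queue is modelled as a plain {{java.util.Queue}} of strings.

```java
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/**
 * Hypothetical model of the NodeListManager fan-out; none of these names
 * exist in the ResourceManager code.
 */
public final class NodeUpdateFanOutSketch {
  private NodeUpdateFanOutSketch() {
  }

  /** Events produced when every node triggers one event per running app. */
  public static long fanOut(long nodes, long runningApps) {
    return Math.multiplyExact(nodes, runningApps);
  }

  /**
   * Enqueue style: one node update puts one event per running app on the
   * shared queue. Returns the queue size afterwards.
   */
  public static int enqueuePerApp(Queue<String> sharedQueue, int runningApps,
      String nodeId) {
    for (int app = 0; app < runningApps; app++) {
      sharedQueue.add("app-" + app + ":" + nodeId);
    }
    return sharedQueue.size();
  }

  /**
   * Direct style: each app handler is called in place, so the shared queue
   * is never touched by the node update.
   */
  public static void notifyDirectly(List<Consumer<String>> appHandlers,
      String nodeId) {
    for (Consumer<String> handler : appHandlers) {
      handler.accept(nodeId);
    }
  }
}
```

Under these assumptions, {{fanOut(5000, 1000)}} gives the 5K*1K figure from the description, and the direct style shows why calling the app handler in place keeps that load off the shared queue.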






[jira] [Comment Edited] (YARN-9618) NodeListManager event improvement

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305978#comment-17305978
 ] 

Qi Zhu edited comment on YARN-9618 at 3/22/21, 8:21 AM:


Thanks [~gandras] for digging into this.

You are right, the main performance gain here comes from eliminating the 
unnecessary back reference to rmDispatcher on RMAppNodeUpdateEvent.

The reason we use another async dispatcher here is to keep 
rmDispatcher#eventQueue from blowing up and affecting the processing of other 
events.

However, nodeListManagerDispatcher#eventQueue can also blow up under heavy 
load. Keeping that queue from filling up is a separate problem that this issue 
will not handle; we can discuss it in the multi-thread related issues.

What if we remove the async dispatcher here and keep only the elimination of 
the unnecessary back reference to rmDispatcher on RMAppNodeUpdateEvent?

cc [~pbacsko]  [~ebadger] 

What's your opinion?

Thanks.


was (Author: zhuqi):
[~gandras]

You are right, the main performance gain here comes from eliminating the 
unnecessary back reference to rmDispatcher on RMAppNodeUpdateEvent.

The reason we use another async dispatcher here is to keep 
rmDispatcher#eventQueue from blowing up and affecting the processing of other 
events.

However, nodeListManagerDispatcher#eventQueue can also blow up under heavy 
load. Keeping that queue from filling up is a separate problem that this issue 
will not handle; we can discuss it in the multi-thread related issues.

What if we remove the async dispatcher here and keep only the elimination of 
the unnecessary back reference to rmDispatcher on RMAppNodeUpdateEvent?

cc [~pbacsko]  [~ebadger] 

What's your opinion?

Thanks.

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch
>
>
> The current NodeListManager event handling blocks the async dispatcher and 
> can cause an RM crash and slow down event processing.
> # Cluster restart with 1K running apps. Each usable event will create 1K 
> events; overall there could be 5K*1K events for a 5K cluster.
> # Event processing is blocked till new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.






[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305978#comment-17305978
 ] 

Qi Zhu commented on YARN-9618:
--

[~gandras]

You are right: the main performance gain here comes from eliminating the 
unnecessary back reference to rmDispatcher on RMAppNodeUpdateEvent.

The reason for using another async dispatcher here is to keep 
rmDispatcher#eventQueue from filling up and delaying the processing of other 
events.

However, nodeListManagerDispatcher#eventQueue can also fill up under heavy 
load. Keeping nodeListManagerDispatcher#eventQueue from filling up under heavy 
load is a separate problem that this issue will not address; we can discuss it 
in the multi-thread related issues.

What if we remove the async dispatcher here and keep only the elimination of 
the unnecessary back reference to rmDispatcher on RMAppNodeUpdateEvent?

cc [~pbacsko]  [~ebadger] 

What's your opinion?

Thanks.

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch
>
>
> The current implementation of the NodeListManager event blocks the async 
> dispatcher and can cause an RM crash and slow down event processing.
> # Cluster restart with 1K running apps. Each usable event will create 1K 
> events; overall there could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.






[jira] [Commented] (YARN-9618) NodeListManager event improvement

2021-03-22 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305953#comment-17305953
 ] 

Andras Gyori commented on YARN-9618:


Thank you [~zhuqi] for the patch. I have analysed the code a bit and I think 
the main performance gain here is due to eliminating the unnecessary back 
reference to rmDispatcher on RMAppNodeUpdateEvent. Is using another async 
dispatcher justified here? My position on this issue is:
 * The rmDispatcher will still have its eventQueue filled with 
NodeListManagerEvents.
 * The new async dispatcher is another layer of abstraction, and its sole 
purpose is copying the events from the rmDispatcher to its own event queue and 
then handling them just as rmDispatcher would.
 * The NodeListManager#handle will block on getting RMApp instances, because 
they are stored in a ConcurrentMap.

I think the new async dispatcher only makes sense if 
NodeListManager#sendRMAppNodeUpdateEventToNonFinalizedApps blocks the 
rmDispatcher thread for longer than it takes to copy an event from 
rmDispatcher#eventQueue to nodeListManagerDispatcher#eventQueue. Measuring the 
performance gain with and without the async dispatcher would be a really 
helpful metric here.
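
To make the comparison above concrete, here is a minimal, self-contained sketch of the two variants being discussed. It is not the YARN code: the class DispatcherForwardingSketch, its methods, and the use of plain integers as events are all hypothetical stand-ins, and the handler only counts events instead of updating RMApp instances.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class DispatcherForwardingSketch {
    private static final int STOP = -1; // stop marker; real events are >= 0

    /** Takes events from the queue on the calling thread until STOP is seen. */
    static void drain(BlockingQueue<Integer> queue, IntConsumer handler) {
        try {
            for (int event = queue.take(); event != STOP; event = queue.take()) {
                handler.accept(event);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
    }

    /** Builds a queue holding the given number of events plus the stop marker. */
    static BlockingQueue<Integer> filledQueue(int events) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < events; i++) {
            queue.add(i);
        }
        queue.add(STOP);
        return queue;
    }

    /** Variant A: the main dispatcher thread calls the handler directly. */
    static int runDirect(int events) {
        AtomicInteger handled = new AtomicInteger();
        drain(filledQueue(events), e -> handled.incrementAndGet());
        return handled.get();
    }

    /** Variant B: the main dispatcher thread only copies each event into a
     *  second queue, and a second thread drains that queue. */
    static int runForwarding(int events) {
        AtomicInteger handled = new AtomicInteger();
        BlockingQueue<Integer> secondQueue = new LinkedBlockingQueue<>();
        Thread second = new Thread(
                () -> drain(secondQueue, e -> handled.incrementAndGet()));
        second.start();
        drain(filledQueue(events), e -> secondQueue.add(e)); // the copy step
        secondQueue.add(STOP);
        try {
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return handled.get();
    }

    public static void main(String[] args) {
        int n = 200_000;
        long t0 = System.nanoTime();
        int direct = runDirect(n);
        long t1 = System.nanoTime();
        int forwarded = runForwarding(n);
        long t2 = System.nanoTime();
        System.out.printf("direct:     %d events in %d ms%n", direct, (t1 - t0) / 1_000_000);
        System.out.printf("forwarding: %d events in %d ms%n", forwarded, (t2 - t1) / 1_000_000);
    }
}
```

A single timed run like this is only a rough indication, since there is no JIT warm-up and the handler is trivial. The second queue can only pay off when the handler costs more than the copy step, which is the condition stated above.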

> NodeListManager event improvement
> -
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch
>
>
> The current implementation of the NodeListManager event blocks the async 
> dispatcher and can cause an RM crash and slow down event processing.
> # Cluster restart with 1K running apps. Each usable event will create 1K 
> events; overall there could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, directly call the RMApp event 
> handler.






[jira] [Updated] (YARN-10684) YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start

2021-03-22 Thread Sushanta Sen (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanta Sen updated YARN-10684:

Description: 
Preconditions:
 # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
 # Set the following parameter in the RM yarn-site.xml:
 yarn.resourcemanager.opportunistic-container-allocation.enabled = true
 # Set the following parameter in the yarn-site.xml of the NM(s):
 yarn.nodemanager.opportunistic-containers-max-queue-length = 30

Test Steps:

Job Command: yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC -promote_opportunistic_after_start

Actual Result: The Distributed Shell YARN job failed almost every time with the 
following diagnostics message:

*[ Failed Reason : Application Failure: desired = 10, completed = 10, allocated 
= 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container Killed to 
make room for Guaranteed Container.]*

Expected Result: The DS job should succeed with the argument 
"promote_opportunistic_after_start".

  was:
Preconditions:
 # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
 # Set the below parameters  in RM yarn-site.xml ::
 yarn.resourcemanager.opportunistic-container-allocation.enabled
 true
 
 # Set this in NM[s]yarn-site.xml ::: 
 yarn.nodemanager.opportunistic-containers-max-queue-length
 30
 

 
 Test Steps:

Job Command : :

Job Command :: yarn org.apache.hadoop.yarn.applications.distributedshell.Client 
-jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar
 -shell_command sleep -shell_args 20 -num_containers 10 -container_type 
OPPORTUNISTIC -*promote_opportunistic_after_start*

Actual Result: Distributed Shell Yarn Job Failed almost all times with below 
Diagnostics message

*[ Failed Reason : Application Failure: desired = 10, completed = 10, allocated 
= 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container Killed to 
make room for Guaranteed Container.]*

Expected Result: DS job should be successful with argument 
"promote_opportunistic_after_start" * ** *


> YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried 
> adding flag -promote_opportunistic_after_start 
> ---
>
> Key: YARN-10684
> URL: https://issues.apache.org/jira/browse/YARN-10684
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-scheduling
>Affects Versions: 3.1.1
>Reporter: Sushanta Sen
>Priority: Major
>
> Preconditions:
>  # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
>  # Set the following parameter in the RM yarn-site.xml:
>  yarn.resourcemanager.opportunistic-container-allocation.enabled = true
>  # Set the following parameter in the yarn-site.xml of the NM(s):
>  yarn.nodemanager.opportunistic-containers-max-queue-length = 30
>  
>  Test Steps:
> Job Command: yarn 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
>  -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
> OPPORTUNISTIC -promote_opportunistic_after_start
> Actual Result: The Distributed Shell YARN job failed almost every time with 
> the following diagnostics message:
> *[ Failed Reason : Application Failure: desired = 10, completed = 10, 
> allocated = 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container 
> Killed to make room for Guaranteed Container.]*
> Expected Result: The DS job should succeed with the argument 
> "promote_opportunistic_after_start".






[jira] [Updated] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for

2021-03-22 Thread Sushanta Sen (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanta Sen updated YARN-10670:

Description: 
Preconditions:
 # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
 # Set the following parameter in the RM yarn-site.xml:
 yarn.resourcemanager.opportunistic-container-allocation.enabled = true
 # Set the following parameter in the yarn-site.xml of the NM(s):
 yarn.nodemanager.opportunistic-containers-max-queue-length = 30

Test Steps:

Job Command: yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC

Actual Result: The Distributed Shell YARN job failed with the following 
diagnostics message:
{noformat}
Attempt recovered after RM restartApplication Failure: desired = 20, completed 
= 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
 Expected Result: The Distributed Shell YARN job should not fail.

  was:
Preconditions:
 # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
 # Set the below parameters  in RM yarn-site.xml ::
 yarn.resourcemanager.opportunistic-container-allocation.enabled
 true
 
 # Set this in NM[s]yarn-site.xml ::: 
 yarn.nodemanager.opportunistic-containers-max-queue-length
 30
 

 
 Test Steps:

Job Command : : yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-*.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC

Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics message
{noformat}
Attempt recovered after RM restartApplication Failure: desired = 20, completed 
= 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
 Expected Result: Distributed Shell Yarn Job should not fail.


> YARN: Opportunistic Container : : In distributed shell job if containers are 
> killed then application is failed. But in this case as containers are killed 
> to make room for guaranteed containers which is not correct to fail an 
> application
> 
>
> Key: YARN-10670
> URL: https://issues.apache.org/jira/browse/YARN-10670
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Affects Versions: 3.1.1
>Reporter: Sushanta Sen
>Assignee: Bilwa S T
>Priority: Major
>
> Preconditions:
>  # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
>  # Set the following parameter in the RM yarn-site.xml:
>  yarn.resourcemanager.opportunistic-container-allocation.enabled = true
>  # Set the following parameter in the yarn-site.xml of the NM(s):
>  yarn.nodemanager.opportunistic-containers-max-queue-length = 30
>  
>  Test Steps:
> Job Command: yarn 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar
>  -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
> OPPORTUNISTIC
> Actual Result: The Distributed Shell YARN job failed with the following 
> diagnostics message:
> {noformat}
> Attempt recovered after RM restartApplication Failure: desired = 20, 
> completed = 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
> 22:11:48.440]Container De-queued to meet NM queuing limits.
> [2021-02-09 22:11:48.441]Container terminated before launch.
> {noformat}
>  Expected Result: The Distributed Shell YARN job should not fail.






[jira] [Updated] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for

2021-03-22 Thread Sushanta Sen (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanta Sen updated YARN-10670:

Description: 
Preconditions:
 # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
 # Set the following parameter in the RM yarn-site.xml:
 yarn.resourcemanager.opportunistic-container-allocation.enabled = true
 # Set the following parameter in the yarn-site.xml of the NM(s):
 yarn.nodemanager.opportunistic-containers-max-queue-length = 30

Test Steps:

Job Command: yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-*.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC

Actual Result: The Distributed Shell YARN job failed with the following 
diagnostics message:
{noformat}
Attempt recovered after RM restartApplication Failure: desired = 20, completed 
= 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
 Expected Result: The Distributed Shell YARN job should not fail.

  was:
Preconditions:
 # Secure Hadoop 3.1.1 - 3 Nodes cluster is installed
 # Set the below parameters  in RM yarn-site.xml ::
 yarn.resourcemanager.opportunistic-container-allocation.enabled
 true
 
 # Set this in NM[s]yarn-site.xml ::: 
 yarn.nodemanager.opportunistic-containers-max-queue-length
 30
 

 
 Test Steps:

Job Command : : yarn 
org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar
 -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
OPPORTUNISTIC

Actual Result: Distributed Shell Yarn Job Failed with below Diagnostics message
{noformat}
Attempt recovered after RM restartApplication Failure: desired = 20, completed 
= 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
 Expected Result: Distributed Shell Yarn Job should not fail.


> YARN: Opportunistic Container : : In distributed shell job if containers are 
> killed then application is failed. But in this case as containers are killed 
> to make room for guaranteed containers which is not correct to fail an 
> application
> 
>
> Key: YARN-10670
> URL: https://issues.apache.org/jira/browse/YARN-10670
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Affects Versions: 3.1.1
>Reporter: Sushanta Sen
>Assignee: Bilwa S T
>Priority: Major
>
> Preconditions:
>  # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
>  # Set the following parameter in the RM yarn-site.xml:
>  yarn.resourcemanager.opportunistic-container-allocation.enabled = true
>  # Set the following parameter in the yarn-site.xml of the NM(s):
>  yarn.nodemanager.opportunistic-containers-max-queue-length = 30
>  
>  Test Steps:
> Job Command: yarn 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-*.jar
>  -shell_command sleep -shell_args 20 -num_containers 20 -container_type 
> OPPORTUNISTIC
> Actual Result: The Distributed Shell YARN job failed with the following 
> diagnostics message:
> {noformat}
> Attempt recovered after RM restartApplication Failure: desired = 20, 
> completed = 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 
> 22:11:48.440]Container De-queued to meet NM queuing limits.
> [2021-02-09 22:11:48.441]Container terminated before launch.
> {noformat}
>  Expected Result: The Distributed Shell YARN job should not fail.






[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-03-22 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305941#comment-17305941
 ] 

Andras Gyori commented on YARN-9927:


Thanks [~zhuqi] for the answer, I agree with your concerns. Ideally, we could 
predefine a small thread pool on creation, and decide at EventType registration 
whether events of that type should be handled separately in their own thread. I 
think a separate thread for NodeEvents and AppAttempts makes sense, but we will 
not need a new thread for every event type.
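
The registration-time decision described above could look roughly like the following sketch. Everything here is hypothetical (the class PerTypeDispatcherSketch, its EventType enum, and the String payload are not Hadoop APIs), and using one single-thread executor per dedicated event type is an assumption made here to preserve per-type ordering, not something the patch is known to do.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class PerTypeDispatcherSketch {

    /** Hypothetical event categories; the real RM has many more event types. */
    enum EventType { NODE, APP_ATTEMPT, OTHER }

    // One shared thread for every type that did not ask for its own thread.
    private final ExecutorService shared = Executors.newSingleThreadExecutor();
    private final Map<EventType, ExecutorService> executors = new ConcurrentHashMap<>();
    private final Map<EventType, Consumer<String>> handlers = new ConcurrentHashMap<>();

    /** Decides at registration time whether this type gets a dedicated thread. */
    void register(EventType type, Consumer<String> handler, boolean dedicatedThread) {
        handlers.put(type, handler);
        executors.put(type,
                dedicatedThread ? Executors.newSingleThreadExecutor() : shared);
    }

    /** Queues the event on the executor chosen for its type. */
    void dispatch(EventType type, String payload) {
        Consumer<String> handler = handlers.get(type);
        if (handler == null) {
            throw new IllegalArgumentException("no handler registered for " + type);
        }
        executors.get(type).execute(() -> handler.accept(payload));
    }

    /** Stops accepting events and waits for the queued ones to finish. */
    void shutdownAndWait() {
        try {
            for (ExecutorService executor : executors.values()) {
                executor.shutdown();
                executor.awaitTermination(10, TimeUnit.SECONDS);
            }
            shared.shutdown();
            shared.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PerTypeDispatcherSketch dispatcher = new PerTypeDispatcherSketch();
        Consumer<String> printer = payload ->
                System.out.println(Thread.currentThread().getName() + " handled " + payload);
        dispatcher.register(EventType.NODE, printer, true);        // heavy: own thread
        dispatcher.register(EventType.APP_ATTEMPT, printer, true); // heavy: own thread
        dispatcher.register(EventType.OTHER, printer, false);      // light: shared thread
        dispatcher.dispatch(EventType.NODE, "node status update");
        dispatcher.dispatch(EventType.APP_ATTEMPT, "attempt started");
        dispatcher.dispatch(EventType.OTHER, "misc event");
        dispatcher.shutdownAndWait();
    }
}
```

With this shape the thread count is bounded by the number of types registered as dedicated plus one shared thread, so it does not grow with the total number of event types.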

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing the RM event monitoring data and the RM event 
> processing logic, we found that:
> 1) Environment: a cluster with thousands of nodes.
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event 
> scheduler.
> 3) Meanwhile, RM event processing runs in single-thread mode, which results 
> in low headroom for the RM event scheduler and thus limits RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM 
> performance.






[jira] [Comment Edited] (YARN-9927) RM multi-thread event processing mechanism

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305895#comment-17305895
 ] 

Qi Zhu edited comment on YARN-9927 at 3/22/21, 6:11 AM:


Thanks [~gandras] for the investigation and reply.

I agree that the mode you suggested is better. My only concern is whether 
there will be too many threads, since the thread count would match the number 
of EventTypes, and whether that will cause side effects compared with the 
original mode.

After investigation it makes sense to me: the number does not seem to be a 
problem, because we can add multiple threads only for those EventTypes that 
are under heavy pressure.

But we should also stress test the new mode.

Let's wait for [~pbacsko]'s advice. :D

Thanks.


was (Author: zhuqi):
Thanks [~gandras] for investigation and reply.

I agree with you that which you suggested is a better mode, the only concern to 
me is that if the thread number will be too many because it will be consistent 
with the number of EventType, and if it will cause some side effect compare 
with the original mode.

I make sense to me after investigation, the number seems not a problem, we can 
only add multi thread to those eventType which are in a big pressure.

But we also should make stress test to the new mode. 

Thanks. 

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing the RM event monitoring data and the RM event 
> processing logic, we found that:
> 1) Environment: a cluster with thousands of nodes.
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event 
> scheduler.
> 3) Meanwhile, RM event processing runs in single-thread mode, which results 
> in low headroom for the RM event scheduler and thus limits RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM 
> performance.






[jira] [Comment Edited] (YARN-9927) RM multi-thread event processing mechanism

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305895#comment-17305895
 ] 

Qi Zhu edited comment on YARN-9927 at 3/22/21, 6:09 AM:


Thanks [~gandras] for the investigation and reply.

I agree that the mode you suggested is better. My only concern is whether 
there will be too many threads, since the thread count would match the number 
of EventTypes, and whether that will cause side effects compared with the 
original mode.

After investigation it makes sense to me: the number does not seem to be a 
problem, because we can add multiple threads only for those EventTypes that 
are under heavy pressure.

But we should also stress test the new mode.

Thanks.


was (Author: zhuqi):
Thanks [~gandras] for investigation and reply.

I agree with you that which you suggested is a better mode, the only concern to 
me is that if the thread number will be too many because it will be consistent 
with the number of EventType, and if it will cause some side effect compare 
with the original mode.

We also should make stress test to the new mode. 

Thanks. 

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing the RM event monitoring data and the RM event 
> processing logic, we found that:
> 1) Environment: a cluster with thousands of nodes.
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event 
> scheduler.
> 3) Meanwhile, RM event processing runs in single-thread mode, which results 
> in low headroom for the RM event scheduler and thus limits RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM 
> performance.






[jira] [Comment Edited] (YARN-9927) RM multi-thread event processing mechanism

2021-03-22 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305895#comment-17305895
 ] 

Qi Zhu edited comment on YARN-9927 at 3/22/21, 6:06 AM:


Thanks [~gandras] for the investigation and reply.

I agree that the mode you suggested is better. My only concern is whether 
there will be too many threads, since the thread count would match the number 
of EventTypes, and whether that will cause side effects compared with the 
original mode.

We should also stress test the new mode.

Thanks.


was (Author: zhuqi):
Thanks [~gandras] for investigation and reply.

I agree with you that which you suggested is a better mode, the only concern to 
me is that if the thread number will be too many, and if it will cause some 
side effect compare with the original mode.

We also should make stress test to the new mode. 

Thanks. 

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher 
> queue. After analyzing the RM event monitoring data and the RM event 
> processing logic, we found that:
> 1) Environment: a cluster with thousands of nodes.
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event 
> scheduler.
> 3) Meanwhile, RM event processing runs in single-thread mode, which results 
> in low headroom for the RM event scheduler and thus limits RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM 
> performance.






[jira] [Resolved] (YARN-10706) Upgrade com.github.eirslett:frontend-maven-plugin to 1.11.2

2021-03-22 Thread Mingliang Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu resolved YARN-10706.
--
Fix Version/s: 3.2.3
   3.3.1
   3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Committed to Hadoop 3.x branches. Thank you for your review [~ayushtkn] and 
[~aajisaka]!

> Upgrade com.github.eirslett:frontend-maven-plugin to 1.11.2
> ---
>
> Key: YARN-10706
> URL: https://issues.apache.org/jira/browse/YARN-10706
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: buid
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It has been years since the com.github.eirslett:frontend-maven-plugin plugin 
> was brought in. According to its [release 
> notes|https://github.com/eirslett/frontend-maven-plugin/blob/master/CHANGELOG.md],
>  recent versions have bug fixes and support more platforms. In particular, it 
> also supports the Apple Silicon chip, so we can build Hadoop on the latest 
> macOS platforms.
> This is to upgrade this plugin to the latest published version, 1.11.2.


