[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316834#comment-17316834
 ] 

Qi Zhu commented on YARN-10564:
---

Thanks [~pbacsko] for the review and suggestions, and [~gandras] for the update; 
it's clearer now.

LGTM +1 

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.006.patch, YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. The proposal is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would set the weight to 10 for every leaf of every parent under 
> root.
> We should probably take an approach that can support an arbitrary depth of 
> template configuration, because we might need to lift the limitation on auto 
> queue nesting.
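
The wildcard idea in the description can be illustrated with a minimal sketch. 
It assumes each `*` stands for exactly one queue-name segment and that an 
explicitly configured key takes precedence over a template. The function names 
(`template_matches`, `resolve_property`) are hypothetical and are not taken from 
the attached patches.

```python
from typing import Dict, Optional

PREFIX = "yarn.scheduler.capacity."


def template_matches(template_path: str, queue_path: str) -> bool:
    """Return True if queue_path fits template_path; '*' matches one segment."""
    t_parts = template_path.split(".")
    q_parts = queue_path.split(".")
    if len(t_parts) != len(q_parts):
        return False
    return all(t == "*" or t == q for t, q in zip(t_parts, q_parts))


def resolve_property(config: Dict[str, str], queue_path: str,
                     prop: str) -> Optional[str]:
    """Look up prop for a queue: explicit key first, then any matching template."""
    explicit = config.get(f"{PREFIX}{queue_path}.{prop}")
    if explicit is not None:
        return explicit
    suffix = "." + prop
    for key, value in config.items():
        if not (key.startswith(PREFIX) and key.endswith(suffix)):
            continue
        # Strip the common prefix and the property name to get the queue template.
        template_path = key[len(PREFIX):-len(suffix)]
        if "*" in template_path and template_matches(template_path, queue_path):
            return value
    return None


# The example key from the issue description.
config = {"yarn.scheduler.capacity.root.*.*.weight": "10"}
print(resolve_property(config, "root.a.b", "weight"))
print(resolve_property(config, "root.a", "weight"))
```

With this fixed-depth matching, `root.a.b` picks up the template while `root.a` 
does not. Supporting arbitrary depth, as the description suggests, would need a 
different matching rule.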



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-04-07 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316612#comment-17316612
 ] 

Hadoop QA commented on YARN-10702:
--

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 14m 6s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || branch-3.1 Compile Tests || ||
| 0 | mvndep | 3m 28s | | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 11s | | branch-3.1 passed |
| +1 | compile | 8m 34s | | branch-3.1 passed |
| +1 | checkstyle | 1m 13s | | branch-3.1 passed |
| +1 | mvnsite | 2m 33s | | branch-3.1 passed |
| +1 | shadedclient | 16m 52s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 2m 4s | | branch-3.1 passed |
| 0 | spotbugs | 24m 18s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 5m 25s | | branch-3.1 passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 23s | | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 1s | | the patch passed |
| +1 | compile | 7m 53s | | the patch passed |
| +1 | javac | 7m 53s | | the patch passed |
| -0 | checkstyle | 1m 10s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/902/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 287 unchanged - 0 fixed = 289 total (was 287) |
| +1 | mvnsite | 2m 22s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 | shadedclient | 13m 36s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 58s | | the patch passed |
| +1 | spotbugs | 5m 57s | | the patch passed |
|| || || || Other Tests || ||
| +1 | unit | 0m 51s | | hadoop-yarn-api in the patch passed. |

[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316516#comment-17316516
 ] 

Hadoop QA commented on YARN-10564:
--

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 23s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 25m 43s | | trunk passed |
| +1 | compile | 1m 16s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 58s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 56s | | trunk passed |
| +1 | mvnsite | 1m 0s | | trunk passed |
| +1 | shadedclient | 17m 28s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 41s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 37s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 20m 37s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 51s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 50s | | the patch passed |
| +1 | compile | 0m 53s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 53s | | the patch passed |
| +1 | compile | 0m 47s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 47s | | the patch passed |
| -0 | checkstyle | 0m 42s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/901/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 133 unchanged - 0 fixed = 134 total (was 133) |
| +1 | mvnsite | 0m 50s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 6s | | patch has no errors when building and testing our client artifacts. |
| 

[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2021-04-07 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316489#comment-17316489
 ] 

Jim Brennan commented on YARN-10475:


[~chaosju] thanks for your comment.  The implementation we provided here is 
using overall cluster utilization vs node utilization to adjust the heartbeat 
so that under-utilized nodes get more scheduling opportunities.  Note that this 
feature was developed internally on branch-2 before the global scheduler was 
added.   It has worked well to help keep our nodes more evenly utilized. 

I think that other metrics for scaling the heartbeat are definitely worth 
exploring, which is why we filed [YARN-10478] to make it pluggable.  That would 
be a good place to make suggestions for alternate approaches.


> Scale RM-NM heartbeat interval based on node utilization
> 
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10475-branch-3.2.003.patch, 
> YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, 
> YARN-10475.003.patch
>
>
> Add the ability to scale the RM-NM heartbeat interval based on node cpu 
> utilization compared to overall cluster cpu utilization.  If a node is 
> over-utilized compared to the rest of the cluster, its heartbeats 
> slow down.  If it is under-utilized compared to the rest of the cluster, 
> its heartbeats speed up.
> This is a feature we have been running with internally in production for 
> several years.  It was developed by [~nroberts], based on the observation 
> that larger faster nodes on our cluster were under-utilized compared to 
> smaller slower nodes. 
> This feature is dependent on [YARN-10450], which added cluster-wide 
> utilization metrics.
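
A minimal sketch of the scaling rule described above, assuming a simple linear 
adjustment around a default interval. The function name, the parameters 
(`default_ms`, `min_ms`, `max_ms`) and the linear formula are illustrative 
assumptions; the issue text does not state the exact formula or the 
configuration properties used by the patch.

```python
def scaled_heartbeat_ms(node_cpu: float, cluster_cpu: float,
                        default_ms: int = 1000,
                        min_ms: int = 500,
                        max_ms: int = 2000) -> int:
    """Pick a heartbeat interval from node vs. cluster CPU utilization (0..1)."""
    if cluster_cpu <= 0.0:
        # No cluster-wide signal yet: fall back to the default interval.
        return default_ms
    # Relative deviation from the cluster average; positive means over-utilized.
    diff = (node_cpu - cluster_cpu) / cluster_cpu
    diff = max(-1.0, min(1.0, diff))
    if diff >= 0:
        # Over-utilized node: lengthen the interval toward max_ms.
        interval = default_ms + diff * (max_ms - default_ms)
    else:
        # Under-utilized node: shorten the interval toward min_ms.
        interval = default_ms + diff * (default_ms - min_ms)
    return int(max(min_ms, min(max_ms, interval)))


print(scaled_heartbeat_ms(0.75, 0.5))  # busier than the cluster
print(scaled_heartbeat_ms(0.25, 0.5))  # idler than the cluster
```

An under-utilized node heartbeats more often and therefore gets more scheduling 
opportunities, which matches the behaviour described in the comment above.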






[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-04-07 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316478#comment-17316478
 ] 

Jim Brennan commented on YARN-10702:


Thanks again [~ebadger]!  I put up additional patches for branch-3.2 and 
branch-3.1. 


> Add cluster metric for amount of CPU used by RM Event Processor
> ---
>
> Key: YARN-10702
> URL: https://issues.apache.org/jira/browse/YARN-10702
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Fix For: 3.4.0, 3.3.1
>
> Attachments: Scheduler-Busy.png, YARN-10702-branch-3.1.006.patch, 
> YARN-10702-branch-3.2.006.patch, YARN-10702-branch-3.3.006.patch, 
> YARN-10702.001.patch, YARN-10702.002.patch, YARN-10702.003.patch, 
> YARN-10702.004.patch, YARN-10702.005.patch, YARN-10702.006.patch, 
> simon-scheduler-busy.png
>
>
> Add a cluster metric to track the cpu usage of the ResourceManager Event 
> Processing thread.   This lets us know when the critical path of the RM is 
> running out of headroom.
> This feature was originally added for us internally by [~nroberts] and we've 
> been running with it on production clusters for nearly four years.
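
The arithmetic behind such a metric can be sketched as follows: sample the CPU 
time consumed by one thread over a wall-clock window and report the ratio as a 
percentage. This is only an illustration in Python using `time.thread_time()`; 
the class and function names are hypothetical, and it does not show how the RM 
patch itself collects the value.

```python
import time


def busy_percent(cpu_delta_s: float, wall_delta_s: float) -> float:
    """Share of a wall-clock window that a thread spent on CPU, as 0..100."""
    if wall_delta_s <= 0:
        return 0.0
    return max(0.0, min(100.0, 100.0 * cpu_delta_s / wall_delta_s))


class ThreadBusyTracker:
    """Samples CPU time of the calling thread.

    sample() must be called from the thread being measured, because
    time.thread_time() reports the CPU time of the current thread only.
    """

    def __init__(self) -> None:
        self._last_cpu = time.thread_time()
        self._last_wall = time.monotonic()

    def sample(self) -> float:
        cpu, wall = time.thread_time(), time.monotonic()
        pct = busy_percent(cpu - self._last_cpu, wall - self._last_wall)
        self._last_cpu, self._last_wall = cpu, wall
        return pct


tracker = ThreadBusyTracker()
sum(range(200_000))        # stand-in for event processing work
print(tracker.sample())    # percentage of the window spent on CPU
```

A value that stays close to 100 would indicate that the thread has little 
headroom left, which is the condition the issue description wants to expose.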






[jira] [Updated] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-04-07 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-10702:
---
Attachment: YARN-10702-branch-3.2.006.patch







[jira] [Updated] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-04-07 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-10702:
---
Attachment: YARN-10702-branch-3.1.006.patch







[jira] [Comment Edited] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316378#comment-17316378
 ] 

Andras Gyori edited comment on YARN-10564 at 4/7/21, 2:25 PM:
--

Thank you [~pbacsko] for the suggestions. I have incorporated your ideas and 
uploaded a new revision. I realised that we will need the template configurations 
on the parents. Also, hopefully I have made the logic simpler and more readable.


was (Author: gandras):
I have uploaded a new revision. I have realised that we will need the template 
configurations on the parents. Also, hopefully I have made the logic simpler 
and more readable.







[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316378#comment-17316378
 ] 

Andras Gyori commented on YARN-10564:
-

I have uploaded a new revision. I have realised that we will need the template 
configurations on the parents. Also, hopefully I have made the logic simpler 
and more readable.







[jira] [Updated] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10564:

Attachment: YARN-10564.006.patch







[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization

2021-04-07 Thread chaosju (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316340#comment-17316340
 ] 

chaosju commented on YARN-10475:


Why an adaptive heartbeat?
 * Regular heartbeats can overload the RM.
 * If the RM is overloaded, things get worse over time as events queue up.
 * Work efficiency drops, because important events at the NM/AM have to wait for 
the next heartbeat before the RM learns of their status.
 * Not every heartbeat from a node or AM is important. If a node is running 
full, its heartbeats are not useful for application scheduling.
 * The RM should be able to control the heartbeats sent to itself.

How to make the heartbeat adaptive?

1. Throttle heartbeat:
 * Base the HB interval on scheduler load (LIGHT, NORMAL, BUSY, HEAVY).
 * Collect statistics on the various scheduler events (processing time vs. wait 
time in queue).
 * The RM tells the NM and AM the next HB interval in order to throttle the 
heartbeat.

2. Event-based heartbeat:
 * Send an out-of-band heartbeat for urgent requests, such as new resource 
requests or container completions, before the heartbeat interval indicated by 
the RM has elapsed.
 * The RM can notify the AM when containers have been allocated, so that the AM 
does not have to wait for the scheduled heartbeat to get resources.

Reference: [https://www.slideshare.net/vsaxenavarun/venturing-into-large-hadoop-clusters]

I think this feature should also take the RM's load into account.

[~Jim_Brennan]
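
The "throttle heartbeat" idea above can be sketched roughly as follows. The 
four load levels come from the comment; the classification rule (ratio of queue 
wait time to processing time), the thresholds and the interval multipliers are 
made-up assumptions for illustration only.

```python
from enum import Enum


class SchedulerLoad(Enum):
    LIGHT = 1
    NORMAL = 2
    BUSY = 3
    HEAVY = 4


# Hypothetical multipliers applied to the base heartbeat interval.
INTERVAL_FACTOR = {
    SchedulerLoad.LIGHT: 0.5,
    SchedulerLoad.NORMAL: 1.0,
    SchedulerLoad.BUSY: 2.0,
    SchedulerLoad.HEAVY: 4.0,
}


def classify_load(avg_wait_ms: float, avg_processing_ms: float) -> SchedulerLoad:
    """Classify scheduler load from how long events wait relative to processing."""
    if avg_processing_ms <= 0:
        return SchedulerLoad.LIGHT
    ratio = avg_wait_ms / avg_processing_ms
    if ratio < 1:       # hypothetical threshold
        return SchedulerLoad.LIGHT
    if ratio < 5:       # hypothetical threshold
        return SchedulerLoad.NORMAL
    if ratio < 20:      # hypothetical threshold
        return SchedulerLoad.BUSY
    return SchedulerLoad.HEAVY


def next_heartbeat_ms(base_ms: int, avg_wait_ms: float,
                      avg_processing_ms: float) -> int:
    """Interval the RM would hand back to an NM or AM in its heartbeat response."""
    load = classify_load(avg_wait_ms, avg_processing_ms)
    return int(base_ms * INTERVAL_FACTOR[load])


print(next_heartbeat_ms(1000, avg_wait_ms=0.5, avg_processing_ms=1.0))
print(next_heartbeat_ms(1000, avg_wait_ms=30.0, avg_processing_ms=1.0))
```

Here a lightly loaded scheduler asks for more frequent heartbeats and a heavily 
loaded one asks senders to back off.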







[jira] [Issue Comment Deleted] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2021-04-07 Thread chaosju (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chaosju updated YARN-10450:
---
Comment: was deleted

(was: Why an adaptive heartbeat?
 * Regular heartbeats can overload the RM.
 * If the RM is overloaded, things get worse over time as events queue up.
 * Work efficiency drops, because important events at the NM/AM have to wait for 
the next heartbeat before the RM learns of their status.
 * Not every heartbeat from a node or AM is important. If a node is running 
full, its heartbeats are not useful for application scheduling.
 * The RM should be able to control the heartbeats sent to itself.

How to make the heartbeat adaptive?

1. Throttle heartbeat:
 * Base the HB interval on scheduler load (LIGHT, NORMAL, BUSY, HEAVY).
 * Collect statistics on the various scheduler events (processing time vs. wait 
time in queue).
 * The RM tells the NM and AM the next HB interval in order to throttle the 
heartbeat.

2. Event-based heartbeat:
 * Send an out-of-band heartbeat for urgent requests, such as new resource 
requests or container completions, before the heartbeat interval indicated by 
the RM has elapsed.
 * The RM can notify the AM when containers have been allocated, so that the AM 
does not have to wait for the scheduled heartbeat to get resources.

Reference: https://www.slideshare.net/vsaxenavarun/venturing-into-large-hadoop-clusters

[~Jim_Brennan] )

> Add cpu and memory utilization per node and cluster-wide metrics
> 
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.3.1
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: NodesPage.png, YARN-10450-branch-2.10.003.patch, 
> YARN-10450-branch-3.1.003.patch, YARN-10450-branch-3.2.003.patch, 
> YARN-10450.001.patch, YARN-10450.002.patch, YARN-10450.003.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and 
> aggregated for the entire cluster.  This information is already passed 
> from NM to RM in the node status update.
> We have been running with this internally for quite a while and found it 
> useful to be able to quickly see the actual cpu/memory utilization on the 
> node/cluster.  It's especially useful if some form of overcommit is used.
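
As a rough illustration of the aggregation described above, the sketch below 
combines per-node utilization into cluster-wide figures, weighting CPU by each 
node's vcore count. The `NodeStatus` fields and the weighting are assumptions 
made for this example and are not the actual node status update structure.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class NodeStatus:
    """Hypothetical per-node utilization report (not the real YARN type)."""
    node_id: str
    vcores_total: int
    cpu_util: float      # fraction (0..1) of the node's CPU in use
    mem_total_mb: int
    mem_used_mb: int


def cluster_utilization(nodes: Iterable[NodeStatus]) -> Tuple[float, float]:
    """Return (cpu, memory) utilization of the whole cluster as fractions."""
    nodes = list(nodes)
    total_vcores = sum(n.vcores_total for n in nodes)
    total_mem = sum(n.mem_total_mb for n in nodes)
    # Weight each node's CPU utilization by its capacity so that large
    # nodes count proportionally more than small ones.
    used_vcores = sum(n.cpu_util * n.vcores_total for n in nodes)
    used_mem = sum(n.mem_used_mb for n in nodes)
    cpu = used_vcores / total_vcores if total_vcores else 0.0
    mem = used_mem / total_mem if total_mem else 0.0
    return cpu, mem


nodes = [
    NodeStatus("n1", vcores_total=8, cpu_util=0.5, mem_total_mb=1000, mem_used_mb=500),
    NodeStatus("n2", vcores_total=8, cpu_util=1.0, mem_total_mb=1000, mem_used_mb=250),
]
print(cluster_utilization(nodes))
```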






[jira] [Comment Edited] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2021-04-07 Thread chaosju (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316336#comment-17316336
 ] 

chaosju edited comment on YARN-10450 at 4/7/21, 1:30 PM:
-

Why an adaptive heartbeat?
 * Regular heartbeats can overload the RM.
 * If the RM is overloaded, things get worse over time as events queue up.
 * Work efficiency drops, because important events at the NM/AM have to wait for 
the next heartbeat before the RM learns of their status.
 * Not every heartbeat from a node or AM is important. If a node is running 
full, its heartbeats are not useful for application scheduling.
 * The RM should be able to control the heartbeats sent to itself.

How to make the heartbeat adaptive?

1. Throttle heartbeat:
 * Base the HB interval on scheduler load (LIGHT, NORMAL, BUSY, HEAVY).
 * Collect statistics on the various scheduler events (processing time vs. wait 
time in queue).
 * The RM tells the NM and AM the next HB interval in order to throttle the 
heartbeat.

2. Event-based heartbeat:
 * Send an out-of-band heartbeat for urgent requests, such as new resource 
requests or container completions, before the heartbeat interval indicated by 
the RM has elapsed.
 * The RM can notify the AM when containers have been allocated, so that the AM 
does not have to wait for the scheduled heartbeat to get resources.

Reference: https://www.slideshare.net/vsaxenavarun/venturing-into-large-hadoop-clusters

[~Jim_Brennan] 


was (Author: chaosju):
Why an adaptive heartbeat?
 * Regular heartbeats can overload the RM.
 * If the RM is overloaded, things get worse over time as events queue up.
 * Work efficiency drops, because important events at the NM/AM have to wait for 
the next heartbeat before the RM learns of their status.
 * Not every heartbeat from a node or AM is important. If a node is running 
full, its heartbeats are not useful for application scheduling.
 * The RM should be able to control the heartbeats sent to itself.

How to make the heartbeat adaptive?

1. Throttle heartbeat:
 * Base the HB interval on scheduler load (LIGHT, NORMAL, BUSY, HEAVY).
 * Collect statistics on the various scheduler events (processing time vs. wait 
time in queue).
 * The RM tells the NM and AM the next HB interval in order to throttle the 
heartbeat.

2. Event-based heartbeat:
 * Send an out-of-band heartbeat for urgent requests, such as new resource 
requests or container completions, before the heartbeat interval indicated by 
the RM has elapsed.
 * The RM can notify the AM when containers have been allocated, so that the AM 
does not have to wait for the scheduled heartbeat to get resources.

 

[~Jim_Brennan] 

> Add cpu and memory utilization per node and cluster-wide metrics
> 
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: NodesPage.png, YARN-10450-branch-2.10.003.patch, 
> YARN-10450-branch-3.1.003.patch, YARN-10450-branch-3.2.003.patch, 
> YARN-10450.001.patch, YARN-10450.002.patch, YARN-10450.003.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and 
> aggregated for the entire cluster.  This information is already passed 
> from NM to RM in the node status update.
> We have been running with this internally for quite a while and found it 
> useful to be able to quickly see the actual cpu/memory utilization on the 
> node/cluster.  It's especially useful if some form of overcommit is used.






[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics

2021-04-07 Thread chaosju (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316336#comment-17316336
 ] 

chaosju commented on YARN-10450:


Why adaptive heartbeat?
 * {color:#FF}Regular heartbeats can overload RM.{color}
 * {color:#FF}If RM is overloaded, things get worse over time as events 
queue up.{color}
 * Work efficiency drops, as important events at NM/AM need to wait for the next 
heartbeat to let RM know of their status.
 * Not every heartbeat from a node or AM is important. If nodes are running 
full, heartbeats from such nodes are not useful for application 
scheduling. 
 * RM should be able to control the heartbeats sent to it.

How does adaptive heartbeat work?

1. Throttle heartbeat: 
 * {color:#FF} HB interval based on scheduler load (LIGHT, NORMAL, BUSY, 
HEAVY){color}
 * Statistics associated with various scheduler events (processing time vs. wait 
time in queue) are collected. 
 * RM indicates the next HB interval to NM and AM to throttle the heartbeat.

2. Event-based heartbeat:
 * Send an out-of-band heartbeat for urgent requests, such as new resource 
requests or container completion, before the heartbeat interval indicated by 
RM elapses. 
 * RM can notify AM when the containers have been allocated so that AM does not 
have to wait for the scheduled heartbeat to get resources.

 

[~Jim_Brennan] 

> Add cpu and memory utilization per node and cluster-wide metrics
> 
>
> Key: YARN-10450
> URL: https://issues.apache.org/jira/browse/YARN-10450
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.3.1
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: NodesPage.png, YARN-10450-branch-2.10.003.patch, 
> YARN-10450-branch-3.1.003.patch, YARN-10450-branch-3.2.003.patch, 
> YARN-10450.001.patch, YARN-10450.002.patch, YARN-10450.003.patch
>
>
> Add metrics to show actual cpu and memory utilization for each node and 
> aggregated for the entire cluster.  This information is already passed 
> from NM to RM in the node status update.
> We have been running with this internally for quite a while and found it 
> useful to be able to quickly see the actual cpu/memory utilization on the 
> node/cluster.  It's especially useful if some form of overcommit is used.






[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316309#comment-17316309
 ] 

Peter Bacsko commented on YARN-10564:
-

Thanks [~gandras]. I have the following suggestions: please add comments to the 
"for" loop that explain this. I don't want to dictate the wording. It could 
be several sentences. I think it's important. Also, maybe comment that 
"supportedWildcardLevel" or MAX_WILDCARD_LEVEL might change in the future (like 
me, people might notice that the range is [0-1] and find that 
confusing).

Also, an overall comment like "collect all template settings based on prefix, 
then finally apply the collected settings to the newly created queue" might be 
useful. I'd put it somewhere before the "while" loop, but this is just an idea.

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. Proposition is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would mean, that set weight to 10 of every leaf of every parent under 
> root.
> We should possibly take an approach, that could support arbitrary depth of 
> template configuration, because we might need to lift the limitation of auto 
> queue nesting.






[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316288#comment-17316288
 ] 

Andras Gyori commented on YARN-10564:
-

[~pbacsko] that is it exactly. 

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. Proposition is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would mean, that set weight to 10 of every leaf of every parent under 
> root.
> We should possibly take an approach, that could support arbitrary depth of 
> template configuration, because we might need to lift the limitation of auto 
> queue nesting.






[jira] [Comment Edited] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316277#comment-17316277
 ] 

Peter Bacsko edited comment on YARN-10564 at 4/7/21, 12:16 PM:
---

Thanks [~gandras], I think I get it. I guess the trick is the "for" loop which 
modifies "queuePathParts". First we try to find the templates for the parent 
explicitly, then we step back a wildcard at each iteration. By changing 
"queuePathParts", the prefix changes so eventually we might find a parent which 
contains templates. 

Finally, we call {{setConfigFromTemplateEntries()}} where we set the collected 
values for the original queue.

Is this correct?


was (Author: pbacsko):
Thanks [~gandras], I think I get it. I guess the trick is the "for" loop which 
modifies "queuePathParts". First we try to find the templates for the parent 
explicitly, then we step back each wildcard at a time. By changing 
"queuePathParts", the prefix changes so eventually we might find a parent which 
contains templates. 

Finally, we call {{setConfigFromTemplateEntries()}} where we set the collected 
values for the original queue.

Is this correct?

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. Proposition is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would mean, that set weight to 10 of every leaf of every parent under 
> root.
> We should possibly take an approach, that could support arbitrary depth of 
> template configuration, because we might need to lift the limitation of auto 
> queue nesting.






[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316277#comment-17316277
 ] 

Peter Bacsko commented on YARN-10564:
-

Thanks [~gandras], I think I get it. I guess the trick is the "for" loop which 
modifies "queuePathParts". First we try to find the templates for the parent 
explicitly, then we step back one wildcard at a time. By changing 
"queuePathParts", the prefix changes so eventually we might find a parent which 
contains templates. 

Finally, we call {{setConfigFromTemplateEntries()}} where we set the collected 
values for the original queue.

Is this correct?

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. Proposition is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would mean, that set weight to 10 of every leaf of every parent under 
> root.
> We should possibly take an approach, that could support arbitrary depth of 
> template configuration, because we might need to lift the limitation of auto 
> queue nesting.






[jira] [Comment Edited] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316262#comment-17316262
 ] 

Andras Gyori edited comment on YARN-10564 at 4/7/21, 11:52 AM:
---

[~pbacsko] thank you for your review. I was afraid that it would be a little 
bit convoluted. This was much worse in case of legacy auto created queues, but 
I think wildcards do complicate things in this case as well. Storing the 
children templates on the Parent itself might be clearer; I will try to 
investigate that approach. What is happening in this patch:
 # The template configuration class sets the properties on the queue itself.
 # Since we will support configurable depth of queue creation, we need to 
support variable length of wildcarding as well (so in case of 
QUEUE_CREATION_DEPTH=3, we need to support 2 wildcard levels, e.g. root.a.*.*; 
that's why we have MAX_WILDCARD_LEVEL).
 # For a dynamic LeafQueue *root.a.a1*, we start looking for template entries 
for its parent *root.a* in the configuration, increasing the wildcard level 
by 1 in each iteration. An example for the iteration:
 ## root.a.auto-queue-creation-v2.template.weight
 ## root.*.auto-queue-creation-v2.template.weight
 # After this, we cut the _auto-queue-creation-v2.template_ prefix, and we set 
the actual value for the configuration. An example:
 ## from root.a.auto-queue-creation-v2.template.weight -> root.a.a1.weight
 # If we set root.a.a1.weight explicitly (this would be used for STOPPING / 
RUNNING the queue), a template configuration would not overwrite the value. An 
example:
 ## root.a.auto-queue-creation-v2.template.weight = 2
 ## root.a.a1.weight = 3
 ## root.a.a1 will have weight = 3

I hope this clarifies the patch to some extent.
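
The lookup described in the numbered steps above could be sketched roughly like this. It is not the code from the patch: the class and method names are hypothetical, the configuration is modelled as a plain map, and the keys omit the `yarn.scheduler.capacity.` prefix just as the examples in the comment do. The sketch also assumes that the most specific template wins, which the thread does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of wildcard template resolution; not the patch itself. */
class TemplateLookupSketch {
  static final String WILDCARD_QUEUE = "*";
  static final String TEMPLATE_INFIX = "auto-queue-creation-v2.template";
  static final int MAX_WILDCARD_LEVEL = 1;

  /**
   * Returns the effective value of a property for a dynamic queue, or null
   * if neither an explicit setting nor a matching template exists.
   */
  static String resolve(Map<String, String> conf, String queuePath,
      String property) {
    // An explicitly configured value is never overwritten by a template.
    String explicit = conf.get(queuePath + "." + property);
    if (explicit != null) {
      return explicit;
    }

    // root.a.a1 -> [root, a]: templates are looked up on the parent path.
    List<String> parentParts =
        new ArrayList<>(Arrays.asList(queuePath.split("\\.")));
    parentParts.remove(parentParts.size() - 1);
    if (parentParts.isEmpty()) {
      return null; // the root queue has no parent to take a template from
    }

    // The first element (root) is never replaced by a wildcard.
    int supportedLevel = Math.min(parentParts.size() - 1, MAX_WILDCARD_LEVEL);
    for (int level = 0; level <= supportedLevel; level++) {
      if (level > 0) {
        // Step back one more path element: root.a -> root.*
        parentParts.set(parentParts.size() - level, WILDCARD_QUEUE);
      }
      String key = String.join(".", parentParts) + "." + TEMPLATE_INFIX
          + "." + property;
      String value = conf.get(key);
      if (value != null) {
        return value;
      }
    }
    return null;
  }
}
```

With the entries from the last example, `resolve(conf, "root.a.a1", "weight")` yields the explicit `3`, while a sibling such as `root.a.a2` falls back to the template value `2`.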


was (Author: gandras):
[~pbacsko] thank you for your review. I was afraid that the will be a little 
bit convoluted. This was much worse in case of legacy auto created queues. 
Storing the children templates on the Parent itself might be more clear, I will 
try to investigate that approach. What is happening in this patch:
 # The template configuration class sets the properties on the queue itself
 # Since we will support configurable depth of queue creation, we need to 
support variable length of wildcarding as well (so in case of 
QUEUE_CREATION_DEPTH=3, we need to support 2 wildcard levels eg. root.a.*.*, 
thats why we have MAX_WILDCARD_LEVEL)
 # For a dynamic LeafQueue *root.a.a1,* we start looking for template entries 
for its parent *root.a* in the configuration and increasing the wildcard level 
by 1 each iteration. An example for the iteration:
 ## root.a.auto-queue-creation-v2.template.weight
 ## root.*.auto-queue-creation-v2.template.weight
 # After this, we cut the _auto-queue-creation-v2.template_ prefix, and we set 
the actual value for the configuration. An example:
 ## from root.a.auto-queue-creation-v2.template.weight -> root.a.a1.weight
 # If we set the root.a.a1.weight explicitly (this would be used for STOPPING / 
RUNNING the queue), a template configuration would not overwrite the value. An 
example:
 ## root.a.auto-queue-creation-v2.template.weight = 2
 ## root.a.a1.weight = 3
 ## root.a.a1 will have weight = 3

I hope it does clarify the patch to some extent.

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. Proposition is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would mean, that set weight to 10 of every leaf of every parent under 
> root.
> We should possibly take an approach, that could support arbitrary depth of 
> template configuration, because we might need to lift the limitation of auto 
> queue nesting.






[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316262#comment-17316262
 ] 

Andras Gyori commented on YARN-10564:
-

[~pbacsko] thank you for your review. I was afraid that it would be a little 
bit convoluted. This was much worse in case of legacy auto created queues. 
Storing the children templates on the Parent itself might be clearer; I will 
try to investigate that approach. What is happening in this patch:
 # The template configuration class sets the properties on the queue itself.
 # Since we will support configurable depth of queue creation, we need to 
support variable length of wildcarding as well (so in case of 
QUEUE_CREATION_DEPTH=3, we need to support 2 wildcard levels, e.g. root.a.*.*; 
that's why we have MAX_WILDCARD_LEVEL).
 # For a dynamic LeafQueue *root.a.a1*, we start looking for template entries 
for its parent *root.a* in the configuration, increasing the wildcard level 
by 1 in each iteration. An example for the iteration:
 ## root.a.auto-queue-creation-v2.template.weight
 ## root.*.auto-queue-creation-v2.template.weight
 # After this, we cut the _auto-queue-creation-v2.template_ prefix, and we set 
the actual value for the configuration. An example:
 ## from root.a.auto-queue-creation-v2.template.weight -> root.a.a1.weight
 # If we set root.a.a1.weight explicitly (this would be used for STOPPING / 
RUNNING the queue), a template configuration would not overwrite the value. An 
example:
 ## root.a.auto-queue-creation-v2.template.weight = 2
 ## root.a.a1.weight = 3
 ## root.a.a1 will have weight = 3

I hope this clarifies the patch to some extent.

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. Proposition is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would mean, that set weight to 10 of every leaf of every parent under 
> root.
> We should possibly take an approach, that could support arbitrary depth of 
> template configuration, because we might need to lift the limitation of auto 
> queue nesting.






[jira] [Comment Edited] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316213#comment-17316213
 ] 

Peter Bacsko edited comment on YARN-10564 at 4/7/21, 11:51 AM:
---

[~gandras] thanks for the patch.
From a coding POV it looks OK; this is more of a high-level review.

There are some things I just can't figure out (maybe I'm in bad shape 
today).

1. Let's say you set the capacity 6w for {{root.a.*}}. Then a dynamic queue 
{{root.a.newparent.newchild}} gets created. How do the weight settings 
propagate to "newparent" and "newchild"? I kept looking at the code, but it's 
just not obvious. I can see that "root.a" will have an entry in 
{{templateEntries}}, but then what?

2. I can't decipher this part:
{noformat}
for (int i = 0; i <= wildcardLevel; ++i) {
  queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
}
{noformat}
What's happening here?

3. There is a variable called "supportedWildcardLevel". What does "supported" 
mean in this context? Later on we set it to {{Math.min(queueHierarchyParts - 
1, MAX_WILDCARD_LEVEL);}}. It seems to me that it is either 0 or 1, because 
{{MAX_WILDCARD_LEVEL}} is 1. I assume most of the time it's going to be 1? I 
don't understand what it is meant to represent.
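
To make the effect of the loop quoted in point 2 concrete, the sketch below runs exactly that loop body on a copy of a queue path split into its parts. The wrapper class, the helper name, and the sample path are assumptions for illustration; only the loop itself is taken from the excerpt above, and how the surrounding patch code calls it is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper around the loop quoted in the review. */
class WildcardLoopDemo {
  static final String WILDCARD_QUEUE = "*";

  /** Splits the path, applies the quoted loop, and joins the parts again. */
  static String wildcarded(String queuePath, int wildcardLevel) {
    List<String> queuePathParts =
        new ArrayList<>(Arrays.asList(queuePath.split("\\.")));
    // Replaces the last (wildcardLevel + 1) path elements with "*".
    for (int i = 0; i <= wildcardLevel; ++i) {
      queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
    }
    return String.join(".", queuePathParts);
  }
}
```

For the path `root.a.b`, a wildcard level of 0 gives `root.a.*` and a level of 1 gives `root.*.*`, so each additional level widens the prefix by one path element from the right.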


was (Author: pbacsko):
[~gandras] thanks for the patch.
>From coding POV it looks ok, this is more like a high level review.

There's are some things I just can't figure out (maybe I'm in a bad shape 
today).

1. Let's say you set the capacity 6w for {{root.a.*}}. Then a dynamic queue 
{{root.a.newparent.newchild}} get created. How does the weight settings 
propagate to "newparent" and "newchild"? I kept looking at the code, but it's 
just not obvious. I can see that "root.a" will have an entry in 
{{templateEntries}}, but then what?

2. I can't deciper this part:
{noformat}
for (int i = 0; i <= wildcardLevel; ++i) {
queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
}
{noformat}
What's happening here?

3. There is a variable called "supportedWildcardLevel". What is "supported" 
means in this context? Later on we set it to {{Math.min(queueHierarchyParts - 
1, MAX_WILDCARD_LEVEL);}}. It seems to me that it is either 0 or 1, because 
{{MAX_WILDCARD_LEVEL}} is 1. I assume most of the time it's going to be 1? 
Mentally I don't understand what it is meant to represent.

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. Proposition is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would mean, that set weight to 10 of every leaf of every parent under 
> root.
> We should possibly take an approach, that could support arbitrary depth of 
> template configuration, because we might need to lift the limitation of auto 
> queue nesting.






[jira] [Comment Edited] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316213#comment-17316213
 ] 

Peter Bacsko edited comment on YARN-10564 at 4/7/21, 10:49 AM:
---

[~gandras] thanks for the patch.
From a coding POV it looks OK; this is more of a high-level review.

There are some things I just can't figure out (maybe I'm in bad shape 
today).

1. Let's say you set the capacity 6w for {{root.a.*}}. Then a dynamic queue 
{{root.a.newparent.newchild}} gets created. How do the weight settings 
propagate to "newparent" and "newchild"? I kept looking at the code, but it's 
just not obvious. I can see that "root.a" will have an entry in 
{{templateEntries}}, but then what?

2. I can't decipher this part:
{noformat}
for (int i = 0; i <= wildcardLevel; ++i) {
  queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
}
{noformat}
What's happening here?

3. There is a variable called "supportedWildcardLevel". What does "supported" 
mean in this context? Later on we set it to {{Math.min(queueHierarchyParts - 
1, MAX_WILDCARD_LEVEL);}}. It seems to me that it is either 0 or 1, because 
{{MAX_WILDCARD_LEVEL}} is 1. I assume most of the time it's going to be 1? 
I don't understand what it is meant to represent.


was (Author: pbacsko):
[~gandras] thanks for the patch.
>From coding POV it looks ok, this is more like a high level review.

There's are some things I just can't figure out (maybe I'm in a bad shape 
today).

1. Let's say you set 6w for {{root.a.*}}. Then a dynamic queue 
{{root.a.newparent.newchild}} get created. How does the weight settings 
propagate to "newparent" and "newchild"? I kept looking at the code, but it's 
just not obvious. I can see that "root.a" will have an entry in 
{{templateEntries}}, but then what?

2. I can't deciper this part:
{noformat}
for (int i = 0; i <= wildcardLevel; ++i) {
queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
}
{noformat}
What's happening here?

3. There is a variable called "supportedWildcardLevel". What is "supported" 
means in this context? Later on we set it to {{Math.min(queueHierarchyParts - 
1, MAX_WILDCARD_LEVEL);}} which seems to be that it is either 0 or 1, because 
{{MAX_WILDCARD_LEVEL}} is 1. I assume most of the time it's going to be 1? 
Mentally I don't understand what it is meant to represent.

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. Proposition is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would mean, that set weight to 10 of every leaf of every parent under 
> root.
> We should possibly take an approach, that could support arbitrary depth of 
> template configuration, because we might need to lift the limitation of auto 
> queue nesting.






[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations

2021-04-07 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316213#comment-17316213
 ] 

Peter Bacsko commented on YARN-10564:
-

[~gandras] thanks for the patch.
From a coding POV it looks OK; this is more of a high-level review.

There are some things I just can't figure out (maybe I'm in bad shape 
today).

1. Let's say you set 6w for {{root.a.*}}. Then a dynamic queue 
{{root.a.newparent.newchild}} gets created. How do the weight settings 
propagate to "newparent" and "newchild"? I kept looking at the code, but it's 
just not obvious. I can see that "root.a" will have an entry in 
{{templateEntries}}, but then what?

2. I can't decipher this part:
{noformat}
for (int i = 0; i <= wildcardLevel; ++i) {
  queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
}
{noformat}
What's happening here?

3. There is a variable called "supportedWildcardLevel". What does "supported" 
mean in this context? Later on we set it to {{Math.min(queueHierarchyParts - 
1, MAX_WILDCARD_LEVEL);}}, which seems to mean that it is either 0 or 1, because 
{{MAX_WILDCARD_LEVEL}} is 1. I assume most of the time it's going to be 1? 
I don't understand what it is meant to represent.

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. Proposition is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would mean, that set weight to 10 of every leaf of every parent under 
> root.
> We should possibly take an approach, that could support arbitrary depth of 
> template configuration, because we might need to lift the limitation of auto 
> queue nesting.






[jira] [Commented] (YARN-10714) Remove dangling dynamic queues on reinitialization

2021-04-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316182#comment-17316182
 ] 

Szilard Nemeth commented on YARN-10714:
---

Thanks [~gandras] for working on this.
Latest patch LGTM, committed to trunk.

> Remove dangling dynamic queues on reinitialization
> --
>
> Key: YARN-10714
> URL: https://issues.apache.org/jira/browse/YARN-10714
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10714.001.patch, YARN-10714.002.patch, 
> YARN-10714.003.patch
>
>
> Current logic does not handle orphaned auto created child queues. The 
> following example steps show a scenario in which it is possible to submit 
> applications to an orphaned queue that has an invalid (already removed) 
> ParentQueue.
>  # Auto create a queue root.a.a-auto
>  # Remove root.a from the config
>  # Reinitialize CS without restarting it (possible via mutation API)
>  # Submit application to root.a.a-auto, while root.a is a non-existent queue
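
One way to detect the orphaned queues from the scenario above is to check, on reinitialization, whether each dynamic queue still has a live parent. The sketch below is an assumption-laden illustration, not the committed patch: queues are modelled as plain path strings, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; queues are modelled as dotted path strings. */
class DanglingQueueSketch {

  /**
   * Returns the dynamic queues whose parent is neither a configured queue
   * nor a surviving dynamic queue. Paths are visited in sorted order, so a
   * dynamic parent is always examined before its children.
   */
  static List<String> findDangling(Set<String> configuredQueues,
      Set<String> dynamicQueues) {
    List<String> dangling = new ArrayList<>();
    for (String path : new TreeSet<>(dynamicQueues)) {
      int lastDot = path.lastIndexOf('.');
      String parent = lastDot < 0 ? "" : path.substring(0, lastDot);
      boolean parentIsLiveDynamic =
          dynamicQueues.contains(parent) && !dangling.contains(parent);
      if (!configuredQueues.contains(parent) && !parentIsLiveDynamic) {
        dangling.add(path);
      }
    }
    return dangling;
  }
}
```

In the scenario from the description, once `root.a` is gone from the configured set, `root.a.a-auto` is reported as dangling and could be removed before any application is submitted to it.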






[jira] [Updated] (YARN-10727) ParentQueue does not validate the queue on removal

2021-04-07 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10727:

Parent: YARN-10496
Issue Type: Sub-task  (was: Bug)

> ParentQueue does not validate the queue on removal
> --
>
> Key: YARN-10727
> URL: https://issues.apache.org/jira/browse/YARN-10727
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>
> With the addition of YARN-10532 ParentQueue has a public method, removeQueue, 
> which allows the deletion of a queue at runtime. However, there is no 
> validation regarding the queue which is to be removed, therefore it is 
> possible to remove a queue from the CSQueueManager that is not a child of the 
> ParentQueue. Since it is a public method, there must be validations such as:
>  * check whether the parent of the queue to be removed is the current ParentQueue
>  * check whether the parent actually contains the queue in its childQueues 
> collection
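
The two validations listed above might look roughly like this. The classes below are simplified stand-ins written for illustration; they are not the real ParentQueue or CSQueue types, and the exception type and messages are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a queue; not the real CSQueue. */
class QueueSketch {
  final String name;
  final ParentQueueSketch parent;

  QueueSketch(String name, ParentQueueSketch parent) {
    this.name = name;
    this.parent = parent;
  }
}

/** Simplified stand-in for ParentQueue with a validating removeQueue. */
class ParentQueueSketch extends QueueSketch {
  final List<QueueSketch> childQueues = new ArrayList<>();

  ParentQueueSketch(String name, ParentQueueSketch parent) {
    super(name, parent);
  }

  void addChild(QueueSketch queue) {
    childQueues.add(queue);
  }

  void removeQueue(QueueSketch queue) {
    // Check 1: the queue must name this ParentQueue as its parent.
    if (queue.parent != this) {
      throw new IllegalArgumentException(
          queue.name + " is not a child of " + name);
    }
    // Check 2: the queue must actually be in the childQueues collection.
    if (!childQueues.contains(queue)) {
      throw new IllegalArgumentException(
          name + " does not contain " + queue.name);
    }
    childQueues.remove(queue);
  }
}
```

With both checks in place, calling `removeQueue` on the wrong parent fails fast instead of silently removing the queue from the queue manager.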


