[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316834#comment-17316834 ]

Qi Zhu commented on YARN-10564:
---
Thanks [~pbacsko] for the review and suggestions, and [~gandras] for the update; it's clearer now.

LGTM +1

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, YARN-10564.006.patch, YARN-10564.poc.001.patch
>
> Similar to how the template configuration works for ManagedParents, we need to support templates for the new auto queue creation logic. The proposal is to allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10
> {noformat}
> which would set the weight to 10 for every leaf of every parent under root.
> We should possibly take an approach that supports an arbitrary depth of template configuration, because we might need to lift the limitation on auto queue nesting.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
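The wildcard proposal in the YARN-10564 description can be illustrated with a minimal sketch of segment-wise matching between a template path such as `root.*.*` and a concrete queue path. The class and method names (`TemplateMatchSketch`, `matches`) are hypothetical and are not taken from any of the attached patches; the actual implementation may resolve templates quite differently, and this sketch does not check whether a queue is a leaf.

```java
/** Hypothetical illustration only; not the code from the YARN-10564 patches. */
final class TemplateMatchSketch {

  /**
   * Returns true if the template and the queue path have the same depth and
   * every template segment is either "*" or equal to the queue path segment.
   */
  static boolean matches(String template, String queuePath) {
    String[] templateParts = template.split("\\.");
    String[] queueParts = queuePath.split("\\.");
    if (templateParts.length != queueParts.length) {
      return false;
    }
    for (int i = 0; i < templateParts.length; i++) {
      boolean wildcard = templateParts[i].equals("*");
      if (!wildcard && !templateParts[i].equals(queueParts[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Example queue paths are made up for the demonstration.
    String[] queues = {"root.a", "root.a.b", "root.x.y", "root.a.b.c"};
    for (String queue : queues) {
      System.out.println(queue + " matches root.*.*: " + matches("root.*.*", queue));
    }
  }
}
```

Under this reading, `root.*.*` applies only to queues exactly two levels below root; supporting arbitrary depth, as the description suggests, would need a different matching rule.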
[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316612#comment-17316612 ] Hadoop QA commented on YARN-10702: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 6s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 2 new or modified test files. 
{color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 28s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 11s{color} | {color:green}{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 34s{color} | {color:green}{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s{color} | {color:green}{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 33s{color} | {color:green}{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 52s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 4s{color} | {color:green}{color} | {color:green} branch-3.1 passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 24m 18s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 5m 25s{color} | {color:green}{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 1s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 10s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/902/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 287 unchanged - 0 fixed = 289 total (was 287) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 22s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green}{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 36s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 5m 57s{color} | {color:green}{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} hadoop-yarn-api in the patch passed. {color}
[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316516#comment-17316516 ] Hadoop QA commented on YARN-10564: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 23s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 2 new or modified test files. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 43s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 28s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 37s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 1m 51s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 42s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/901/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 133 unchanged - 0 fixed = 134 total (was 133) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 6s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | |
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316489#comment-17316489 ]

Jim Brennan commented on YARN-10475:

[~chaosju] thanks for your comment. The implementation provided here compares node utilization with overall cluster utilization to adjust the heartbeat, so that under-utilized nodes get more scheduling opportunities. Note that this feature was developed internally on branch-2 before the global scheduler was added. It has worked well to help keep our nodes more evenly utilized.

I think that other metrics for scaling the heartbeat are definitely worth exploring, which is why we filed [YARN-10478] to make it pluggable. That would be a good place to make suggestions for alternate approaches.

> Scale RM-NM heartbeat interval based on node utilization
>
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10475-branch-3.2.003.patch, YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
>
> Add the ability to scale the RM-NM heartbeat interval based on node cpu utilization compared to overall cluster cpu utilization. If a node is over-utilized compared to the rest of the cluster, its heartbeat interval slows down. If it is under-utilized compared to the rest of the cluster, its heartbeat interval speeds up.
> This is a feature we have been running with internally in production for several years. It was developed by [~nroberts], based on the observation that larger faster nodes on our cluster were under-utilized compared to smaller slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide utilization metrics.
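The scaling rule described in the YARN-10475 summary (slower heartbeats for over-utilized nodes, faster ones for under-utilized nodes) could be sketched roughly as below. The linear formula, the clamping bounds, and every name (`HeartbeatScalingSketch`, `scaledIntervalMs`, `scale`) are assumptions made for illustration; the attached patches may compute the interval differently.

```java
/** Hypothetical sketch; the formula and names are assumptions, not the YARN-10475 patch. */
final class HeartbeatScalingSketch {

  /**
   * Scales the default interval by the gap between node and cluster CPU
   * utilization (both in the range 0.0 to 1.0) and clamps the result.
   */
  static long scaledIntervalMs(long defaultMs, long minMs, long maxMs,
      double nodeCpuUtil, double clusterCpuUtil, double scale) {
    // Positive when the node is busier than the cluster as a whole.
    double diff = nodeCpuUtil - clusterCpuUtil;
    long interval = Math.round(defaultMs * (1.0 + scale * diff));
    // Keep the interval inside the configured bounds.
    return Math.max(minMs, Math.min(maxMs, interval));
  }

  public static void main(String[] args) {
    // Over-utilized node: interval grows, so the node heartbeats less often.
    System.out.println(scaledIntervalMs(1000, 500, 1500, 0.9, 0.5, 1.0));
    // Under-utilized node: interval shrinks, so the node heartbeats more often.
    System.out.println(scaledIntervalMs(1000, 500, 1500, 0.2, 0.5, 1.0));
  }
}
```

Clamping keeps a single outlier node from heartbeating arbitrarily fast or slow, which seems a reasonable safeguard for any rule of this kind.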
[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316478#comment-17316478 ] Jim Brennan commented on YARN-10702: Thanks again [~ebadger]! I put up additional patches for branch-3.2 and branch-3.1. > Add cluster metric for amount of CPU used by RM Event Processor > --- > > Key: YARN-10702 > URL: https://issues.apache.org/jira/browse/YARN-10702 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.10.1, 3.4.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 3.4.0, 3.3.1 > > Attachments: Scheduler-Busy.png, YARN-10702-branch-3.1.006.patch, > YARN-10702-branch-3.2.006.patch, YARN-10702-branch-3.3.006.patch, > YARN-10702.001.patch, YARN-10702.002.patch, YARN-10702.003.patch, > YARN-10702.004.patch, YARN-10702.005.patch, YARN-10702.006.patch, > simon-scheduler-busy.png > > > Add a cluster metric to track the cpu usage of the ResourceManager Event > Processing thread. This lets us know when the critical path of the RM is > running out of headroom. > This feature was originally added for us internally by [~nroberts] and we've > been running with it on production clusters for nearly four years. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
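For readers unfamiliar with per-thread CPU accounting in the JVM, the following is a minimal sketch of how the CPU usage of one thread could be sampled with the standard `java.lang.management.ThreadMXBean` API. Whether the YARN-10702 patches use this mechanism is an assumption; the class name `ThreadCpuSketch`, the helper `busyPercent`, and the stand-in worker thread are all hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical sketch; not the metric implementation from the YARN-10702 patches. */
final class ThreadCpuSketch {

  /** Percentage of the wall-clock interval spent on CPU; 0 for an empty interval. */
  static double busyPercent(long cpuStartNs, long cpuEndNs, long wallStartNs, long wallEndNs) {
    long wall = wallEndNs - wallStartNs;
    if (wall <= 0) {
      return 0.0;
    }
    return 100.0 * (cpuEndNs - cpuStartNs) / wall;
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    if (!bean.isThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
      System.out.println("Per-thread CPU time is not available on this JVM.");
      return;
    }
    // Stand-in for an event processing thread: spins until interrupted.
    Thread worker = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        // busy loop
      }
    }, "event-processor-stand-in");
    worker.setDaemon(true);
    worker.start();

    long id = worker.getId();
    long cpuStart = bean.getThreadCpuTime(id);
    long wallStart = System.nanoTime();
    Thread.sleep(200);
    long cpuEnd = bean.getThreadCpuTime(id);
    long wallEnd = System.nanoTime();
    worker.interrupt();

    System.out.printf("busy: %.1f%%%n", busyPercent(cpuStart, cpuEnd, wallStart, wallEnd));
  }
}
```

A value close to 100% for a single-threaded critical path would indicate that the thread has little headroom left, which is the situation the issue summary wants to make visible.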
[jira] [Updated] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10702: --- Attachment: YARN-10702-branch-3.2.006.patch > Add cluster metric for amount of CPU used by RM Event Processor > --- > > Key: YARN-10702 > URL: https://issues.apache.org/jira/browse/YARN-10702 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.10.1, 3.4.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 3.4.0, 3.3.1 > > Attachments: Scheduler-Busy.png, YARN-10702-branch-3.1.006.patch, > YARN-10702-branch-3.2.006.patch, YARN-10702-branch-3.3.006.patch, > YARN-10702.001.patch, YARN-10702.002.patch, YARN-10702.003.patch, > YARN-10702.004.patch, YARN-10702.005.patch, YARN-10702.006.patch, > simon-scheduler-busy.png > > > Add a cluster metric to track the cpu usage of the ResourceManager Event > Processing thread. This lets us know when the critical path of the RM is > running out of headroom. > This feature was originally added for us internally by [~nroberts] and we've > been running with it on production clusters for nearly four years. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10702: --- Attachment: YARN-10702-branch-3.1.006.patch > Add cluster metric for amount of CPU used by RM Event Processor > --- > > Key: YARN-10702 > URL: https://issues.apache.org/jira/browse/YARN-10702 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 2.10.1, 3.4.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 3.4.0, 3.3.1 > > Attachments: Scheduler-Busy.png, YARN-10702-branch-3.1.006.patch, > YARN-10702-branch-3.2.006.patch, YARN-10702-branch-3.3.006.patch, > YARN-10702.001.patch, YARN-10702.002.patch, YARN-10702.003.patch, > YARN-10702.004.patch, YARN-10702.005.patch, YARN-10702.006.patch, > simon-scheduler-busy.png > > > Add a cluster metric to track the cpu usage of the ResourceManager Event > Processing thread. This lets us know when the critical path of the RM is > running out of headroom. > This feature was originally added for us internally by [~nroberts] and we've > been running with it on production clusters for nearly four years. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316378#comment-17316378 ] Andras Gyori edited comment on YARN-10564 at 4/7/21, 2:25 PM: -- Thank you [~pbacsko] for the suggestions. I have incorporated your ideas and uploaded a new revision. Realised that we will need the template configurations on the parents. Also, hopefully I have made the logic simpler and more readable. was (Author: gandras): I have uploaded a new revision. I have realised that we will need the template configurations on the parents. Also, hopefully I have made the logic simpler and more readable. > Support Auto Queue Creation template configurations > --- > > Key: YARN-10564 > URL: https://issues.apache.org/jira/browse/YARN-10564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10564.001.patch, YARN-10564.002.patch, > YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, > YARN-10564.006.patch, YARN-10564.poc.001.patch > > > Similar to how the template configuration works for ManagedParents, we need > to support templates for the new auto queue creation logic. Proposition is to > allow wildcards in template configs such as: > {noformat} > yarn.scheduler.capacity.root.*.*.weight 10{noformat} > which would mean, that set weight to 10 of every leaf of every parent under > root. > We should possibly take an approach, that could support arbitrary depth of > template configuration, because we might need to lift the limitation of auto > queue nesting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316378#comment-17316378 ] Andras Gyori commented on YARN-10564: - I have uploaded a new revision. I have realised that we will need the template configurations on the parents. Also, hopefully I have made the logic simpler and more readable. > Support Auto Queue Creation template configurations > --- > > Key: YARN-10564 > URL: https://issues.apache.org/jira/browse/YARN-10564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10564.001.patch, YARN-10564.002.patch, > YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, > YARN-10564.006.patch, YARN-10564.poc.001.patch > > > Similar to how the template configuration works for ManagedParents, we need > to support templates for the new auto queue creation logic. Proposition is to > allow wildcards in template configs such as: > {noformat} > yarn.scheduler.capacity.root.*.*.weight 10{noformat} > which would mean, that set weight to 10 of every leaf of every parent under > root. > We should possibly take an approach, that could support arbitrary depth of > template configuration, because we might need to lift the limitation of auto > queue nesting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Gyori updated YARN-10564: Attachment: YARN-10564.006.patch > Support Auto Queue Creation template configurations > --- > > Key: YARN-10564 > URL: https://issues.apache.org/jira/browse/YARN-10564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10564.001.patch, YARN-10564.002.patch, > YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, > YARN-10564.006.patch, YARN-10564.poc.001.patch > > > Similar to how the template configuration works for ManagedParents, we need > to support templates for the new auto queue creation logic. Proposition is to > allow wildcards in template configs such as: > {noformat} > yarn.scheduler.capacity.root.*.*.weight 10{noformat} > which would mean, that set weight to 10 of every leaf of every parent under > root. > We should possibly take an approach, that could support arbitrary depth of > template configuration, because we might need to lift the limitation of auto > queue nesting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10475) Scale RM-NM heartbeat interval based on node utilization
[ https://issues.apache.org/jira/browse/YARN-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316340#comment-17316340 ]

chaosju commented on YARN-10475:

Why adaptive heartbeat?
* Regular heartbeats can overload the RM.
* If the RM is overloaded, things get worse over time as events queue up.
* Work efficiency drops because important events at the NM/AM must wait for the next heartbeat before the RM learns of their status.
* Not every heartbeat from a node or AM is important. If nodes are running full, heartbeats from those nodes are not useful for application scheduling.
* The RM should be able to control the heartbeats sent to itself.

How to make the heartbeat adaptive?

1. Throttle heartbeat:
* HB interval based on scheduler load (LIGHT, NORMAL, BUSY, HEAVY).
* Statistics associated with various scheduler events (processing time vs. wait time in queue) are collected.
* The RM indicates the next HB interval to the NM and AM to throttle the heartbeat.

2. Event-based heartbeat:
* Send an out-of-band heartbeat for urgent requests, such as new resource requests or container completions, before the heartbeat interval indicated by the RM elapses.
* The RM can notify the AM when containers have been allocated, so that the AM does not have to wait for the scheduled heartbeat to get resources.

Reference: [https://www.slideshare.net/vsaxenavarun/venturing-into-large-hadoop-clusters]

I think this feature should take the RM's load into account.

[~Jim_Brennan]

> Scale RM-NM heartbeat interval based on node utilization
>
>
> Key: YARN-10475
> URL: https://issues.apache.org/jira/browse/YARN-10475
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 3.2.3
>
> Attachments: YARN-10475-branch-3.2.003.patch, YARN-10475-branch-3.3.003.patch, YARN-10475.001.patch, YARN-10475.002.patch, YARN-10475.003.patch
>
> Add the ability to scale the RM-NM heartbeat interval based on node cpu utilization compared to overall cluster cpu utilization. If a node is over-utilized compared to the rest of the cluster, its heartbeat interval slows down. If it is under-utilized compared to the rest of the cluster, its heartbeat interval speeds up.
> This is a feature we have been running with internally in production for several years. It was developed by [~nroberts], based on the observation that larger faster nodes on our cluster were under-utilized compared to smaller slower nodes.
> This feature is dependent on [YARN-10450], which added cluster-wide utilization metrics.
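The "throttle heartbeat" idea from the comment on YARN-10475 above, where the RM picks the next heartbeat interval from a scheduler load level, might look roughly like the sketch below. Only the four level names (LIGHT, NORMAL, BUSY, HEAVY) come from the comment; the pending-event thresholds, the interval multipliers, and the names `LoadBasedHeartbeatSketch`, `classify`, and `nextIntervalMs` are invented for illustration.

```java
/** Hypothetical sketch; thresholds and multipliers are made-up example values. */
final class LoadBasedHeartbeatSketch {

  /** Load levels named in the comment. */
  enum SchedulerLoad { LIGHT, NORMAL, BUSY, HEAVY }

  /** Maps the number of queued scheduler events to a load level (assumed thresholds). */
  static SchedulerLoad classify(int pendingEvents) {
    if (pendingEvents < 100) {
      return SchedulerLoad.LIGHT;
    }
    if (pendingEvents < 1_000) {
      return SchedulerLoad.NORMAL;
    }
    if (pendingEvents < 10_000) {
      return SchedulerLoad.BUSY;
    }
    return SchedulerLoad.HEAVY;
  }

  /** Interval the RM would hand back to the NM or AM for its next heartbeat. */
  static long nextIntervalMs(long baseMs, SchedulerLoad load) {
    switch (load) {
      case LIGHT:
        return baseMs / 2;
      case NORMAL:
        return baseMs;
      case BUSY:
        return baseMs * 2;
      default: // HEAVY
        return baseMs * 4;
    }
  }

  public static void main(String[] args) {
    int[] samples = {10, 500, 5_000, 50_000};
    for (int pending : samples) {
      SchedulerLoad load = classify(pending);
      System.out.println(pending + " pending -> " + load + ", next heartbeat in "
          + nextIntervalMs(1000, load) + " ms");
    }
  }
}
```

Unlike the per-node utilization approach in the issue summary, this rule depends only on RM-side load, so every node would be throttled equally.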
[jira] [Issue Comment Deleted] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chaosju updated YARN-10450: --- Comment: was deleted (was: Why adaptive Heartbeat ? * {color:#ff}Regular heartbeats can overload RM.{color} * {color:#ff}if RM is overloaded things get worse over time as events queue up.{color} * Lower work efficiency as important events at NM/AM need to wait for next heartbeat to let RM know of their status. * Not every heartbeat from a node or AM may be important. If nodes are running full, heartbeats from such nodes would not be useful for application scheduling. * RM should be able to control heartbeats sent to itself How adaptive Heartbeat ? 1.Throttle Heartbeat: * {color:#ff} HB interval based on scheduler load (LIGHT, NORMAL, BUSY, HEAVY){color} * Statistics associated with various scheduler events (processing time vs wait time in queue) is collected. * RM indicates the next HB interval to NM and AM to throttle the heartbeat. 2. Event based Heartbeat: * Send out of band heartbeat to send emergent request such as new resource requests, container completion etc. before the heartbeat interval indicated by RM. * RM can notify AM when the containers have been allocated so that AM does not have to wait for the scheduled heartbeat to get resources. 
Reference:https://www.slideshare.net/vsaxenavarun/venturing-into-large-hadoop-clusters [~Jim_Brennan] ) > Add cpu and memory utilization per node and cluster-wide metrics > > > Key: YARN-10450 > URL: https://issues.apache.org/jira/browse/YARN-10450 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3 > > Attachments: NodesPage.png, YARN-10450-branch-2.10.003.patch, > YARN-10450-branch-3.1.003.patch, YARN-10450-branch-3.2.003.patch, > YARN-10450.001.patch, YARN-10450.002.patch, YARN-10450.003.patch > > > Add metrics to show actual cpu and memory utilization for each node and > aggregated for the entire cluster. This information is already passed > from NM to RM in the node status update. > We have been running with this internally for quite a while and found it > useful to be able to quickly see the actual cpu/memory utilization on the > node/cluster. It's especially useful if some form of overcommit is used.
[jira] [Comment Edited] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316336#comment-17316336 ] chaosju edited comment on YARN-10450 at 4/7/21, 1:30 PM: - Why adaptive Heartbeat ? * {color:#ff}Regular heartbeats can overload RM.{color} * {color:#ff}if RM is overloaded things get worse over time as events queue up.{color} * Lower work efficiency as important events at NM/AM need to wait for next heartbeat to let RM know of their status. * Not every heartbeat from a node or AM may be important. If nodes are running full, heartbeats from such nodes would not be useful for application scheduling. * RM should be able to control heartbeats sent to itself How adaptive Heartbeat ? 1.Throttle Heartbeat: * {color:#ff} HB interval based on scheduler load (LIGHT, NORMAL, BUSY, HEAVY){color} * Statistics associated with various scheduler events (processing time vs wait time in queue) is collected. * RM indicates the next HB interval to NM and AM to throttle the heartbeat. 2. Event based Heartbeat: * Send out of band heartbeat to send emergent request such as new resource requests, container completion etc. before the heartbeat interval indicated by RM. * RM can notify AM when the containers have been allocated so that AM does not have to wait for the scheduled heartbeat to get resources. Reference:https://www.slideshare.net/vsaxenavarun/venturing-into-large-hadoop-clusters [~Jim_Brennan] was (Author: chaosju): Why adaptive Heartbeat ? * {color:#FF}Regular heartbeats can overload RM.{color} * {color:#FF}if RM is overloaded things get worse over time as events queue up.{color} * Lower work efficiency as important events at NM/AM need to wait for next heartbeat to let RM know of their status. * Not every heartbeat from a node or AM may be important. If nodes are running full, heartbeats from such nodes would not be useful for application scheduling. 
* RM should be able to control heartbeats sent to itself How adaptive Heartbeat ? 1.Throttle Heartbeat: * {color:#FF} HB interval based on scheduler load (LIGHT, NORMAL, BUSY, HEAVY){color} * Statistics associated with various scheduler events (processing time vs wait time in queue) is collected. * RM indicates the next HB interval to NM and AM to throttle the heartbeat. 2. Event based Heartbeat: * Send out of band heartbeat to send emergent request such as new resource requests, container completion etc. before the heartbeat interval indicated by RM. * RM can notify AM when the containers have been allocated so that AM does not have to wait for the scheduled heartbeat to get resources. [~Jim_Brennan] > Add cpu and memory utilization per node and cluster-wide metrics > > > Key: YARN-10450 > URL: https://issues.apache.org/jira/browse/YARN-10450 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3 > > Attachments: NodesPage.png, YARN-10450-branch-2.10.003.patch, > YARN-10450-branch-3.1.003.patch, YARN-10450-branch-3.2.003.patch, > YARN-10450.001.patch, YARN-10450.002.patch, YARN-10450.003.patch > > > Add metrics to show actual cpu and memory utilization for each node and > aggregated for the entire cluster. This information is already passed > from NM to RM in the node status update. > We have been running with this internally for quite a while and found it > useful to be able to quickly see the actual cpu/memory utilization on the > node/cluster. It's especially useful if some form of overcommit is used.
[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316336#comment-17316336 ]

chaosju commented on YARN-10450:

Why adaptive heartbeat?
* Regular heartbeats can overload the RM.
* If the RM is overloaded, things get worse over time as events queue up.
* Efficiency drops, because important events at the NM/AM have to wait for the next heartbeat before the RM learns of their status.
* Not every heartbeat from a node or AM is important. If a node is running full, heartbeats from it are not useful for application scheduling.
* The RM should be able to control the heartbeats sent to itself.

How adaptive heartbeat?

1. Throttle heartbeat:
* The HB interval is based on scheduler load (LIGHT, NORMAL, BUSY, HEAVY).
* Statistics on various scheduler events (processing time vs. wait time in queue) are collected.
* The RM indicates the next HB interval to the NM and AM to throttle the heartbeat.

2. Event-based heartbeat:
* Send an out-of-band heartbeat for urgent requests, such as new resource requests or container completions, before the heartbeat interval indicated by the RM elapses.
* The RM can notify the AM when containers have been allocated, so that the AM does not have to wait for the scheduled heartbeat to get resources.

[~Jim_Brennan]
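Editorial sketch of the throttling idea described in the comment above. Only the four load levels (LIGHT, NORMAL, BUSY, HEAVY) come from the comment; the class name, the wait-to-processing ratio thresholds, and the interval values are assumptions made for illustration and are not taken from YARN.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch: derive the next heartbeat interval from scheduler load. */
final class AdaptiveHeartbeatSketch {

  /** Load levels named in the comment. */
  enum SchedulerLoad { LIGHT, NORMAL, BUSY, HEAVY }

  // Assumed intervals in milliseconds; illustrative values only.
  private static final Map<SchedulerLoad, Long> INTERVAL_MS =
      new EnumMap<>(SchedulerLoad.class);
  static {
    INTERVAL_MS.put(SchedulerLoad.LIGHT, 1_000L);
    INTERVAL_MS.put(SchedulerLoad.NORMAL, 3_000L);
    INTERVAL_MS.put(SchedulerLoad.BUSY, 6_000L);
    INTERVAL_MS.put(SchedulerLoad.HEAVY, 12_000L);
  }

  /**
   * Classifies load from the ratio of event wait time in the queue to event
   * processing time. The thresholds (1, 4, 16) are assumptions.
   */
  static SchedulerLoad classify(double avgProcessingMs, double avgQueueWaitMs) {
    if (avgProcessingMs <= 0) {
      return SchedulerLoad.LIGHT;
    }
    double ratio = avgQueueWaitMs / avgProcessingMs;
    if (ratio < 1.0) {
      return SchedulerLoad.LIGHT;
    }
    if (ratio < 4.0) {
      return SchedulerLoad.NORMAL;
    }
    if (ratio < 16.0) {
      return SchedulerLoad.BUSY;
    }
    return SchedulerLoad.HEAVY;
  }

  /** RM side: the interval the RM would hand back to the NM/AM. */
  static long nextIntervalMs(SchedulerLoad load) {
    return INTERVAL_MS.get(load);
  }

  /** NM/AM side: an urgent event is sent out of band, without waiting. */
  static long delayBeforeNextHeartbeatMs(long rmIndicatedIntervalMs, boolean hasUrgentEvent) {
    return hasUrgentEvent ? 0L : rmIndicatedIntervalMs;
  }

  public static void main(String[] args) {
    SchedulerLoad load = classify(10, 100);
    System.out.println(load + " -> " + nextIntervalMs(load) + " ms");
  }
}
```

The split into an RM-side method and an NM/AM-side method mirrors the two mechanisms in the comment: the RM throttles by choosing the interval, while the sender may still bypass it for urgent events.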
[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316309#comment-17316309 ]

Peter Bacsko commented on YARN-10564:

Thanks [~gandras]. I have the following suggestions: please add comments to the "for" loop that explain this. I don't want to dictate the wording; it could be several sentences, but I think it's important.

Also, consider adding a comment that "supportedWildcardLevel" or MAX_WILDCARD_LEVEL might change in the future (like me, people might notice that the range is [0-1] and find it confusing).

Finally, an overall comment like "collect all template settings based on prefix, then finally apply the collected settings to the newly created queue" might be useful. I'd put it somewhere before the "while" loop, but this is just an idea.

> Support Auto Queue Creation template configurations
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, YARN-10564.poc.001.patch
>
> Similar to how the template configuration works for ManagedParents, we need to support templates for the new auto queue creation logic. The proposal is to allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10
> {noformat}
> which would set the weight to 10 for every leaf of every parent under root.
> We should possibly take an approach that could support arbitrary depth of template configuration, because we might need to lift the limitation on auto queue nesting.
[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316288#comment-17316288 ]

Andras Gyori commented on YARN-10564:

[~pbacsko] that is it exactly.
[jira] [Comment Edited] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316277#comment-17316277 ]

Peter Bacsko edited comment on YARN-10564 at 4/7/21, 12:16 PM:

Thanks [~gandras], I think I get it. I guess the trick is the "for" loop, which modifies "queuePathParts". First we try to find the templates for the parent explicitly, then we step back a wildcard at each iteration. Changing "queuePathParts" changes the prefix, so eventually we might find a parent which contains templates. Finally, we call {{setConfigFromTemplateEntries()}}, where we set the collected values for the original queue. Is this correct?

was (Author: pbacsko):

Thanks [~gandras], I think I get it. I guess the trick is the "for" loop, which modifies "queuePathParts". First we try to find the templates for the parent explicitly, then we step back each wildcard at a time. Changing "queuePathParts" changes the prefix, so eventually we might find a parent which contains templates. Finally, we call {{setConfigFromTemplateEntries()}}, where we set the collected values for the original queue. Is this correct?
[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316277#comment-17316277 ]

Peter Bacsko commented on YARN-10564:

Thanks [~gandras], I think I get it. I guess the trick is the "for" loop, which modifies "queuePathParts". First we try to find the templates for the parent explicitly, then we step back one wildcard at a time. Changing "queuePathParts" changes the prefix, so eventually we might find a parent which contains templates. Finally, we call {{setConfigFromTemplateEntries()}}, where we set the collected values for the original queue. Is this correct?
[jira] [Comment Edited] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316262#comment-17316262 ]

Andras Gyori edited comment on YARN-10564 at 4/7/21, 11:52 AM:

[~pbacsko] thank you for your review. I was afraid that the patch would be a little bit convoluted. This was much worse in the case of legacy auto created queues, but I think wildcards do complicate things in this case as well. Storing the children's templates on the parent itself might be clearer; I will try to investigate that approach.

What is happening in this patch:
# The template configuration class sets the properties on the queue itself.
# Since we will support a configurable depth of queue creation, we need to support a variable number of wildcard levels as well (so in the case of QUEUE_CREATION_DEPTH=3, we need to support 2 wildcard levels, e.g. root.a.*.*; that's why we have MAX_WILDCARD_LEVEL).
# For a dynamic LeafQueue *root.a.a1*, we start looking for template entries for its parent *root.a* in the configuration, increasing the wildcard level by 1 at each iteration. An example of the iteration:
## root.a.auto-queue-creation-v2.template.weight
## root.*.auto-queue-creation-v2.template.weight
# After this, we cut the _auto-queue-creation-v2.template_ prefix, and we set the actual value for the configuration. An example:
## from root.a.auto-queue-creation-v2.template.weight -> root.a.a1.weight
# If we set root.a.a1.weight explicitly (this would be used for STOPPING / RUNNING the queue), a template configuration would not overwrite the value. An example:
## root.a.auto-queue-creation-v2.template.weight = 2
## root.a.a1.weight = 3
## root.a.a1 will have weight = 3

I hope this clarifies the patch to some extent.

was (Author: gandras):

[~pbacsko] thank you for your review. I was afraid that the patch would be a little bit convoluted. This was much worse in the case of legacy auto created queues. Storing the children's templates on the parent itself might be clearer; I will try to investigate that approach.

What is happening in this patch:
# The template configuration class sets the properties on the queue itself.
# Since we will support a configurable depth of queue creation, we need to support a variable number of wildcard levels as well (so in the case of QUEUE_CREATION_DEPTH=3, we need to support 2 wildcard levels, e.g. root.a.*.*; that's why we have MAX_WILDCARD_LEVEL).
# For a dynamic LeafQueue *root.a.a1*, we start looking for template entries for its parent *root.a* in the configuration, increasing the wildcard level by 1 at each iteration. An example of the iteration:
## root.a.auto-queue-creation-v2.template.weight
## root.*.auto-queue-creation-v2.template.weight
# After this, we cut the _auto-queue-creation-v2.template_ prefix, and we set the actual value for the configuration. An example:
## from root.a.auto-queue-creation-v2.template.weight -> root.a.a1.weight
# If we set root.a.a1.weight explicitly (this would be used for STOPPING / RUNNING the queue), a template configuration would not overwrite the value. An example:
## root.a.auto-queue-creation-v2.template.weight = 2
## root.a.a1.weight = 3
## root.a.a1 will have weight = 3

I hope this clarifies the patch to some extent.
[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316262#comment-17316262 ]

Andras Gyori commented on YARN-10564:

[~pbacsko] thank you for your review. I was afraid that the patch would be a little bit convoluted. This was much worse in the case of legacy auto created queues. Storing the children's templates on the parent itself might be clearer; I will try to investigate that approach.

What is happening in this patch:
# The template configuration class sets the properties on the queue itself.
# Since we will support a configurable depth of queue creation, we need to support a variable number of wildcard levels as well (so in the case of QUEUE_CREATION_DEPTH=3, we need to support 2 wildcard levels, e.g. root.a.*.*; that's why we have MAX_WILDCARD_LEVEL).
# For a dynamic LeafQueue *root.a.a1*, we start looking for template entries for its parent *root.a* in the configuration, increasing the wildcard level by 1 at each iteration. An example of the iteration:
## root.a.auto-queue-creation-v2.template.weight
## root.*.auto-queue-creation-v2.template.weight
# After this, we cut the _auto-queue-creation-v2.template_ prefix, and we set the actual value for the configuration. An example:
## from root.a.auto-queue-creation-v2.template.weight -> root.a.a1.weight
# If we set root.a.a1.weight explicitly (this would be used for STOPPING / RUNNING the queue), a template configuration would not overwrite the value. An example:
## root.a.auto-queue-creation-v2.template.weight = 2
## root.a.a1.weight = 3
## root.a.a1 will have weight = 3

I hope this clarifies the patch to some extent.
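Editorial sketch of the lookup order and precedence described in the steps above. This is not the code from the patch: the configuration is modelled as a plain map, key names follow the shortened form used in the comment (without the yarn.scheduler.capacity prefix), and the class and method names are hypothetical. It assumes that a more specific template wins over a wildcard one and that an explicit per-queue value wins over both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of template resolution for a dynamically created leaf queue. */
final class TemplateLookupSketch {
  static final String TEMPLATE_PREFIX = "auto-queue-creation-v2.template.";
  static final String WILDCARD = "*";
  static final int MAX_WILDCARD_LEVEL = 1;

  /** Returns the effective "parent.leaf.property" values for the new leaf. */
  static Map<String, String> resolve(
      Map<String, String> conf, String parentPath, String leafName) {
    List<String> parts = new ArrayList<>(Arrays.asList(parentPath.split("\\.")));
    int supportedLevel = Math.min(parts.size() - 1, MAX_WILDCARD_LEVEL);

    // Collect template entries, most specific parent path first.
    Map<String, String> collected = new LinkedHashMap<>();
    for (int level = 0; level <= supportedLevel; level++) {
      String prefix = String.join(".", parts) + "." + TEMPLATE_PREFIX;
      for (Map.Entry<String, String> e : conf.entrySet()) {
        if (e.getKey().startsWith(prefix)) {
          // Cut the template prefix; keep the first (most specific) value seen.
          collected.putIfAbsent(e.getKey().substring(prefix.length()), e.getValue());
        }
      }
      // Step back one level: root.a -> root.*
      parts.set(parts.size() - 1 - level, WILDCARD);
    }

    // Apply to the leaf unless the property is already set explicitly.
    Map<String, String> result = new LinkedHashMap<>();
    String leafPrefix = parentPath + "." + leafName + ".";
    for (Map.Entry<String, String> e : collected.entrySet()) {
      String key = leafPrefix + e.getKey();
      result.put(key, conf.getOrDefault(key, e.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("root.a.auto-queue-creation-v2.template.weight", "2");
    conf.put("root.a.a1.weight", "3");
    System.out.println(resolve(conf, "root.a", "a1"));
  }
}
```

With the two entries from the last example in the comment, the explicit root.a.a1.weight = 3 is kept, while a sibling such as root.a.a2 would receive the template value 2.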
[jira] [Comment Edited] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316213#comment-17316213 ]

Peter Bacsko edited comment on YARN-10564 at 4/7/21, 11:51 AM:

[~gandras] thanks for the patch.

From a coding POV it looks OK; this is more of a high-level review. There are some things I just can't figure out (maybe I'm in bad shape today).

1. Let's say you set the capacity to 6w for {{root.a.*}}. Then a dynamic queue {{root.a.newparent.newchild}} gets created. How do the weight settings propagate to "newparent" and "newchild"? I kept looking at the code, but it's just not obvious. I can see that "root.a" will have an entry in {{templateEntries}}, but then what?

2. I can't decipher this part:
{noformat}
for (int i = 0; i <= wildcardLevel; ++i) {
  queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
}
{noformat}
What's happening here?

3. There is a variable called "supportedWildcardLevel". What does "supported" mean in this context? Later on we set it to {{Math.min(queueHierarchyParts - 1, MAX_WILDCARD_LEVEL);}}. It seems to me that it is either 0 or 1, because {{MAX_WILDCARD_LEVEL}} is 1. I assume most of the time it's going to be 1? I don't understand what it is meant to represent.

was (Author: pbacsko):

[~gandras] thanks for the patch.

From a coding POV it looks OK; this is more of a high-level review. There are some things I just can't figure out (maybe I'm in bad shape today).

1. Let's say you set the capacity to 6w for {{root.a.*}}. Then a dynamic queue {{root.a.newparent.newchild}} gets created. How do the weight settings propagate to "newparent" and "newchild"? I kept looking at the code, but it's just not obvious. I can see that "root.a" will have an entry in {{templateEntries}}, but then what?

2. I can't decipher this part:
{noformat}
for (int i = 0; i <= wildcardLevel; ++i) {
  queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
}
{noformat}
What's happening here?

3. There is a variable called "supportedWildcardLevel". What does "supported" mean in this context? Later on we set it to {{Math.min(queueHierarchyParts - 1, MAX_WILDCARD_LEVEL);}}. It seems to me that it is either 0 or 1, because {{MAX_WILDCARD_LEVEL}} is 1. I assume most of the time it's going to be 1? Mentally I don't understand what it is meant to represent.
[jira] [Comment Edited] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316213#comment-17316213 ]

Peter Bacsko edited comment on YARN-10564 at 4/7/21, 10:49 AM:

[~gandras] thanks for the patch.

From a coding POV it looks OK; this is more of a high-level review. There are some things I just can't figure out (maybe I'm in bad shape today).

1. Let's say you set the capacity to 6w for {{root.a.*}}. Then a dynamic queue {{root.a.newparent.newchild}} gets created. How do the weight settings propagate to "newparent" and "newchild"? I kept looking at the code, but it's just not obvious. I can see that "root.a" will have an entry in {{templateEntries}}, but then what?

2. I can't decipher this part:
{noformat}
for (int i = 0; i <= wildcardLevel; ++i) {
  queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
}
{noformat}
What's happening here?

3. There is a variable called "supportedWildcardLevel". What does "supported" mean in this context? Later on we set it to {{Math.min(queueHierarchyParts - 1, MAX_WILDCARD_LEVEL);}}. It seems to me that it is either 0 or 1, because {{MAX_WILDCARD_LEVEL}} is 1. I assume most of the time it's going to be 1? Mentally I don't understand what it is meant to represent.

was (Author: pbacsko):

[~gandras] thanks for the patch.

From a coding POV it looks OK; this is more of a high-level review. There are some things I just can't figure out (maybe I'm in bad shape today).

1. Let's say you set 6w for {{root.a.*}}. Then a dynamic queue {{root.a.newparent.newchild}} gets created. How do the weight settings propagate to "newparent" and "newchild"? I kept looking at the code, but it's just not obvious. I can see that "root.a" will have an entry in {{templateEntries}}, but then what?

2. I can't decipher this part:
{noformat}
for (int i = 0; i <= wildcardLevel; ++i) {
  queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
}
{noformat}
What's happening here?

3. There is a variable called "supportedWildcardLevel". What does "supported" mean in this context? Later on we set it to {{Math.min(queueHierarchyParts - 1, MAX_WILDCARD_LEVEL);}}, which seems to mean that it is either 0 or 1, because {{MAX_WILDCARD_LEVEL}} is 1. I assume most of the time it's going to be 1? Mentally I don't understand what it is meant to represent.
[jira] [Commented] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316213#comment-17316213 ]

Peter Bacsko commented on YARN-10564:

[~gandras] thanks for the patch.

From a coding POV it looks OK; this is more of a high-level review. There are some things I just can't figure out (maybe I'm in bad shape today).

1. Let's say you set 6w for {{root.a.*}}. Then a dynamic queue {{root.a.newparent.newchild}} gets created. How do the weight settings propagate to "newparent" and "newchild"? I kept looking at the code, but it's just not obvious. I can see that "root.a" will have an entry in {{templateEntries}}, but then what?

2. I can't decipher this part:
{noformat}
for (int i = 0; i <= wildcardLevel; ++i) {
  queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
}
{noformat}
What's happening here?

3. There is a variable called "supportedWildcardLevel". What does "supported" mean in this context? Later on we set it to {{Math.min(queueHierarchyParts - 1, MAX_WILDCARD_LEVEL);}}, which seems to mean that it is either 0 or 1, because {{MAX_WILDCARD_LEVEL}} is 1. I assume most of the time it's going to be 1? Mentally I don't understand what it is meant to represent.
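Editorial sketch relating to questions 2 and 3 in the message above: it runs the quoted loop in isolation to show which path elements it overwrites, and evaluates the quoted Math.min expression for a few path lengths. The wrapper class and method names are hypothetical, and it is an assumption that "queueHierarchyParts" is the number of dot-separated elements in the queue path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper around the loop quoted in the review comment. */
final class WildcardLoopSketch {
  static final String WILDCARD_QUEUE = "*";
  static final int MAX_WILDCARD_LEVEL = 1;

  /** Replaces the last (wildcardLevel + 1) elements of the path with "*". */
  static String applyWildcards(String queuePath, int wildcardLevel) {
    List<String> queuePathParts = new ArrayList<>(Arrays.asList(queuePath.split("\\.")));
    for (int i = 0; i <= wildcardLevel; ++i) {
      queuePathParts.set(queuePathParts.size() - 1 - i, WILDCARD_QUEUE);
    }
    return String.join(".", queuePathParts);
  }

  /** The quoted expression, with the part count derived from the path (assumption). */
  static int supportedWildcardLevel(String queuePath) {
    int queueHierarchyParts = queuePath.split("\\.").length;
    return Math.min(queueHierarchyParts - 1, MAX_WILDCARD_LEVEL);
  }

  public static void main(String[] args) {
    System.out.println(applyWildcards("root.a.newparent", 0)); // last element only
    System.out.println(applyWildcards("root.a.newparent", 1)); // last two elements
    System.out.println(supportedWildcardLevel("root"));
    System.out.println(supportedWildcardLevel("root.a"));
  }
}
```

Under these assumptions the loop turns root.a.newparent into root.a.* at level 0 and root.*.* at level 1, and the supported level is 0 only for a single-element path, which is consistent with the reviewer's reading that the value is 0 or 1 while MAX_WILDCARD_LEVEL is 1.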
[jira] [Commented] (YARN-10714) Remove dangling dynamic queues on reinitialization
[ https://issues.apache.org/jira/browse/YARN-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316182#comment-17316182 ]

Szilard Nemeth commented on YARN-10714:

Thanks [~gandras] for working on this. Latest patch LGTM, committed to trunk.

> Remove dangling dynamic queues on reinitialization
>
> Key: YARN-10714
> URL: https://issues.apache.org/jira/browse/YARN-10714
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Attachments: YARN-10714.001.patch, YARN-10714.002.patch, YARN-10714.003.patch
>
> Current logic does not handle orphaned auto created child queues. The following example steps show a scenario in which it is possible to submit applications to an orphaned queue that has an invalid (already removed) ParentQueue.
> # Auto create a queue root.a.a-auto
> # Remove root.a from the config
> # Reinitialize CS without restarting it (possible via mutation API)
> # Submit application to root.a.a-auto, while root.a is a non-existent queue
[jira] [Updated] (YARN-10727) ParentQueue does not validate the queue on removal
[ https://issues.apache.org/jira/browse/YARN-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andras Gyori updated YARN-10727:

Parent: YARN-10496
Issue Type: Sub-task (was: Bug)

> ParentQueue does not validate the queue on removal
>
> Key: YARN-10727
> URL: https://issues.apache.org/jira/browse/YARN-10727
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
>
> With the addition of YARN-10532, ParentQueue has a public method, removeQueue, which allows the deletion of a queue at runtime. However, there is no validation of the queue to be removed, so it is possible to remove a queue from the CSQueueManager that is not a child of the ParentQueue. Since it is a public method, there must be validations such as:
> * check whether the parent of the queue to be removed is the current ParentQueue
> * check whether the parent actually contains the queue in its childQueues collection
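Editorial sketch of the two validations listed in the issue description above. The QueueNode type is a deliberately simplified, hypothetical stand-in; it is not Hadoop's ParentQueue or CSQueueManager, and the exception types are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a parent queue that validates removals. */
final class RemoveQueueValidationSketch {

  static final class QueueNode {
    final String path;
    QueueNode parent;
    final List<QueueNode> childQueues = new ArrayList<>();

    QueueNode(String path, QueueNode parent) {
      this.path = path;
      this.parent = parent;
      if (parent != null) {
        parent.childQueues.add(this);
      }
    }

    /** Removes a child only after both checks from the issue description pass. */
    void removeChildQueue(QueueNode queue) {
      // Check 1: the queue's parent must be this queue.
      if (queue.parent != this) {
        throw new IllegalArgumentException(
            queue.path + " is not a child of " + path);
      }
      // Check 2: this queue must actually hold the child in its collection.
      if (!childQueues.contains(queue)) {
        throw new IllegalStateException(
            path + " does not contain " + queue.path);
      }
      childQueues.remove(queue);
      queue.parent = null;
    }
  }

  public static void main(String[] args) {
    QueueNode root = new QueueNode("root", null);
    QueueNode a = new QueueNode("root.a", root);
    QueueNode a1 = new QueueNode("root.a.a1", a);
    a.removeChildQueue(a1);
    System.out.println("children of root.a: " + a.childQueues.size());
  }
}
```

Performing both checks before mutating anything keeps the parent's child list and the child's parent reference consistent even when the caller passes the wrong queue.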