[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305325#comment-17305325 ] Hadoop QA commented on YARN-10704:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 36s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 22m 42s | | trunk passed |
| +1 | compile | 1m 0s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 52s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 46s | | trunk passed |
| +1 | mvnsite | 0m 54s | | trunk passed |
| +1 | shadedclient | 16m 52s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 41s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 37s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 20m 4s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 55s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 50s | | the patch passed |
| +1 | compile | 0m 53s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 53s | | the patch passed |
| +1 | compile | 0m 45s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 45s | | the patch passed |
| -0 | checkstyle | 0m 39s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/828/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 82 unchanged - 0 fixed = 83 total (was 82) |
| +1 | mvnsite | 0m 48s | | the patch passed |
| +1 | whitespace | 0m 1s | | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 53s |
[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305298#comment-17305298 ] Qi Zhu commented on YARN-10704:
---

Fixed the checkstyle issues and the test in the latest patch. :D

> The CS effective capacity for absolute mode in UI should support GPU and
> other custom resources.
>
> Key: YARN-10704
> URL: https://issues.apache.org/jira/browse/YARN-10704
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10704.001.patch, YARN-10704.002.patch,
> image-2021-03-19-12-05-28-412.png, image-2021-03-19-12-08-35-273.png
>
>
> Currently, the UI shows no effective capacity information for GPU in
> absolute resource mode.
> !image-2021-03-19-12-05-28-412.png|width=873,height=136!
> However, this information is available in QueueMetrics:
> !image-2021-03-19-12-08-35-273.png|width=613,height=268!
>
> This is very important for our GPU users who rely on absolute mode. At the
> moment there is no way to see absolute GPU information in the CS Queue UI.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
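The issue is that the absolute-mode effective capacity in the UI covers only memory and vcores. The sketch below shows one way to render every resource type instead. It assumes the effective capacity is available as a map from resource type name to value. The class and method names are hypothetical, and this is not the actual Capacity Scheduler UI code. The example keys (`memory-mb`, `vcores`, `yarn.io/gpu`) are the standard YARN resource type names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical sketch: render an effective capacity containing any number of
 * resource types instead of hard-coding memory and vcores.
 * Not the actual Capacity Scheduler UI code.
 */
final class EffectiveCapacityFormatter {

  private EffectiveCapacityFormatter() {
  }

  /** Renders each resource type as "name:value", in the map's iteration order. */
  static String format(Map<String, Long> effectiveCapacity) {
    StringJoiner joiner = new StringJoiner(", ", "<", ">");
    for (Map.Entry<String, Long> entry : effectiveCapacity.entrySet()) {
      joiner.add(entry.getKey() + ":" + entry.getValue());
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps the insertion order, so memory and vcores come first.
    Map<String, Long> capacity = new LinkedHashMap<>();
    capacity.put("memory-mb", 4096L);
    capacity.put("vcores", 8L);
    capacity.put("yarn.io/gpu", 2L);
    System.out.println(format(capacity));
  }
}
```

Iterating over whatever resource types are present means a new custom resource would show up without further UI changes, under the stated assumption about the input map.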
[jira] [Updated] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10704:
--
Attachment: YARN-10704.002.patch
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305290#comment-17305290 ] Qi Zhu commented on YARN-10503:
---

Thanks [~epayne] for the reply. I also think YARN-9936 goes beyond this requirement. I will try to extend this Jira to enable absolute queue resource configuration for custom resources in a general way. :D

> Support queue capacity in terms of absolute resources with gpu resourceType.
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
> Attachments: YARN-10503.001.patch, YARN-10503.002.patch
>
>
> Currently, the only absolute resources are memory and vcores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }
> {code}
> However, our GPU production clusters need to support more resource types.
> This is very important for scaling clusters that have absolute demands for
> different resource types.
>
> This Jira will handle GPU first.
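The comment mentions a "general way" to configure absolute capacity for custom resources. The sketch below shows one possible shape for that. It parses a value such as `[memory=10240,vcores=12,yarn.io/gpu=4]` into a map keyed by resource name, so no fixed `MEMORY`/`VCORES` enum is needed. The bracketed `name=value` form mirrors the Capacity Scheduler's absolute capacity syntax. The class itself is hypothetical, ignores units, and is not the parsing code from any YARN-10503 patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: parse "[name=value,...]" into a resource-name -> value
 * map so any resource type can be configured. Units are not handled.
 * Not the actual CapacitySchedulerConfiguration parsing code.
 */
final class AbsoluteCapacityParser {

  private AbsoluteCapacityParser() {
  }

  static Map<String, Long> parse(String value) {
    String trimmed = value.trim();
    if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
      throw new IllegalArgumentException("Expected [name=value,...] but got: " + value);
    }
    Map<String, Long> result = new LinkedHashMap<>();
    String body = trimmed.substring(1, trimmed.length() - 1).trim();
    if (body.isEmpty()) {
      return result;
    }
    for (String pair : body.split(",")) {
      String[] kv = pair.split("=", 2);
      if (kv.length != 2) {
        throw new IllegalArgumentException("Malformed entry: " + pair);
      }
      // Resource names stay free-form strings (e.g. "yarn.io/gpu"), not enum constants.
      result.put(kv[0].trim(), Long.parseLong(kv[1].trim()));
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(parse("[memory=10240,vcores=12,yarn.io/gpu=4]"));
  }
}
```

Keying on the resource name rather than on an enum is the design point. Validating the names against the cluster's configured resource types would be a separate step that this sketch leaves out.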
[jira] [Commented] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305211#comment-17305211 ] Hadoop QA commented on YARN-10493:
--

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 1s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| 0 | codespell | 0m 1s | | codespell was not available. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 14m 5s | | Maven dependency ordering for branch |
| +1 | mvninstall | 27m 10s | | trunk passed |
| +1 | compile | 12m 40s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 10m 7s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 2m 2s | | trunk passed |
| +1 | mvnsite | 2m 53s | | trunk passed |
| +1 | javadoc | 2m 26s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 2m 34s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | spotbugs | 6m 43s | | trunk passed |
| +1 | shadedclient | 19m 32s | | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 25s | | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 25s | | the patch passed |
| +1 | compile | 11m 37s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 11m 37s | | the patch passed |
| +1 | compile | 10m 17s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 10m 17s | | the patch passed |
| +1 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 | checkstyle | 1m 52s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2789/2/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt] | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 207 unchanged - 0 fixed = 209 total (was 207) |
| +1 | mvnsite | 2m 32s | | the patch passed |
| +1 | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 | javadoc | 2m 11s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 2m 32s | |
[jira] [Commented] (YARN-6538) Inter Queue preemption is not happening when DRF is configured
[ https://issues.apache.org/jira/browse/YARN-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305144#comment-17305144 ] Eric Payne commented on YARN-6538:
--

[~novaboy], please provide a specific use case that reproduces this issue. For example, please provide the cluster size and the applicable queue configuration parameters: number of queues, queue capacities, queue max capacities, queue user limit factors, queue minimum user limit percents, queue ordering policies, preemption parameters for each queue, etc.

> Inter Queue preemption is not happening when DRF is configured
> --
>
> Key: YARN-6538
> URL: https://issues.apache.org/jira/browse/YARN-6538
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, scheduler preemption
> Affects Versions: 2.8.0
> Reporter: Sunil G
> Assignee: Sunil G
> Priority: Major
>
> Cluster capacity of . Here memory is plentiful and vcores are scarce. If
> applications have more demand, vcores might be exhausted.
> Ideally, inter-queue preemption should kick in once vcores are
> over-utilized. However, preemption is not happening.
> Analysis:
> In {{AbstractPreemptableResourceCalculator.computeFixpointAllocation}},
> {code}
> // assign all cluster resources until no more demand, or no resources are
> // left
> while (!orderedByNeed.isEmpty() && Resources.greaterThan(rc, totGuarant,
>     unassigned, Resources.none())) {
> {code}
> keeps looping even when vcores are 0, because memory is still positive. As a
> result, idealAssigned ends up with more vcores than it should, which leads
> to cases where no preemption happens.
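The analysis quoted above depends on how a DRF comparison treats a partly exhausted resource vector. The simplified model below assumes the loop condition reduces to a single test: is the dominant share (the largest per-resource share of the cluster) above zero? Under that assumption, a leftover vector with memory remaining but zero vcores still counts as "greater than none". The class is hypothetical and is not Hadoop's `Resources` or `DominantResourceCalculator`. The stricter all-components check is shown only for contrast. It is not the fix adopted for this issue.

```java
/**
 * Simplified, hypothetical model (not Hadoop's Resources or
 * DominantResourceCalculator) of why a DRF "greater than none" check can keep
 * a fix-point loop running after one resource is exhausted.
 */
final class DrfLoopConditionDemo {

  /** A two-dimensional resource vector: memory (MB) and vcores. */
  static final class Res {
    final long memory;
    final long vcores;

    Res(long memory, long vcores) {
      this.memory = memory;
      this.vcores = vcores;
    }
  }

  private DrfLoopConditionDemo() {
  }

  /** Dominant share of r relative to the cluster: the max of the per-resource shares. */
  static double dominantShare(Res r, Res cluster) {
    return Math.max((double) r.memory / cluster.memory,
        (double) r.vcores / cluster.vcores);
  }

  /** DRF-style check: true if the dominant share of r is above zero. */
  static boolean drfGreaterThanNone(Res r, Res cluster) {
    return dominantShare(r, cluster) > 0.0;
  }

  /** Stricter check, for contrast: true only if every resource has capacity left. */
  static boolean allComponentsPositive(Res r) {
    return r.memory > 0 && r.vcores > 0;
  }

  public static void main(String[] args) {
    Res cluster = new Res(100_000, 10);
    // Memory is left over, but every vcore has already been handed out.
    Res unassigned = new Res(60_000, 0);
    System.out.println("DRF greaterThan(none): " + drfGreaterThanNone(unassigned, cluster));
    System.out.println("all components > 0:    " + allComponentsPositive(unassigned));
  }
}
```

For the vector `<60000 MB, 0 vcores>` in a `<100000 MB, 10 vcores>` cluster, the dominant share is 0.6. The DRF-style condition therefore stays true even though no vcores remain to hand out.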
[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305110#comment-17305110 ] Jim Brennan commented on YARN-10702:

Thanks for the suggestions [~gandras]! I agree this should be configurable. I will put up a new patch with those changes.

I don't think the new thread has a significant impact. I wasn't trying to measure that, but I recently looked at an RM where the dispatcher thread was very busy. The monitoring thread did not appear to be a significant factor there: IIRC, it only showed up using less than 10% of a single CPU for brief periods. I'll have to take a closer look, but I think making the sampling rate configurable is a good idea.

> Add cluster metric for amount of CPU used by RM Event Processor
> ---
>
> Key: YARN-10702
> URL: https://issues.apache.org/jira/browse/YARN-10702
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: Scheduler-Busy.png, YARN-10702.001.patch,
> YARN-10702.002.patch, YARN-10702.003.patch, YARN-10702.004.patch,
> simon-scheduler-busy.png
>
>
> Add a cluster metric to track the CPU usage of the ResourceManager Event
> Processing thread. This lets us know when the critical path of the RM is
> running out of headroom.
> This feature was originally added for us internally by [~nroberts] and we've
> been running with it on production clusters for nearly four years.
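The sketch below shows how a per-thread CPU metric like this could be collected, using the JDK's `ThreadMXBean.getThreadCpuTime` and sampling at a configurable period as discussed in this thread. The class name, the `samplePeriodMs` parameter, and the percentage-of-one-CPU formula are assumptions for illustration. This is not the code from the YARN-10702 patches.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not the YARN-10702 patch): sample the CPU time of one
 * thread at a configurable interval and report it as a percentage of one CPU.
 */
final class ThreadCpuSampler {

  private final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
  private final long threadId;
  private long lastCpuNanos;
  private long lastWallNanos;
  private volatile double lastBusyPercent;

  ThreadCpuSampler(long threadId) {
    this.threadId = threadId;
    this.lastCpuNanos = threadBean.getThreadCpuTime(threadId);
    this.lastWallNanos = System.nanoTime();
  }

  /** CPU time as a percentage of elapsed wall-clock time (100 = one full CPU). */
  static double busyPercent(long cpuDeltaNanos, long wallDeltaNanos) {
    if (wallDeltaNanos <= 0 || cpuDeltaNanos < 0) {
      return 0.0;
    }
    return 100.0 * cpuDeltaNanos / wallDeltaNanos;
  }

  /** Takes one sample; getThreadCpuTime returns -1 if CPU timing is disabled or the thread died. */
  void sample() {
    long cpu = threadBean.getThreadCpuTime(threadId);
    long wall = System.nanoTime();
    if (cpu >= 0 && lastCpuNanos >= 0) {
      lastBusyPercent = busyPercent(cpu - lastCpuNanos, wall - lastWallNanos);
    }
    lastCpuNanos = cpu;
    lastWallNanos = wall;
  }

  double getLastBusyPercent() {
    return lastBusyPercent;
  }

  /** Starts periodic sampling; samplePeriodMs stands in for a configurable setting. */
  ScheduledExecutorService start(long samplePeriodMs) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "thread-cpu-sampler");
      t.setDaemon(true);
      return t;
    });
    executor.scheduleAtFixedRate(this::sample, samplePeriodMs, samplePeriodMs,
        TimeUnit.MILLISECONDS);
    return executor;
  }

  public static void main(String[] args) {
    if (!ManagementFactory.getThreadMXBean().isThreadCpuTimeSupported()) {
      System.out.println("Thread CPU time is not supported on this JVM");
      return;
    }
    ThreadCpuSampler sampler = new ThreadCpuSampler(Thread.currentThread().getId());
    ScheduledExecutorService executor = sampler.start(1000);
    long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(3);
    long spins = 0;
    while (System.nanoTime() < end) {
      spins++; // keep the sampled thread busy so there is CPU time to observe
    }
    executor.shutdownNow();
    System.out.printf("last sample: %.1f%% of one CPU (spins=%d)%n",
        sampler.getLastBusyPercent(), spins);
  }
}
```

A longer sampling period makes the sampler itself cheaper but smooths out short bursts. That trade-off is presumably why a configurable rate was suggested.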
[jira] [Commented] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305091#comment-17305091 ] Hadoop QA commented on YARN-10493:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 55s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| 0 | codespell | 0m 1s | | codespell was not available. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 14m 23s | | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 11s | | trunk passed |
| +1 | compile | 9m 7s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 7m 53s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 44s | | trunk passed |
| +1 | mvnsite | 1m 54s | | trunk passed |
| +1 | javadoc | 1m 39s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | spotbugs | 3m 35s | | trunk passed |
| +1 | shadedclient | 14m 50s | | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 24s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 14s | | the patch passed |
| +1 | compile | 8m 28s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 8m 28s | | the patch passed |
| +1 | compile | 7m 46s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 7m 46s | | the patch passed |
| +1 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 | checkstyle | 1m 39s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2789/1/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt] | hadoop-yarn-project/hadoop-yarn: The patch generated 7 new + 207 unchanged - 0 fixed = 214 total (was 207) |
| +1 | mvnsite | 1m 44s | | the patch passed |
| -1 | javadoc | 0m 44s | [/results-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2789/1/artifact/out/results-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0
[jira] [Commented] (YARN-6538) Inter Queue preemption is not happening when DRF is configured
[ https://issues.apache.org/jira/browse/YARN-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305084#comment-17305084 ] Michael Zeoli commented on YARN-6538:
-

As we transition from Fair Scheduler to Capacity Scheduler, we're running into what we believe is this same issue. We usually assign 1 core to our executors, because our work is typically memory-bound and multiple cores per container offer no performance gain. Under Fair Scheduler, preemption worked well for us. Under Capacity Scheduler, we see jobs starved for AMs and/or executors when they should receive their minimum guaranteed capacity through resources preempted from jobs in other queues. While our configuration may be uncommon, it is certainly a valid use case for YARN and Spark. This bug seems to create significant problems that did not exist under Fair Scheduler.
[jira] [Comment Edited] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305033#comment-17305033 ] Andras Gyori edited comment on YARN-10702 at 3/19/21, 5:10 PM:
---

Thank you [~Jim_Brennan] for the contribution! The logic is clearly sound, since you have tested it thoroughly on a live cluster, so I have nothing to add to that part. However, wouldn't it be worthwhile to make this optional? Did the background thread make any noticeable difference in resource usage? As far as I can tell, this thread runs approximately every second. Making the sampling rate configurable might be useful.

was (Author: gandras):
Thank you [~Jim_Brennan] for the contribution! The logic is clearly sound, since you have tested it thoroughly on a live cluster, so I have nothing to add to that part. However, wouldn't it be worthwhile to make this optional? Did the background thread make any noticeable difference in resource usage? As far as I can tell, this thread runs approximately every second, which could be made configurable as well.
[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305033#comment-17305033 ] Andras Gyori commented on YARN-10702:
-

Thank you [~Jim_Brennan] for the contribution! The logic is clearly sound, since you have tested it thoroughly on a live cluster, so I have nothing to add to that part. However, wouldn't it be worthwhile to make this optional? Did the background thread make any noticeable difference in resource usage? As far as I can tell, this thread runs approximately every second, which could be made configurable as well.
[jira] [Commented] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups
[ https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305024#comment-17305024 ] Ahmed Hussein commented on YARN-10597:
--

That's interesting. I ran the unit tests in YARN-10425 from IntelliJ and they all passed.

Just a quick question: in {{CSMappingPlacementRule.java}}, aren't we supposed to pass the configuration object to {{Groups.getUserToGroupsMappingService}}? I am thinking of the case where the singleton has not been initialized yet. In that case, {{Groups.getUserToGroupsMappingService}} won't read parameters such as {{HADOOP_SECURITY_GROUP_MAPPING}} that are set inside {{conf}}.
{code:java}
- groups = Groups.getUserToGroupsMappingService();
+ groups = Groups.getUserToGroupsMappingService(conf);
{code}

> CSMappingPlacementRule should not create new instance of Groups
> ---
>
> Key: YARN-10597
> URL: https://issues.apache.org/jira/browse/YARN-10597
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Gergely Pollak
> Assignee: Gergely Pollak
> Priority: Major
> Attachments: YARN-10597.001.patch
>
>
> As [~ahussein] pointed out in YARN-10425, no new Groups instance should be
> created.
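The concern above is about a lazily initialized singleton. If the no-argument accessor is the first one to initialize it, the settings in the caller's `conf` are never consulted. The hypothetical holder below models that behavior, with a plain `Map` standing in for the configuration. It is not Hadoop's `Groups` class, and `default-mapping` and `custom-mapping` are placeholder values. Only the key `hadoop.security.group.mapping` (the property behind `HADOOP_SECURITY_GROUP_MAPPING`) comes from Hadoop.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a lazily initialized, process-wide singleton.
 * Illustrates why a no-argument accessor can ignore settings held in the
 * caller's configuration. Not Hadoop's Groups class.
 */
final class MappingServiceHolder {

  static final String MAPPING_KEY = "hadoop.security.group.mapping";
  static final String DEFAULT_MAPPING = "default-mapping"; // placeholder value

  private static MappingServiceHolder instance;

  private final String mappingImpl;

  private MappingServiceHolder(Map<String, String> conf) {
    this.mappingImpl = conf.getOrDefault(MAPPING_KEY, DEFAULT_MAPPING);
  }

  /** No-arg accessor: initializes from an empty (default) configuration. */
  static synchronized MappingServiceHolder get() {
    return get(new HashMap<>());
  }

  /** Conf-aware accessor: in this sketch, the first initializing caller's conf wins. */
  static synchronized MappingServiceHolder get(Map<String, String> conf) {
    if (instance == null) {
      instance = new MappingServiceHolder(conf);
    }
    return instance;
  }

  /** Resets the singleton so the demo can show both orderings. */
  static synchronized void reset() {
    instance = null;
  }

  String getMappingImpl() {
    return mappingImpl;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(MAPPING_KEY, "custom-mapping");

    // Uninitialized singleton + no-arg accessor: conf is never consulted.
    System.out.println(get().getMappingImpl());
    reset();
    // Passing conf on first initialization lets the setting take effect.
    System.out.println(get(conf).getMappingImpl());
  }
}
```

In this model, passing `conf` only helps if that call is the one that initializes the instance. Once the instance exists, later configurations are ignored by either accessor.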
[jira] [Updated] (YARN-10705) Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated YARN-10705: --- Affects Version/s: 3.4.0 > Misleading DEBUG log for container assignment needs to be removed when the > container is actually reserved, not assigned in FairScheduler > > > Key: YARN-10705 > URL: https://issues.apache.org/jira/browse/YARN-10705 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.4.0 >Reporter: Siddharth Ahuja >Assignee: Siddharth Ahuja >Priority: Minor > > Following DEBUG logs are logged if a container reservation is made when a > node has been offered to the queue in FairScheduler: > {code} > 2021-02-10 07:33:55,049 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: > application_1610442362681_2607's resource request is reserved. > 2021-02-10 07:33:55,049 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: > Assigned container in queue:root.pj_dc_pe container: > {code} > The latter log from above seems to indicate a bad container assignment with > resource allocation, whereas, in actual, it is a bad > log which shouldn't have been logged in the first place. > This log comes from [1] after an application attempt with an unmet demand is > checked for container assignment/reservation. > If the container for this app attempt is reserved on the node, then, it > returns from [2]. > From [3]: > {quote} >* If an assignment was made, returns the resources allocated to the >* container. If a reservation was made, returns >* FairScheduler.CONTAINER_RESERVED. If no assignment or reservation > was >* made, returns an empty resource. > {quote} > We are checking for the empty resource at [4], but not > FairScheduler.CONTAINER_RESERVED before logging out a message for container > assignment specifically which is incorrect. 
> Instead of: > {code} > if (!assigned.equals(none())) { > LOG.debug("Assigned container in queue:{} container:{}", > getName(), assigned); > break; > } > {code} > it should be: > {code} > // check if an assignment or a reservation was made. > if (!assigned.equals(none())) { > // only log container assignment if there is > // an actual assignment, not a reservation. > if (!assigned.equals(FairScheduler.CONTAINER_RESERVED) > && LOG.isDebugEnabled()) { > LOG.debug("Assigned container in queue:" + getName() + " " + > "container:" + assigned); > } > break; > } > {code} > [1] > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java#L356 > [2] > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L911 > [3] > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L842 > [4] > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java#L355 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
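As a minimal illustration of the three-way distinction described above (no assignment, reservation, real assignment), the self-contained Java sketch below models the proposed check. `Res`, `AssignmentCheck` and its two constants are hypothetical stand-ins for `Resource`, `Resources.none()` and `FairScheduler.CONTAINER_RESERVED`; representing the reservation sentinel as a resource with memory -1 is an assumption, and the sketch only relies on the two values being distinguishable with `equals()`.

```java
import java.util.Objects;

/** Hypothetical stand-in for a YARN resource value (memory in MB, vcores). */
final class Res {
  final long memoryMb;
  final int vcores;

  Res(long memoryMb, int vcores) {
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof Res)) {
      return false;
    }
    Res other = (Res) o;
    return memoryMb == other.memoryMb && vcores == other.vcores;
  }

  @Override
  public int hashCode() {
    return Objects.hash(memoryMb, vcores);
  }
}

final class AssignmentCheck {
  // Stand-in for Resources.none(): nothing was assigned or reserved.
  static final Res NONE = new Res(0, 0);
  // Stand-in for FairScheduler.CONTAINER_RESERVED (the sentinel value is assumed).
  static final Res CONTAINER_RESERVED = new Res(-1, 0);

  /** True when the queue should stop offering the node: an assignment OR a reservation. */
  static boolean shouldStop(Res assigned) {
    return !assigned.equals(NONE);
  }

  /** True only for a real assignment, i.e. when "Assigned container" is an accurate log. */
  static boolean shouldLogAssignment(Res assigned) {
    return shouldStop(assigned) && !assigned.equals(CONTAINER_RESERVED);
  }
}
```

A reservation still satisfies `shouldStop`, so, as in the proposed fix, the `break` behaviour is unchanged and only the misleading log line is suppressed.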
[jira] [Updated] (YARN-10705) Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja updated YARN-10705: --- Component/s: yarn
[jira] [Assigned] (YARN-10705) Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Ahuja reassigned YARN-10705: -- Assignee: Siddharth Ahuja
[jira] [Created] (YARN-10705) Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler
Siddharth Ahuja created YARN-10705: -- Summary: Misleading DEBUG log for container assignment needs to be removed when the container is actually reserved, not assigned in FairScheduler Key: YARN-10705 URL: https://issues.apache.org/jira/browse/YARN-10705 Project: Hadoop YARN Issue Type: Bug Reporter: Siddharth Ahuja
[jira] [Comment Edited] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304992#comment-17304992 ] Andras Gyori edited comment on YARN-9927 at 3/19/21, 4:10 PM: --
Thank you [~hcarrot] for raising this issue and [~zhuqi] for stepping up to continue this undertaking. Below are my feedback and suggestions; they might not reflect the actual situation, so feel free to correct anything inaccurate.

I think the current patch is not the best approach to this problem. My reasons include the concerns already raised in this JIRA, as well as my own insights.

The single entry point for an event is AsyncDispatcher#handle, which puts the event in the eventQueue, where it is processed asynchronously by a single thread. This cannot be circumvented, because the dispatcher is accessed through rmContext.getDispatcher() all over the place, so we must retain this entry point. However, I have a strong sense that the performance bottleneck is actually the AsyncDispatcher#eventQueue (a BlockingQueue). In my opinion, the solution is exactly the one already suggested in the documentation of AsyncDispatcher:
{code:java}
/**
 * Dispatches {@link Event}s in a separate thread. Currently only single thread
 * does that. Potentially there could be multiple channels for each event type
 * class and a thread pool can be used to dispatch the events.
 */
{code}
My suggestion would be:
# Store a new BlockingQueue for each event type in a HashMap
# Create a new thread for each registered event type / eventQueue
# Make every thread responsible for processing one eventQueue
# The Dispatcher would then map to an N:N:N (EventQueue:Thread:EventHandler) system (or an N:M:N system, where M is smaller than N, to reduce the number of threads), where N is the number of registered EventTypes

A more fine-grained solution is possible with an M*N:M*N:N (EventQueue:Thread:EventHandler) system, where M is a number given at registration (how many threads should process this kind of event) and N is the number of registered EventTypes (as far as I can tell, the EventHandlers do not use locks internally and are thread safe). I am not sure this is feasible, because of the external locks used in EventHandlers (e.g. NodeEventHandler uses getRMNodes(), which is locked behind a ConcurrentMap; I think this is the feedback given by [~adam.antal] and [~epayne]).

A dummy implementation of the aforementioned system would be:
{code:java}
import java.util.concurrent.*;
import org.apache.hadoop.yarn.event.Event;
import org.apache.hadoop.yarn.event.EventHandler;

public class ThreadedDispatcher {
  // One queue per registered event type, keyed by the event's enum class.
  private final ConcurrentMap<Class<? extends Enum>, BlockingQueue<Event>> events =
      new ConcurrentHashMap<>();
  private volatile boolean stopped = false; // set by a stop() method (omitted)

  public void register(Class<? extends Enum> eventType, EventHandler handler) {
    BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<>();
    events.put(eventType, eventQueue);
    // The worker captures the handler directly instead of redeclaring it.
    new Thread(() -> {
      while (!stopped && !Thread.currentThread().isInterrupted()) {
        try {
          handler.handle(eventQueue.take());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // ends the loop
        }
      }
    }).start(); // start(), not run(): run() would loop forever on the caller's thread
  }

  class GenericEventHandler implements EventHandler<Event> {
    public void handle(Event event) {
      try {
        // Same key as AsyncDispatcher: the enum class declaring the event type.
        events.get(event.getType().getDeclaringClass()).put(event);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}
This could also be a less disruptive solution: simply replace the AsyncDispatcher with this ThreadedDispatcher and retain GenericEventHandler#handle as the single entry point. Ideally, nothing needs to be changed apart from the initialisation of the rmDispatcher.
cc: [~pbacsko]
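The N:M:N variant mentioned in the comment (fewer threads than event types) could be sketched by pinning each registered event type to one of M single-threaded executors, which keeps events of a given type sequential and in FIFO order while threads are shared. `SimpleEvent`, `PooledDispatcher` and the two demo enums are hypothetical stand-ins, not Hadoop's `Event`/`EventHandler` interfaces, and the round-robin pinning is an assumption rather than part of the proposal.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical event carrying an enum constant, similar in spirit to YARN's Event#getType(). */
final class SimpleEvent {
  final Enum<?> type;
  final String payload;

  SimpleEvent(Enum<?> type, String payload) {
    this.type = type;
    this.payload = payload;
  }
}

/** Hypothetical event types, used only to exercise the sketch. */
enum DemoNodeEventType { STATUS_UPDATE }

enum DemoAppEventType { ATTEMPT_ADDED }

/**
 * N:M:N sketch: N registered event types share M single-threaded executors.
 * Each type is pinned to exactly one executor, so events of the same type are
 * still handled one at a time and in submission order.
 */
final class PooledDispatcher {
  private final ExecutorService[] workers;
  private final Map<Class<?>, ExecutorService> workerByType = new ConcurrentHashMap<>();
  private final Map<Class<?>, Consumer<SimpleEvent>> handlers = new ConcurrentHashMap<>();
  private int nextWorker = 0;

  PooledDispatcher(int threads) {
    workers = new ExecutorService[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = Executors.newSingleThreadExecutor();
    }
  }

  /** Pins the event type to a worker (round-robin), then publishes its handler. */
  synchronized void register(Class<? extends Enum<?>> eventType, Consumer<SimpleEvent> handler) {
    workerByType.put(eventType, workers[nextWorker++ % workers.length]);
    handlers.put(eventType, handler);
  }

  /** Single entry point, playing the role of GenericEventHandler#handle. */
  void handle(SimpleEvent event) {
    // Key by the declaring enum class so constant-specific bodies map to the same type.
    Class<?> type = event.type.getDeclaringClass();
    Consumer<SimpleEvent> handler = handlers.get(type);
    if (handler == null) {
      throw new IllegalArgumentException("No handler registered for " + type);
    }
    workerByType.get(type).execute(() -> handler.accept(event));
  }

  /** Stops accepting new events; events already queued are still processed. */
  void stop() {
    for (ExecutorService worker : workers) {
      worker.shutdown();
    }
  }
}
```

Compared with one thread per type, this bounds the thread count at M, at the cost that a slow handler can delay the other event types pinned to the same worker.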
[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism
[ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304992#comment-17304992 ] Andras Gyori commented on YARN-9927: > RM multi-thread event processing mechanism > -- > > Key: YARN-9927 > URL: https://issues.apache.org/jira/browse/YARN-9927 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0, 2.9.2 >Reporter: hcarrot >Assignee: Qi Zhu >Priority: Major > Attachments: RM multi-thread event processing mechanism.pdf, > YARN-9927.001.patch > > > Recently, we have observed serious event blocking in the RM event dispatcher > queue. After analysis of RM event monitoring data and RM event processing > logic, we found that > 1) environment: a cluster with thousands of nodes > 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event scheduler > 3) Meanwhile, RM event processing is in a single-thread mode, and it results > in low headroom for the RM event scheduler, and thus low RM performance. > So we proposed an RM multi-thread event processing mechanism to improve RM > performance.
[jira] [Commented] (YARN-10503) Support queue capacity in terms of absolute resources with gpu resourceType.
[ https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304984#comment-17304984 ] Eric Payne commented on YARN-10503: --- [~leftnoteasy] and [~sunilg], is there a reason custom resources were not included when the absolute resource feature was added? [~zhuqi], I would prefer that custom resources be treated in a generic way for calculating absolute queue resources. I would rather not treat GPU as a special case. However, I think YARN-9936 is going beyond this requirement. Can we use this JIRA (YARN-10503) to extend the absolute queue resource feature in a general way for custom resources? > Support queue capacity in terms of absolute resources with gpu resourceType. > > > Key: YARN-10503 > URL: https://issues.apache.org/jira/browse/YARN-10503 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-10503.001.patch, YARN-10503.002.patch > > > Currently, the only absolute resources are memory and cores. > {code:java} > /** > * Different resource types supported. > */ > public enum AbsoluteResourceType { > MEMORY, VCORES; > }{code} > But in our GPU production clusters, we need to support more resourceTypes. > It's very important for cluster scaling when there are > absolute demands for different resource types. > > This Jira will handle GPU first.
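Treating custom resources generically could mean keying absolute queue resources by resource name instead of by the fixed `AbsoluteResourceType` enum, so that GPU needs no special case. The sketch below is a hypothetical helper, not Capacity Scheduler code: it assumes the standard resource names `memory-mb`, `vcores` and `yarn.io/gpu`, and for every configured resource it computes the fraction of the cluster total that the queue's absolute value represents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AbsoluteCapacity {
  /**
   * Hypothetical helper: for every resource named in the queue's absolute
   * configuration (memory-mb, vcores, yarn.io/gpu, ...), returns configured / cluster.
   * No resource type is special-cased.
   */
  static Map<String, Double> fractionsOfCluster(Map<String, Long> queueAbsolute,
      Map<String, Long> clusterTotal) {
    Map<String, Double> fractions = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : queueAbsolute.entrySet()) {
      Long total = clusterTotal.get(e.getKey());
      if (total == null || total == 0L) {
        // Resource absent from the cluster: report 0 instead of dividing by zero (assumed policy).
        fractions.put(e.getKey(), 0.0);
      } else {
        fractions.put(e.getKey(), e.getValue().doubleValue() / total.doubleValue());
      }
    }
    return fractions;
  }
}
```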
[jira] [Commented] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304979#comment-17304979 ] Matthew Sharp commented on YARN-10493: -- I have an initial PR to address the improvements outlined in the attached PDF. I have some thoughts around the manifest caching that I would like to address in a follow-up JIRA. We have this running internally with the Java CLI tool from YARN-10494. > RunC container repository v2 > > > Key: YARN-10493 > URL: https://issues.apache.org/jira/browse/YARN-10493 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, yarn >Affects Versions: 3.3.0 >Reporter: Craig Condit >Assignee: Matthew Sharp >Priority: Major > Labels: pull-request-available > Attachments: runc-container-repository-v2-design.pdf > > Time Spent: 10m > Remaining Estimate: 0h > > The current runc container repository design has scalability and usability > issues which will likely limit widespread adoption. We should address this > with a new, V2 layout.
[jira] [Updated] (YARN-10493) RunC container repository v2
[ https://issues.apache.org/jira/browse/YARN-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-10493: -- Labels: pull-request-available (was: )
[jira] [Comment Edited] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups
[ https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304971#comment-17304971 ] Peter Bacsko edited comment on YARN-10597 at 3/19/21, 3:35 PM: --- [~shuzirra] is it really that simple? You told me that there were a bunch of unit test failures when you tried to change it months back. Anyway, it's great news if the change is tiny. was (Author: pbacsko): [~shuzirra] is it really that simple? You told me that there were a bunch of unit test failures. Anyway, it's great news if the change is tiny. > CSMappingPlacementRule should not create new instance of Groups > --- > > Key: YARN-10597 > URL: https://issues.apache.org/jira/browse/YARN-10597 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gergely Pollak >Assignee: Gergely Pollak >Priority: Major > Attachments: YARN-10597.001.patch > > > As [~ahussein] pointed out in YARN-10425, no new Groups instance should be > created.
[jira] [Commented] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups
[ https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304971#comment-17304971 ] Peter Bacsko commented on YARN-10597: - [~shuzirra] is it really that simple? You told me that there were a bunch of unit test failures. Anyway, it's great news if the change is tiny.
[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304941#comment-17304941 ] Jim Brennan commented on YARN-10697: {quote} So can we introduce a new method in Resource.java which can print it in MB|GB|TB? {quote} [~BilwaST] I think that is a good suggestion. There are places where this format would be nice. > Resources are displayed in bytes in UI for schedulers other than capacity > - > > Key: YARN-10697 > URL: https://issues.apache.org/jira/browse/YARN-10697 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-10697.001.patch, image-2021-03-17-11-30-57-216.png > > > Resources.newInstance expects memory in MB, whereas MetricsOverviewTable > passes resources in bytes. We should also display memory in GB for better > readability for users.
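The suggested helper for printing memory as MB, GB or TB might look like the sketch below. `MemoryFormat`, `formatMemory` and `bytesToMb` are hypothetical names, not existing `Resource` methods; binary units (1 GB = 1024 MB) and one decimal place are assumptions. Converting bytes to MB first addresses the mismatch described in the issue, where a byte count is passed to an API that expects MB.

```java
import java.util.Locale;

final class MemoryFormat {
  private static final long MB_PER_GB = 1024L;
  private static final long MB_PER_TB = 1024L * 1024L;

  /** Converts a byte count to whole MB (binary units), truncating any remainder. */
  static long bytesToMb(long bytes) {
    return bytes / (1024L * 1024L);
  }

  /** Renders a memory size given in MB using the largest unit whose value is at least 1. */
  static String formatMemory(long memoryMb) {
    if (memoryMb < MB_PER_GB) {
      return memoryMb + " MB";
    }
    if (memoryMb < MB_PER_TB) {
      // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
      return String.format(Locale.ROOT, "%.1f GB", memoryMb / (double) MB_PER_GB);
    }
    return String.format(Locale.ROOT, "%.1f TB", memoryMb / (double) MB_PER_TB);
  }
}
```

For example, a value of 8 GiB reported in bytes would first go through `bytesToMb` (giving 8192) and then render as `8.0 GB`.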
[jira] [Assigned] (YARN-10534) Enable runC container transformations
[ https://issues.apache.org/jira/browse/YARN-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Sharp reassigned YARN-10534: Assignee: Matthew Sharp > Enable runC container transformations > - > > Key: YARN-10534 > URL: https://issues.apache.org/jira/browse/YARN-10534 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Matthew Sharp >Assignee: Matthew Sharp >Priority: Minor > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > The goal of this Jira is to provide an optional plugin to apply runC > container transformations. Enabling runC container transformations will > provide an easy way to apply site specific customizations to all containers. > An example of one transformation that many clusters may need could be a > Kerberos transformation. This would apply cluster Kerberos configurations and > mount them to all runC containers that are submitted, without requiring users > to manage them within their own images.
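One way to picture the optional transformation plugin is as a chain of functions applied to every container's runtime spec before launch. Everything below is hypothetical: `ContainerSpec`, `RuncTransformation`, `KerberosTransformation` and `Transformations` are illustrative names rather than classes from the YARN-10534 PR, the `src:dst:ro` mount notation is assumed, and `/etc/krb5.conf` is only an assumed location for the cluster Kerberos configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, minimal view of a runC container spec: just its bind mounts. */
final class ContainerSpec {
  final List<String> mounts;

  ContainerSpec(List<String> mounts) {
    this.mounts = Collections.unmodifiableList(new ArrayList<>(mounts));
  }

  /** Returns a copy with one extra mount; the original spec is left untouched. */
  ContainerSpec withMount(String mount) {
    List<String> copy = new ArrayList<>(mounts);
    copy.add(mount);
    return new ContainerSpec(copy);
  }
}

/** Hypothetical plugin interface: a site-specific change applied to every container. */
interface RuncTransformation {
  ContainerSpec apply(ContainerSpec spec);
}

/** Mounts the cluster Kerberos config read-only, so users need not bake it into images. */
final class KerberosTransformation implements RuncTransformation {
  private final String krb5ConfPath;

  KerberosTransformation(String krb5ConfPath) {
    this.krb5ConfPath = krb5ConfPath;
  }

  @Override
  public ContainerSpec apply(ContainerSpec spec) {
    String mount = krb5ConfPath + ":" + krb5ConfPath + ":ro";
    // Idempotent: applying the transformation twice adds the mount only once.
    return spec.mounts.contains(mount) ? spec : spec.withMount(mount);
  }
}

final class Transformations {
  /** Applies each configured transformation in order. */
  static ContainerSpec applyAll(ContainerSpec spec, List<RuncTransformation> chain) {
    ContainerSpec result = spec;
    for (RuncTransformation t : chain) {
      result = t.apply(result);
    }
    return result;
  }
}
```

Keeping each transformation a pure function of the spec makes the chain order explicit and lets sites enable or drop individual transformations through configuration.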
[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304816#comment-17304816 ] Hadoop QA commented on YARN-10704: --
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 20s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 59s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 52s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 19m 57s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 1m 49s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 40s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/827/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 3 new + 82 unchanged - 0 fixed = 85 total (was 82) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 15m 3s{color} | {col
[jira] [Updated] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU and other custom resources.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-10704:
--
Summary: The CS effective capacity for absolute mode in UI should support GPU and other custom resources. (was: The CS effective capacity for absolute mode in UI should support GPU.)

> The CS effective capacity for absolute mode in UI should support GPU and
> other custom resources.
>
>
> Key: YARN-10704
> URL: https://issues.apache.org/jira/browse/YARN-10704
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10704.001.patch, image-2021-03-19-12-05-28-412.png,
> image-2021-03-19-12-08-35-273.png
>
>
> Currently, the UI shows no effective capacity information for GPU
> in absolute resource mode.
> !image-2021-03-19-12-05-28-412.png|width=873,height=136!
> However, this information is available in QueueMetrics:
> !image-2021-03-19-12-08-35-273.png|width=613,height=268!
>
> This matters a lot for our GPU users running in absolute mode: there is still
> no way to see absolute GPU information in the CS Queue UI.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
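The core of the request is that the queue UI should render every configured resource type, not only memory and vCores. A minimal sketch of that idea, assuming a plain `Map` of resource name to value stands in for YARN's real `Resource` object; the class `EffectiveCapacityFormatter` and the `<name:value, ...>` output format are hypothetical and are not taken from YARN-10704.001.patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper: renders an effective capacity covering every resource
 * type (memory, vCores, yarn.io/gpu, other custom resources) instead of
 * hard-coding memory and vCores.
 */
public class EffectiveCapacityFormatter {

  /** Formats resources as "<name:value, ...>", keeping the map's iteration order. */
  public static String format(Map<String, Long> resources) {
    StringJoiner joiner = new StringJoiner(", ", "<", ">");
    for (Map.Entry<String, Long> entry : resources.entrySet()) {
      joiner.add(entry.getKey() + ":" + entry.getValue());
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order so memory and vCores are listed first.
    Map<String, Long> effectiveMin = new LinkedHashMap<>();
    effectiveMin.put("memory", 10240L);
    effectiveMin.put("vCores", 10L);
    // The custom resource that the absolute-mode UI currently omits.
    effectiveMin.put("yarn.io/gpu", 2L);
    System.out.println(format(effectiveMin));
  }
}
```

Iterating over whatever resource types are present, rather than naming memory and vCores explicitly, is what lets GPU and any other custom resource appear without further UI changes.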
[jira] [Comment Edited] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304701#comment-17304701 ]

Qi Zhu edited comment on YARN-10704 at 3/19/21, 8:00 AM:
-

cc [~pbacsko] [~gandras] [~ebadger]

Could you help review this? I implemented the no-label case first, because the custom resource metrics don't support labels yet.

I think it's important for users of custom resources in absolute mode to see the effective custom resource capacity in the UI.

Thanks.

was (Author: zhuqi):
cc [~pbacsko] [~gandras] [~ebadger]

Could you help review this? I implemented the no-label case first, because the custom resource metrics don't support labels yet.
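The "no-label case first" restriction can be sketched as a partition check, assuming (hypothetically) that custom resource values can only be shown for the default partition because the custom resource metrics do not track labels. `PartitionAwareCapacity` and `visibleResources` are illustrative names, not part of YARN or the patch, and using the empty string for the default partition is an assumption modeled on YARN's no-label convention:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: custom resources (e.g. yarn.io/gpu) are shown only for
 * the default (no-label) partition; labeled partitions fall back to memory
 * and vCores, since custom resource metrics are assumed not to support labels.
 */
public class PartitionAwareCapacity {

  /** Assumed name of the default partition ("no label"). */
  static final String NO_LABEL = "";

  /** Resource types assumed to be available for every partition. */
  private static final String[] BASIC_TYPES = {"memory", "vCores"};

  public static Map<String, Long> visibleResources(String partition, Map<String, Long> all) {
    if (NO_LABEL.equals(partition)) {
      // Default partition: every resource type, including custom ones.
      return new LinkedHashMap<>(all);
    }
    // Labeled partition: keep only memory and vCores.
    Map<String, Long> basic = new LinkedHashMap<>();
    for (String type : BASIC_TYPES) {
      if (all.containsKey(type)) {
        basic.put(type, all.get(type));
      }
    }
    return basic;
  }
}
```

Falling back to memory and vCores for labeled partitions keeps the existing UI behavior there, so label support for custom resources could be added later without changing the no-label path.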
[jira] [Commented] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU.
[ https://issues.apache.org/jira/browse/YARN-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304701#comment-17304701 ]

Qi Zhu commented on YARN-10704:
---

cc [~pbacsko] [~gandras] [~ebadger]

Could you help review this? I implemented the no-label case first, because the custom resource metrics don't support labels yet.