[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.
[ https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251542#comment-17251542 ] Hadoop QA commented on YARN-10463:
(x) -1 overall

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 51s | | Docker mode activated. |
|| Prechecks ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 34m 31s | | trunk passed |
| +1 | compile | 0m 28s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | compile | 0m 27s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | checkstyle | 0m 23s | | trunk passed |
| +1 | mvnsite | 0m 32s | | trunk passed |
| +1 | shadedclient | 17m 16s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 31s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| 0 | spotbugs | 0m 49s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 0m 47s | | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 26s | | the patch passed |
| +1 | compile | 0m 23s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javac | 0m 23s | | the patch passed |
| +1 | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | javac | 0m 20s | | the patch passed |
| -0 | checkstyle | 0m 14s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/397/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) |
| +1 | mvnsite | 0m 22s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 9s | | patch has no errors when building and testing our client artifacts. |
[jira] [Comment Edited] (YARN-10463) For Federation, we should support getApplicationAttemptReport.
[ https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251518#comment-17251518 ] zhuqi edited comment on YARN-10463 at 12/18/20, 6:45 AM:
-
[~ztang] I rebased it and triggered a new CI. I also added a corresponding routerMetrics test to confirm it, and opened a GitHub PR to trigger CI. Thanks.

was (Author: zhuqi):
[~ztang] I rebased it and triggered a new CI. I also added a corresponding routerMetrics test to confirm it. Thanks.

> For Federation, we should support getApplicationAttemptReport. > -- > > Key: YARN-10463 > URL: https://issues.apache.org/jira/browse/YARN-10463 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Labels: pull-request-available > Attachments: YARN-10463.001.patch, YARN-10463.002.patch, > YARN-10463.003.patch, YARN-10463.004.patch, YARN-10463.005.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10463) For Federation, we should support getApplicationAttemptReport.
[ https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-10463: -- Labels: pull-request-available (was: ) > For Federation, we should support getApplicationAttemptReport. > -- > > Key: YARN-10463 > URL: https://issues.apache.org/jira/browse/YARN-10463 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Labels: pull-request-available > Attachments: YARN-10463.001.patch, YARN-10463.002.patch, > YARN-10463.003.patch, YARN-10463.004.patch, YARN-10463.005.patch > > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251517#comment-17251517 ] Minni Mittal commented on YARN-10519: - I've addressed the comments about the newline and the visibility change in the new patch. For the UTs, the reference in QueueMetrics is required. > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also.
[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.
[ https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251518#comment-17251518 ] zhuqi commented on YARN-10463: -- [~ztang] I rebased it and triggered a new CI. I also added a corresponding routerMetrics test to confirm it. Thanks. > For Federation, we should support getApplicationAttemptReport. > -- > > Key: YARN-10463 > URL: https://issues.apache.org/jira/browse/YARN-10463 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: YARN-10463.001.patch, YARN-10463.002.patch, > YARN-10463.003.patch, YARN-10463.004.patch, YARN-10463.005.patch > >
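For context, the federation pattern under discussion is: the Router resolves the application's home SubCluster and forwards getApplicationAttemptReport there. A minimal sketch with simplified, hypothetical types (SubClusterClient and the lookup maps below are illustrative stand-ins, not the real FederationClientInterceptor APIs):

```java
import java.util.Map;

public class RouterSketch {
    // Hypothetical stand-in for a per-SubCluster client proxy.
    interface SubClusterClient {
        String getApplicationAttemptReport(String appAttemptId);
    }

    private final Map<String, String> homeSubClusterByApp; // appId -> subClusterId
    private final Map<String, SubClusterClient> clients;   // subClusterId -> client

    RouterSketch(Map<String, String> homes, Map<String, SubClusterClient> clients) {
        this.homeSubClusterByApp = homes;
        this.clients = clients;
    }

    // Route the call to the application's home SubCluster.
    String getApplicationAttemptReport(String appId, String appAttemptId) {
        String home = homeSubClusterByApp.get(appId);
        if (home == null) {
            throw new IllegalArgumentException("Unknown application: " + appId);
        }
        return clients.get(home).getApplicationAttemptReport(appAttemptId);
    }

    public static void main(String[] args) {
        SubClusterClient sc1 = id -> "report-from-sc1:" + id;
        RouterSketch router = new RouterSketch(
            Map.of("app_1", "sc1"), Map.of("sc1", sc1));
        // The attempt report request is forwarded to SubCluster "sc1".
        System.out.println(router.getApplicationAttemptReport("app_1", "appattempt_1"));
        // → report-from-sc1:appattempt_1
    }
}
```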
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v5.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also.
[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.
[ https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251497#comment-17251497 ] Zhankun Tang commented on YARN-10463: - [~zhuqi], I triggered a new CI and it failed. I guess it needs a rebase onto the latest trunk. Could you please rebase it and trigger the CI again? > For Federation, we should support getApplicationAttemptReport. > -- > > Key: YARN-10463 > URL: https://issues.apache.org/jira/browse/YARN-10463 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: YARN-10463.001.patch, YARN-10463.002.patch, > YARN-10463.003.patch, YARN-10463.004.patch > >
[jira] [Assigned] (YARN-10537) Change type of LogAggregationService threadPool
[ https://issues.apache.org/jira/browse/YARN-10537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Kumar reassigned YARN-10537: -- Assignee: Ankit Kumar > Change type of LogAggregationService threadPool > --- > > Key: YARN-10537 > URL: https://issues.apache.org/jira/browse/YARN-10537 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Xie YiFan >Assignee: Ankit Kumar >Priority: Minor > > Currently, the LogAggregationService thread pool is a FixedThreadPool whose > default size is 100. LogAggregationService constructs an AppLogAggregator for > each newly arrived application and submits it to the thread pool. Each > AppLogAggregator loops until its application finishes. Some applications may > run for a very long time, for example when not enough resources are > available, and each one occupies a pool thread for its whole lifetime. When > the number of such applications exceeds the pool size, later short-lived > applications cannot upload their logs until a previous long-lived application > finishes. So I think we should replace the FixedThreadPool with a > CachedThreadPool.
[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.
[ https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251493#comment-17251493 ] Hadoop QA commented on YARN-10463:
(x) -1 overall

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 52s | | Docker mode activated. |
|| Prechecks ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| -1 | mvninstall | 9m 10s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-mvninstall-root.txt | root in trunk failed. |
| -1 | compile | 0m 21s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt | hadoop-yarn-server-router in trunk failed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04. |
| -1 | compile | 0m 16s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt | hadoop-yarn-server-router in trunk failed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01. |
| -0 | checkstyle | 0m 10s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/buildtool-branch-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt | The patch fails to run checkstyle in hadoop-yarn-server-router |
| -1 | mvnsite | 0m 38s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt | hadoop-yarn-server-router in trunk failed. |
| -1 | shadedclient | 1m 27s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-shadedclient.txt | branch has errors when building and testing our client artifacts. |
| -1 | javadoc | 0m 28s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04.txt | hadoop-yarn-server-router in trunk failed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04. |
| -1 | javadoc | 0m 28s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01.txt | hadoop-yarn-server-router in trunk failed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01. |
| 0 | spotbugs | 2m 51s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| -1 | findbugs | 0m 26s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt | hadoop-yarn-server-router in trunk failed. |
|| Patch Compile Tests ||
| -1 | mvninstall | 0m 23s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/395/artifact/out
[jira] [Created] (YARN-10537) Change type of LogAggregationService threadPool
Xie YiFan created YARN-10537: Summary: Change type of LogAggregationService threadPool Key: YARN-10537 URL: https://issues.apache.org/jira/browse/YARN-10537 Project: Hadoop YARN Issue Type: Improvement Reporter: Xie YiFan Currently, the LogAggregationService thread pool is a FixedThreadPool whose default size is 100. LogAggregationService constructs an AppLogAggregator for each newly arrived application and submits it to the thread pool. Each AppLogAggregator loops until its application finishes. Some applications may run for a very long time, for example when not enough resources are available, and each one occupies a pool thread for its whole lifetime. When the number of such applications exceeds the pool size, later short-lived applications cannot upload their logs until a previous long-lived application finishes. So I think we should replace the FixedThreadPool with a CachedThreadPool.
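The proposed change amounts to swapping the executor factory method. A minimal sketch of the difference (illustrative only, not the actual LogAggregationService code; the real pool size comes from YARN configuration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class PoolSketch {
    public static void main(String[] args) {
        // Current behaviour: a fixed pool caps concurrent log aggregators at
        // the configured size (default 100); long-running aggregators pin
        // threads for their whole lifetime.
        ExecutorService fixed = Executors.newFixedThreadPool(100);

        // Proposed behaviour: a cached pool creates threads on demand (up to
        // Integer.MAX_VALUE) and reclaims idle ones after 60s, so a backlog of
        // long-lived applications cannot starve short-lived ones of a thread.
        ThreadPoolExecutor cached = (ThreadPoolExecutor) Executors.newCachedThreadPool();

        System.out.println("cached core threads: " + cached.getCorePoolSize());
        // → cached core threads: 0
        System.out.println("cached max threads: " + cached.getMaximumPoolSize());
        // → cached max threads: 2147483647

        fixed.shutdown();
        cached.shutdown();
    }
}
```

The trade-off, of course, is that an unbounded cached pool can create a very large number of threads under load, which is worth weighing before making the swap.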
[jira] [Updated] (YARN-10537) Change type of LogAggregationService threadPool
[ https://issues.apache.org/jira/browse/YARN-10537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xie YiFan updated YARN-10537: - Priority: Minor (was: Major) > Change type of LogAggregationService threadPool > --- > > Key: YARN-10537 > URL: https://issues.apache.org/jira/browse/YARN-10537 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Xie YiFan >Priority: Minor > > Currently, the LogAggregationService thread pool is a FixedThreadPool whose > default size is 100. LogAggregationService constructs an AppLogAggregator for > each newly arrived application and submits it to the thread pool. Each > AppLogAggregator loops until its application finishes. Some applications may > run for a very long time, for example when not enough resources are > available, and each one occupies a pool thread for its whole lifetime. When > the number of such applications exceeds the pool size, later short-lived > applications cannot upload their logs until a previous long-lived application > finishes. So I think we should replace the FixedThreadPool with a > CachedThreadPool.
[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.
[ https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251466#comment-17251466 ] zhuqi commented on YARN-10463: -- [~BilwaST] Do you have any other advice? [~ztang] will help to merge it. > For Federation, we should support getApplicationAttemptReport. > -- > > Key: YARN-10463 > URL: https://issues.apache.org/jira/browse/YARN-10463 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: YARN-10463.001.patch, YARN-10463.002.patch, > YARN-10463.003.patch, YARN-10463.004.patch > >
[jira] [Commented] (YARN-10165) Effective Capacities goes beyond 100% when queues are configured with mixed values - Percentage and Absolute Resource
[ https://issues.apache.org/jira/browse/YARN-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251446#comment-17251446 ] zhuqi commented on YARN-10165: -- [~tanu.ajmera] Mixed mode is not supported now; I will fix it later in [YARN-10169|https://issues.apache.org/jira/browse/YARN-10169]. > Effective Capacities goes beyond 100% when queues are configured with mixed > values - Percentage and Absolute Resource > - > > Key: YARN-10165 > URL: https://issues.apache.org/jira/browse/YARN-10165 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Tanu Ajmera >Assignee: Tanu Ajmera >Priority: Major > Attachments: Screenshot 2020-02-26 at 12.39.49 PM.png, Screenshot > 2020-02-26 at 12.40.01 PM.png > > > There are two queues - default and batch - whose capacities have been > configured with mixed values. The resource available is 9GB. > The default queue has been configured with Absolute Resource [memory=6000] and > the batch queue has been configured with Capacity Percentage 50%. In the > Resource Manager UI, Effective Capacities go beyond 100%: for the default > queue it is 65.1% and for the batch queue it is 50%. > > !Screenshot 2020-02-26 at 12.39.49 PM.png|height=200|width=20! > !Screenshot 2020-02-26 at 12.40.01 PM.png|height=200|width=20!
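The over-100% effect reported in YARN-10165 is straightforward arithmetic: on a 9 GB cluster, an absolute allocation of 6000 MB is about 65.1% by itself, so a second queue at 50% pushes the total past 100%. A small sketch of the computation, using the numbers from the description above:

```java
public class EffectiveCapacitySketch {
    public static void main(String[] args) {
        double clusterMB = 9 * 1024;     // 9 GB cluster
        double defaultAbsoluteMB = 6000; // default queue: absolute resource [memory=6000]
        double batchPercent = 50.0;      // batch queue: percentage capacity

        // Effective capacity of an absolute-resource queue is just
        // allocated / cluster, expressed as a percentage.
        double defaultPercent = 100.0 * defaultAbsoluteMB / clusterMB;

        System.out.printf("default=%.1f%% batch=%.1f%% sum=%.1f%%%n",
            defaultPercent, batchPercent, defaultPercent + batchPercent);
        // → default=65.1% batch=50.0% sum=115.1%
    }
}
```

The sum exceeding 100% matches the screenshots: 65.1% for the default queue plus 50% for the batch queue.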
[jira] [Comment Edited] (YARN-10169) Mixed absolute resource value and percentage-based resource value in CapacityScheduler should fail
[ https://issues.apache.org/jira/browse/YARN-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251110#comment-17251110 ] zhuqi edited comment on YARN-10169 at 12/18/20, 1:49 AM:
-
[~leftnoteasy] [~pbacsko], [~snemeth], [~sunilg], [~bteke] When I wrote a unit test to confirm this, I found that a percentage-configured queue below an absolute-resource parent always gets the wrong resource: 100% of the parent. This is because of the following logic:
{code:java}
// Set absolute capacities for {capacity, maximum-capacity}
private static void updateAbsoluteCapacitiesByNodeLabels(
    QueueCapacities queueCapacities, QueueCapacities parentQueueCapacities) {
  for (String label : queueCapacities.getExistingNodeLabels()) {
    float capacity = queueCapacities.getCapacity(label);
    if (capacity > 0f) {
      queueCapacities.setAbsoluteCapacity(label,
          capacity * (parentQueueCapacities == null ? 1
              : parentQueueCapacities.getAbsoluteCapacity(label)));
    }
    float maxCapacity = queueCapacities.getMaximumCapacity(label);
    if (maxCapacity > 0f) {
      queueCapacities.setAbsoluteMaximumCapacity(label,
          maxCapacity * (parentQueueCapacities == null ? 1
              : parentQueueCapacities.getAbsoluteMaximumCapacity(label)));
    }
  }
}
{code}
This happens if we use an absolute resource in the parent and the child capacity also uses a single absolute resource. Here is the related maxCapacity code:
{code:java}
public float getNonLabeledQueueMaximumCapacity(String queue) {
  String configuredCapacity = get(getQueuePrefix(queue) + MAXIMUM_CAPACITY);
  boolean matcher = (configuredCapacity != null)
      && RESOURCE_PATTERN.matcher(configuredCapacity).find();
  if (matcher) {
    // Return capacity in percentage as 0 for non-root queues and 100 for
    // root. From AbstractCSQueue, absolute resource will be parsed and
    // updated. Once nodes are added/removed in cluster, capacity in
    // percentage will also be re-calculated.
    return 100.0f;
  }
  float maxCapacity = (configuredCapacity == null)
      ? MAXIMUM_CAPACITY_VALUE : Float.parseFloat(configuredCapacity);
  maxCapacity = (maxCapacity == DEFAULT_MAXIMUM_CAPACITY_VALUE)
      ? MAXIMUM_CAPACITY_VALUE : maxCapacity;
  return maxCapacity;
}
{code}
In absolute-resource capacity mode this returns 100.0f, so maxCapacity will be 100. We should change it to support mixed mode; if we use mixed mode in maxCapacity now, it will not throw an exception, but it will be wrong. We should also fix this for auto-created queues.

was (Author: zhuqi):
[~leftnoteasy] [~pbacsko], [~snemeth], [~sunilg], [~bteke] When I wrote a unit test to confirm this, I found that a percentage-configured queue below an absolute-resource parent always gets the wrong resource: 100% of the parent. This is because of the following logic:
{code:java}
// Set absolute capacities for {capacity, maximum-capacity}
private static void updateAbsoluteCapacitiesByNodeLabels(
    QueueCapacities queueCapacities, QueueCapacities parentQueueCapacities) {
  for (String label : queueCapacities.getExistingNodeLabels()) {
    float capacity = queueCapacities.getCapacity(label);
    if (capacity > 0f) {
      queueCapacities.setAbsoluteCapacity(label,
          capacity * (parentQueueCapacities == null ? 1
              : parentQueueCapacities.getAbsoluteCapacity(label)));
    }
    float maxCapacity = queueCapacities.getMaximumCapacity(label);
    if (maxCapacity > 0f) {
      queueCapacities.setAbsoluteMaximumCapacity(label,
          maxCapacity * (parentQueueCapacities == null ? 1
              : parentQueueCapacities.getAbsoluteMaximumCapacity(label)));
    }
  }
}
{code}
This happens if we use an absolute resource in the parent and the child capacity also uses a single absolute resource. The parent's parentQueueCapacities will be null, so the value will be maxCapacity; here is the related maxCapacity code:
{code:java}
public float getNonLabeledQueueMaximumCapacity(String queue) {
  String configuredCapacity = get(getQueuePrefix(queue) + MAXIMUM_CAPACITY);
  boolean matcher = (configuredCapacity != null)
      && RESOURCE_PATTERN.matcher(configuredCapacity).find();
  if (matcher) {
    // Return capacity in percentage as 0 for non-root queues and 100 for
    // root. From AbstractCSQueue, absolute resource will be parsed and
    // updated. Once nodes are added/removed in cluster, capacity in
    // percentage will also be re-calculated.
    return 100.0f;
  }
  float maxCapacity = (configuredCapacity == null)
      ? MAXIMUM_CAPACITY_VALUE : Float.parseFloat(configuredCapacity);
  maxCapacity = (maxCapacity == DEFAULT_MAXIMUM_CAPACITY_VALUE)
      ? MAXIMUM_CAPACITY_VALUE : maxCapacity;
  return maxCapacity;
}
{code}
In absolute-resource capacity mode this returns 100.0f, so maxCapacity will be 100. We should change it to support mixed mode, and if we now use mixed mode
[jira] [Resolved] (YARN-10536) Client in distributedShell swallows interrupt exceptions
[ https://issues.apache.org/jira/browse/YARN-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri resolved YARN-10536. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > Client in distributedShell swallows interrupt exceptions > > > Key: YARN-10536 > URL: https://issues.apache.org/jira/browse/YARN-10536 > Project: Hadoop YARN > Issue Type: Bug > Components: client, distributed-shell >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > In {{applications.distributedshell.Client}} , the method > {{monitorApplication}} loops waiting for the following conditions: > * Application fails: reaches {{YarnApplicationState.KILLED}}, or > {{YarnApplicationState.FAILED}} > * Application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or > {{YarnApplicationState.FINISHED}} > * the time spent waiting is longer than {{clientTimeout}} (if it exists in > the parameters). > When the Client thread is interrupted, it ignores the exception: > {code:java} > // Check app status every 1 second. > try { > Thread.sleep(1000); > } catch (InterruptedException e) { > LOG.debug("Thread sleep in monitoring loop interrupted"); > } > {code}
[jira] [Commented] (YARN-10536) Client in distributedShell swallows interrupt exceptions
[ https://issues.apache.org/jira/browse/YARN-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251384#comment-17251384 ] Íñigo Goiri commented on YARN-10536: Thanks [~ahussein] for the fix, merged the PR to trunk. > Client in distributedShell swallows interrupt exceptions > > > Key: YARN-10536 > URL: https://issues.apache.org/jira/browse/YARN-10536 > Project: Hadoop YARN > Issue Type: Bug > Components: client, distributed-shell >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > In {{applications.distributedshell.Client}} , the method > {{monitorApplication}} loops waiting for the following conditions: > * Application fails: reaches {{YarnApplicationState.KILLED}}, or > {{YarnApplicationState.FAILED}} > * Application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or > {{YarnApplicationState.FINISHED}} > * the time spent waiting is longer than {{clientTimeout}} (if it exists in > the parameters). > When the Client thread is interrupted, it ignores the exception: > {code:java} > // Check app status every 1 second. > try { > Thread.sleep(1000); > } catch (InterruptedException e) { > LOG.debug("Thread sleep in monitoring loop interrupted"); > } > {code}
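The swallowed-interrupt pattern quoted above is commonly fixed by restoring the thread's interrupt status and exiting the monitoring loop. A minimal, self-contained sketch of that pattern (simplified; not the actual distributedshell Client code):

```java
public class InterruptSketch {
    // Simplified stand-in for a monitorApplication-style polling loop.
    // Returns false if monitoring was interrupted before completion.
    static boolean monitor(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                // Check app status every 1 second.
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so callers can observe it,
                // instead of silently swallowing the exception, and stop.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        final boolean[] finished = {true};
        Thread t = new Thread(() -> finished[0] = monitor(60));
        t.start();
        t.interrupt(); // simulate an external interrupt
        t.join();      // returns quickly: monitor() exits on the interrupt
        System.out.println("finished cleanly: " + finished[0]);
        // → finished cleanly: false
    }
}
```

Without the `Thread.currentThread().interrupt()` call, the loop would keep sleeping for the remaining iterations and the caller would never learn it had been interrupted.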
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251325#comment-17251325 ] Hadoop QA commented on YARN-10519:
(x) -1 overall

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 43s | | Docker mode activated. |
|| Prechecks ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|| trunk Compile Tests ||
| 0 | mvndep | 13m 53s | | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 21s | | trunk passed |
| +1 | compile | 8m 49s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | compile | 7m 35s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | checkstyle | 1m 23s | | trunk passed |
| +1 | mvnsite | 2m 8s | | trunk passed |
| +1 | shadedclient | 18m 45s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 44s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javadoc | 1m 51s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| 0 | spotbugs | 1m 57s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 55s | | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 27s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 30s | | the patch passed |
| +1 | compile | 8m 16s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 | javac | 8m 16s | | the patch passed |
| +1 | compile | 7m 33s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 | javac | 7m 33s | | the patch passed |
| +1 | checkstyle | 1m 20s | | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 71 unchanged - 2 fixed = 71 total (was 73) |
| +1 | mvnsite | 1m 57s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 12s |
[jira] [Commented] (YARN-10334) TestDistributedShell leaks resources on timeout/failure
[ https://issues.apache.org/jira/browse/YARN-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251298#comment-17251298 ] Ahmed Hussein commented on YARN-10334: --
These are the steps planned to fix the problem:
* YARN-10536 is going to make the thread responsive in handling exceptions.
* Pass a {{timeout}} argument to the {{DistributedShell.Client}}. This timeout has to be smaller than the {{TestDistributedShell.timeout}} rule.
* Optional: Client and YarnClient have no interfaces to shutdown/close. Adding such methods, accessible to the unit tests, would be a good addition to clean up the code.
> TestDistributedShell leaks resources on timeout/failure
> ---
>
> Key: YARN-10334
> URL: https://issues.apache.org/jira/browse/YARN-10334
> Project: Hadoop YARN
> Issue Type: Bug
> Components: distributed-shell, test, yarn
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Labels: newbie, test
>
> {{TestDistributedShell}} times out on trunk. I found that the application
> and containers stay running in the background long after the unit test
> has failed.
> This causes failures of other test cases and several false-positive failures
> as a result of:
> * Ports stay busy, so other test cases fail to launch.
> * Unit tests fail because of memory restrictions.
> Although the unit test is already broken on trunk, we do not want its
> failures to spill over into other unit tests.
> {{TestDistributedShell}} needs to be revisited to make sure that all
> {{YarnClients}} and {{YarnApplications}} are closed properly at the end of
> each unit test (including exceptions and timeouts).
> Steps to reproduce:
> {code:bash}
> mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers
> ## this will time out as
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 90.234 s <<< FAILURE!
- in > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > [ERROR] > testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 90.018 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 9 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117) > at > org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > [INFO] > [INFO] Results: > [INFO] > [ERROR] Errors: > [ERROR] TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » > 
TestTimedOut > [INFO] > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0 > {code} > Using {{ps}} command, you can find the yarn processes are still in the > background > {code:bash} > /bin/bash -c $JRE_HOME/bin/java -Xmx512m > org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster > --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 > --num_containers 2 --priority 0 --appname DistributedShell --homedir > file:/Users/ahussein > 1>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710
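The cleanup idea in the steps above can be sketched as follows (the class, fields, and method here are hypothetical stand-ins, not the actual DistributedShell API): give the client a close() hook, pick a client-side timeout strictly smaller than the JUnit rule timeout so the client gives up before the test is killed, and run it under try-with-resources so resources are released even on failure or timeout.

```java
// Sketch only: assumed names, not the real DistributedShell.Client.
public class ClosableClientSketch implements AutoCloseable {
  static final long TEST_TIMEOUT_MS = 90_000;                     // JUnit @Rule timeout
  static final long CLIENT_TIMEOUT_MS = TEST_TIMEOUT_MS - 10_000; // must be smaller

  private boolean running = true;

  // Stand-in for Client.monitorApplication(): give up once the client
  // timeout elapses instead of outliving the test.
  boolean run(long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (running && System.currentTimeMillis() < deadline) {
      // poll application state here; for this sketch, pretend it finished
      running = false;
    }
    return !running;
  }

  @Override
  public void close() {
    // Real code would stop the YarnClient and kill leftover applications.
    running = false;
  }

  public static void main(String[] args) {
    // try-with-resources guarantees cleanup even when the test fails.
    try (ClosableClientSketch client = new ClosableClientSketch()) {
      System.out.println(client.run(CLIENT_TIMEOUT_MS) ? "finished" : "timed out");
    }
  }
}
```

The key invariant is only that the client timeout is smaller than the test timeout, so the client, not the JUnit rule, is the first to abort.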
[jira] [Commented] (YARN-10499) TestRouterWebServicesREST fails
[ https://issues.apache.org/jira/browse/YARN-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251292#comment-17251292 ] Ahmed Hussein commented on YARN-10499: -- [~aajisaka] .. You are the man :) It feels great to see the failing list down to: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/358/#showFailuresLink {code:bash} Test Result (6 failures / -202) org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks.testSetRepIncWithUnderReplicatedBlocks org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testReadLockCanBeDisabledByConfig org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.testAMSimulatorWithNodeLabels[1] org.apache.hadoop.tools.dynamometer.TestDynamometerInfra.org.apache.hadoop.tools.dynamometer.TestDynamometerInfra org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType {code} > TestRouterWebServicesREST fails > --- > > Key: YARN-10499 > URL: https://issues.apache.org/jira/browse/YARN-10499 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: > patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt > > Time Spent: 1h > Remaining Estimate: 0h > > [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2488/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn.txt] > {noformat} > [ERROR] Failures: > [ERROR] > TestRouterWebServicesREST.testAppAttemptXML:720->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] > TestRouterWebServicesREST.testAppPriorityXML:796->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] TestRouterWebServicesREST.testAppQueueXML:846->performGetCalls:274 > 
expected:<200> but was:<204> > [ERROR] TestRouterWebServicesREST.testAppStateXML:744->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] > TestRouterWebServicesREST.testAppTimeoutXML:920->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] > TestRouterWebServicesREST.testAppTimeoutsXML:896->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] TestRouterWebServicesREST.testAppXML:696->performGetCalls:274 > expected:<200> but was:<204> > [ERROR] TestRouterWebServicesREST.testUpdateAppPriorityXML:832 > expected:<200> but was:<500> > [ERROR] TestRouterWebServicesREST.testUpdateAppQueueXML:882 expected:<200> > but was:<500> > [ERROR] TestRouterWebServicesREST.testUpdateAppStateXML:782 expected:<202> > but was:<500> > [ERROR] Errors: > [ERROR] > TestRouterWebServicesREST.testGetAppAttemptXML:1292->getAppAttempt:1464 » > ClientHandler > [ERROR] > TestRouterWebServicesREST.testGetAppsMultiThread:1337->testGetContainersXML:1317->getAppAttempt:1464 > » ClientHandler > [ERROR] > TestRouterWebServicesREST.testGetContainersXML:1317->getAppAttempt:1464 » > ClientHandler {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10536) Client in distributedShell swallows interrupt exceptions
[ https://issues.apache.org/jira/browse/YARN-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251173#comment-17251173 ] Ahmed Hussein commented on YARN-10536: --
[~ayushsaxena], [~inigoiri], [~epayne] Can you please take a look at that small change? After it gets merged I will work on YARN-10536 to reduce the overhead of running those tests.
> Client in distributedShell swallows interrupt exceptions
>
> Key: YARN-10536
> URL: https://issues.apache.org/jira/browse/YARN-10536
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client, distributed-shell
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> In {{applications.distributedshell.Client}}, the method
> {{monitorApplication}} loops waiting for one of the following conditions:
> * The application fails: reaches {{YarnApplicationState.KILLED}} or
> {{YarnApplicationState.FAILED}}.
> * The application succeeds: {{FinalApplicationStatus.SUCCEEDED}} or
> {{YarnApplicationState.FINISHED}}.
> * The time spent waiting is longer than {{clientTimeout}} (if it exists in
> the parameters).
> When the Client thread is interrupted, it ignores the exception:
> {code:java}
> // Check app status every 1 second.
> try {
>   Thread.sleep(1000);
> } catch (InterruptedException e) {
>   LOG.debug("Thread sleep in monitoring loop interrupted");
> }
> {code}
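A common fix for the swallowed interrupt shown above (a sketch under assumed names, not the actual patch) is to restore the thread's interrupt status and exit the monitoring loop, so a caller such as a timing-out test can stop the client promptly:

```java
// Sketch only: MonitorLoopSketch.monitor() is a hypothetical stand-in for
// Client.monitorApplication(); appFinished simulates the state checks.
public class MonitorLoopSketch {
  static boolean monitor(java.util.function.BooleanSupplier appFinished) {
    while (!appFinished.getAsBoolean()) {
      try {
        Thread.sleep(1000); // check app status every second, as in the Client
      } catch (InterruptedException e) {
        // Restore the interrupt status so callers can observe it, and abort
        // monitoring instead of silently looping forever.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    Thread t = new Thread(() -> monitor(() -> false)); // app never finishes
    t.start();
    t.interrupt(); // the loop now exits instead of sleeping through it
    t.join(5000);
    System.out.println(t.isAlive() ? "still running" : "stopped");
  }
}
```

Swallowing the exception also clears the interrupt flag, which is why the original code makes the thread unresponsive: nothing downstream can ever see that an interrupt happened.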
[jira] [Comment Edited] (YARN-10169) Mixed absolute resource value and percentage-based resource value in CapacityScheduler should fail
[ https://issues.apache.org/jira/browse/YARN-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251110#comment-17251110 ] zhuqi edited comment on YARN-10169 at 12/17/20, 2:00 PM: -
[~leftnoteasy] [~pbacsko], [~snemeth], [~sunilg], [~bteke]
When I wrote a unit test to confirm it, I found that a percentage-based queue below an absolute-resource parent will always get the wrong resource: 100% of the parent. This is because of the logic:
{code:java}
// Set absolute capacities for {capacity, maximum-capacity}
private static void updateAbsoluteCapacitiesByNodeLabels(
    QueueCapacities queueCapacities, QueueCapacities parentQueueCapacities) {
  for (String label : queueCapacities.getExistingNodeLabels()) {
    float capacity = queueCapacities.getCapacity(label);
    if (capacity > 0f) {
      queueCapacities.setAbsoluteCapacity(label, capacity
          * (parentQueueCapacities == null ? 1
              : parentQueueCapacities.getAbsoluteCapacity(label)));
    }
    float maxCapacity = queueCapacities.getMaximumCapacity(label);
    if (maxCapacity > 0f) {
      queueCapacities.setAbsoluteMaximumCapacity(label, maxCapacity
          * (parentQueueCapacities == null ? 1
              : parentQueueCapacities.getAbsoluteMaximumCapacity(label)));
    }
  }
}
{code}
If the parent uses absolute resources and the queue's capacity also uses a single absolute resource, the parent's parentQueueCapacities will be null, so the value used will be maxCapacity. Here is the maxCapacity-related code:
{code:java}
public float getNonLabeledQueueMaximumCapacity(String queue) {
  String configuredCapacity = get(getQueuePrefix(queue) + MAXIMUM_CAPACITY);
  boolean matcher = (configuredCapacity != null)
      && RESOURCE_PATTERN.matcher(configuredCapacity).find();
  if (matcher) {
    // Return capacity in percentage as 0 for non-root queues and 100 for
    // root. From AbstractCSQueue, absolute resource will be parsed and
    // updated. Once nodes are added/removed in cluster, capacity in
    // percentage will also be re-calculated.
    return 100.0f;
  }
  float maxCapacity = (configuredCapacity == null)
      ? MAXIMUM_CAPACITY_VALUE : Float.parseFloat(configuredCapacity);
  maxCapacity = (maxCapacity == DEFAULT_MAXIMUM_CAPACITY_VALUE)
      ? MAXIMUM_CAPACITY_VALUE : maxCapacity;
  return maxCapacity;
}
{code}
In absolute-resource capacity mode, it will return 100.0f, so maxCapacity will be 100. We should change it to support mixed mode; as things stand, using mixed mode in maximum-capacity does not throw an exception, but the result is wrong. We should also fix this in auto-created queues.
[jira] [Commented] (YARN-10169) Mixed absolute resource value and percentage-based resource value in CapacityScheduler should fail
[ https://issues.apache.org/jira/browse/YARN-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251110#comment-17251110 ] zhuqi commented on YARN-10169: --
[~leftnoteasy]
When I wrote a unit test to confirm it, I found that a percentage-based queue below an absolute-resource parent will always get the wrong resource: 100% of the parent. This is because of the logic:
{code:java}
// Set absolute capacities for {capacity, maximum-capacity}
private static void updateAbsoluteCapacitiesByNodeLabels(
    QueueCapacities queueCapacities, QueueCapacities parentQueueCapacities) {
  for (String label : queueCapacities.getExistingNodeLabels()) {
    float capacity = queueCapacities.getCapacity(label);
    if (capacity > 0f) {
      queueCapacities.setAbsoluteCapacity(label, capacity
          * (parentQueueCapacities == null ? 1
              : parentQueueCapacities.getAbsoluteCapacity(label)));
    }
    float maxCapacity = queueCapacities.getMaximumCapacity(label);
    if (maxCapacity > 0f) {
      queueCapacities.setAbsoluteMaximumCapacity(label, maxCapacity
          * (parentQueueCapacities == null ? 1
              : parentQueueCapacities.getAbsoluteMaximumCapacity(label)));
    }
  }
}
{code}
If the parent uses absolute resources and the queue's capacity also uses a single absolute resource, the parent's parentQueueCapacities will be null, so the value used will be maxCapacity. Here is the maxCapacity-related code:
{code:java}
public float getNonLabeledQueueMaximumCapacity(String queue) {
  String configuredCapacity = get(getQueuePrefix(queue) + MAXIMUM_CAPACITY);
  boolean matcher = (configuredCapacity != null)
      && RESOURCE_PATTERN.matcher(configuredCapacity).find();
  if (matcher) {
    // Return capacity in percentage as 0 for non-root queues and 100 for
    // root. From AbstractCSQueue, absolute resource will be parsed and
    // updated. Once nodes are added/removed in cluster, capacity in
    // percentage will also be re-calculated.
    return 100.0f;
  }
  float maxCapacity = (configuredCapacity == null)
      ? MAXIMUM_CAPACITY_VALUE : Float.parseFloat(configuredCapacity);
  maxCapacity = (maxCapacity == DEFAULT_MAXIMUM_CAPACITY_VALUE)
      ? MAXIMUM_CAPACITY_VALUE : maxCapacity;
  return maxCapacity;
}
{code}
In absolute-resource capacity mode, it will return 100.0f, so maxCapacity will be 100. We should change it to support mixed mode; as things stand, using mixed mode in maximum-capacity does not throw an exception, but the result is wrong.
> Mixed absolute resource value and percentage-based resource value in
> CapacityScheduler should fail
> --
>
> Key: YARN-10169
> URL: https://issues.apache.org/jira/browse/YARN-10169
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wangda Tan
> Assignee: zhuqi
> Priority: Blocker
> Attachments: YARN-10169.001.patch, YARN-10169.002.patch,
> YARN-10169.003.patch
>
> To me this is a bug: if a queue has capacity set to a float and
> maximum-capacity set to an absolute value, the existing logic allows the
> behavior. For example:
> {code:java}
> queue.capacity = 0.8
> queue.maximum-capacity = [mem=x, vcore=y]
> {code}
> We should throw an exception when configured like this.
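The validation this comment argues for could look roughly like the following (a hypothetical helper, not actual CapacityScheduler code): treat a value matching the bracketed resource syntax as absolute, and reject a queue whose capacity and maximum-capacity use different formats.

```java
import java.util.regex.Pattern;

// Sketch only: a standalone version of the "fail on mixed mode" check.
public class MixedModeCheckSketch {
  // Mirrors the RESOURCE_PATTERN idea: absolute values look like "[mem=x,vcore=y]".
  private static final Pattern RESOURCE_PATTERN = Pattern.compile("^\\[[^\\]]+\\]$");

  static boolean isAbsolute(String value) {
    return value != null && RESOURCE_PATTERN.matcher(value.trim()).find();
  }

  // Throws if one value is a percentage and the other an absolute resource.
  static void validate(String capacity, String maxCapacity) {
    if (capacity != null && maxCapacity != null
        && isAbsolute(capacity) != isAbsolute(maxCapacity)) {
      throw new IllegalArgumentException(
          "capacity and maximum-capacity must both be percentages or both be "
          + "absolute resources, got: " + capacity + " / " + maxCapacity);
    }
  }

  public static void main(String[] args) {
    validate("0.8", "100");                               // both percentages: ok
    validate("[mem=1024,vcore=2]", "[mem=2048,vcore=4]"); // both absolute: ok
    try {
      validate("0.8", "[mem=2048,vcore=4]");              // mixed: should fail
      System.out.println("no exception");
    } catch (IllegalArgumentException expected) {
      System.out.println("rejected mixed mode");
    }
  }
}
```

Running such a check at configuration-load time would turn the silently wrong 100% fallback described above into a fail-fast error, which is what the issue title asks for.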
[jira] [Commented] (YARN-10528) maxAMShare should only be accepted for leaf queues, not parent queues
[ https://issues.apache.org/jira/browse/YARN-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251020#comment-17251020 ] Siddharth Ahuja commented on YARN-10528: Thank you [~snemeth]! Please take your time.
> maxAMShare should only be accepted for leaf queues, not parent queues
> -
>
> Key: YARN-10528
> URL: https://issues.apache.org/jira/browse/YARN-10528
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Siddharth Ahuja
> Assignee: Siddharth Ahuja
> Priority: Major
> Attachments: YARN-10528.001.patch, maxAMShare for root.users (parent
> queue) has no effect as child queue does not inherit it.png
>
> Based on [Hadoop
> documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html],
> it is clear that the {{maxAMShare}} property can only be used for *leaf queues*.
> This is similar to the {{reservation}} setting.
> However, the existing code only ensures that the reservation setting is not
> accepted for "parent" queues (see
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java#L226
> and
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java#L233)
> but it is missing the checks for {{maxAMShare}}. Due to this, it is
> currently possible to have an allocation similar to below:
> {code} > > > 1.0 > drf > * > * > > 1.0 > drf > > > 1.0 > drf > 1.0 > > > fair > > > > > > > > > {code}
> where {{maxAMShare}} is 1.0f, meaning it is possible to allocate 100% of the
> queue's resources for Application Masters. Notice above that root.users is a
> parent queue; however, it still gladly accepts {{maxAMShare}}. This is
> contrary to the documentation and in fact it is very misleading, because
> child queues like root.users. actually do not inherit this setting at
> all and still go on to use the default of 0.5 instead of 1.0; see the
> attached screenshot as an example.
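The missing check could mirror the existing leaf-only validation for the reservation setting; below is a minimal sketch (a hypothetical parser shape, not the actual AllocationFileQueueParser code) of rejecting maxAMShare on any queue that has child queues:

```java
import java.util.List;

// Sketch only: leaf-only validation for maxAMShare, with assumed signatures.
public class MaxAmShareCheckSketch {
  static void checkLeafOnly(String queueName, Float maxAMShare,
      List<String> childQueues) {
    // A queue with children is a parent queue; per the FairScheduler docs,
    // maxAMShare is only meaningful on leaf queues.
    if (maxAMShare != null && !childQueues.isEmpty()) {
      throw new IllegalArgumentException("maxAMShare is only valid on leaf "
          + "queues, but parent queue " + queueName + " sets it");
    }
  }

  public static void main(String[] args) {
    checkLeafOnly("root.users.alice", 1.0f, List.of());    // leaf: accepted
    try {
      checkLeafOnly("root.users", 1.0f, List.of("alice")); // parent: rejected
      System.out.println("no exception");
    } catch (IllegalArgumentException expected) {
      System.out.println("rejected parent queue maxAMShare");
    }
  }
}
```

Failing allocation-file parsing this way surfaces the misconfiguration immediately, instead of letting a parent-queue maxAMShare be silently ignored while children fall back to the 0.5 default.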
[jira] [Commented] (YARN-10528) maxAMShare should only be accepted for leaf queues, not parent queues
[ https://issues.apache.org/jira/browse/YARN-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250993#comment-17250993 ] Szilard Nemeth commented on YARN-10528: --- Thanks [~sahuja], very detailed description and testing steps. Will take a look at your patch soon.
> maxAMShare should only be accepted for leaf queues, not parent queues
> -
>
> Key: YARN-10528
> URL: https://issues.apache.org/jira/browse/YARN-10528
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Siddharth Ahuja
> Assignee: Siddharth Ahuja
> Priority: Major
> Attachments: YARN-10528.001.patch, maxAMShare for root.users (parent
> queue) has no effect as child queue does not inherit it.png