[jira] [Updated] (YARN-10623) Capacity scheduler should support refresh queue automatically by a thread policy.

2021-02-10 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10623:
--
Description: 
The fair scheduler supports automatically reloading queue-related configuration
through a background thread, but the capacity scheduler only supports refreshing
queue-related changes via refreshQueues. Our cluster needs automatic refresh for
queue management.

cc [~wangda] [~ztang] [~pbacsko] [~snemeth] [~gandras]  [~bteke] [~shuzirra]

  was:
The fair scheduler supports automatically reloading queue-related configuration
through a background thread, but the capacity scheduler only supports refreshing
queue-related changes via refreshQueues. Our cluster needs automatic refresh for
queue management.

cc [~wangda] [~pbacsko] [~snemeth] [~gandras]  [~bteke] [~shuzirra]


> Capacity scheduler should support refresh queue automatically by a thread 
> policy.
> -
>
> Key: YARN-10623
> URL: https://issues.apache.org/jira/browse/YARN-10623
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> The fair scheduler supports automatically reloading queue-related
> configuration through a background thread, but the capacity scheduler only
> supports refreshing queue-related changes via refreshQueues. Our cluster
> needs automatic refresh for queue management.
> cc [~wangda] [~ztang] [~pbacsko] [~snemeth] [~gandras]  [~bteke] [~shuzirra]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10623) Capacity scheduler should support refresh queue automatically by a thread policy.

2021-02-10 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10623:
--
Summary: Capacity scheduler should support refresh queue automatically by a 
thread policy.  (was: Capacity scheduler should support refresh queue 
automatically.)

> Capacity scheduler should support refresh queue automatically by a thread 
> policy.
> -
>
> Key: YARN-10623
> URL: https://issues.apache.org/jira/browse/YARN-10623
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>
> The fair scheduler supports automatically reloading queue-related
> configuration through a background thread, but the capacity scheduler only
> supports refreshing queue-related changes via refreshQueues. Our cluster
> needs automatic refresh for queue management.
> cc [~wangda] [~pbacsko] [~snemeth] [~gandras]  [~bteke] [~shuzirra]






[jira] [Created] (YARN-10623) Capacity scheduler should support refresh queue automatically.

2021-02-10 Thread Qi Zhu (Jira)
Qi Zhu created YARN-10623:
-

 Summary: Capacity scheduler should support refresh queue 
automatically.
 Key: YARN-10623
 URL: https://issues.apache.org/jira/browse/YARN-10623
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Reporter: Qi Zhu
Assignee: Qi Zhu


The fair scheduler supports automatically reloading queue-related configuration
through a background thread, but the capacity scheduler only supports refreshing
queue-related changes via refreshQueues. Our cluster needs automatic refresh for
queue management.
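
As a rough illustration of what such a thread could look like, the sketch below polls a configuration file and runs a reload action whenever the file's modification time advances. `QueueConfigReloader`, its methods, and the polling interval are hypothetical names invented for this example; they are not existing Hadoop classes, and the reload action only stands in for whatever would do the same work as refreshQueues.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop: polls a config file and runs a
// caller-supplied reload action when the file's modification time advances.
class QueueConfigReloader implements AutoCloseable {
  private final Path configFile;
  private final Runnable reloadAction; // assumed to re-read the queue configuration
  private final ScheduledExecutorService executor;
  private long lastSeenMillis;

  QueueConfigReloader(Path configFile, Runnable reloadAction) throws IOException {
    this.configFile = configFile;
    this.reloadAction = reloadAction;
    // Record the current timestamp so only later changes trigger a reload.
    this.lastSeenMillis = Files.getLastModifiedTime(configFile).toMillis();
    this.executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "queue-config-reloader");
      t.setDaemon(true);
      return t;
    });
  }

  // Starts periodic polling with a fixed delay between checks.
  void start(long intervalMillis) {
    executor.scheduleWithFixedDelay(this::checkOnceQuietly,
        intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
  }

  // Runs a single check; returns true if the reload action was invoked.
  synchronized boolean checkOnce() throws IOException {
    long modified = Files.getLastModifiedTime(configFile).toMillis();
    if (modified <= lastSeenMillis) {
      return false;
    }
    lastSeenMillis = modified;
    reloadAction.run();
    return true;
  }

  // An uncaught exception would cancel the scheduled task, so swallow and report.
  private void checkOnceQuietly() {
    try {
      checkOnce();
    } catch (IOException | RuntimeException e) {
      System.err.println("Queue config check failed: " + e);
    }
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

Polling the modification time keeps the sketch small and mirrors the "reload by a thread" idea described above; whether the real change would poll a file, use a different configuration store, or reuse existing scheduler code is not stated in this issue.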

cc [~wangda] [~pbacsko] [~snemeth] [~gandras]  [~bteke] [~shuzirra]






[jira] [Commented] (YARN-10588) Percentage of queue and cluster is zero in WebUI

2021-02-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282737#comment-17282737
 ] 

Eric Payne commented on YARN-10588:
---

I see. Thanks [~BilwaST] for the explanation. After looking at the code and 
talking it over with [~Jim_Brennan], it does look like a better solution would 
be to modify {{DominantResourceCalculator#isInvalidDivisor}} so that its 
behavior matches the logic of {{DominantResourceCalculator#divide}}.
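
To make the difference concrete, here is a simplified model of the two possible divisor checks. It is not the Hadoop source: `DivisorCheckSketch` and its methods are hypothetical, resources are reduced to plain `long[]` vectors, and the idea that the division skips resource types whose total is zero is an assumption made for illustration.

```java
// Simplified, hypothetical model; this is NOT DominantResourceCalculator.
final class DivisorCheckSketch {
  private DivisorCheckSketch() {}

  // Strict check: any zero component makes the divisor invalid. This is the
  // behaviour the issue describes for a cluster with zero GPUs.
  static boolean isInvalidDivisorStrict(long[] divisor) {
    for (long v : divisor) {
      if (v == 0) {
        return true;
      }
    }
    return false;
  }

  // Relaxed check (assumption): invalid only when every component is zero.
  static boolean isInvalidDivisorRelaxed(long[] divisor) {
    for (long v : divisor) {
      if (v != 0) {
        return false;
      }
    }
    return true;
  }

  // Largest used/total ratio, skipping resource types whose total is zero
  // (assumed behaviour, chosen so the division never sees a zero denominator).
  static float dominantShare(long[] used, long[] total) {
    float max = 0f;
    for (int i = 0; i < total.length; i++) {
      if (total[i] == 0) {
        continue;
      }
      max = Math.max(max, (float) used[i] / total[i]);
    }
    return max;
  }
}
```

With a cluster vector of memory, vcores, and GPUs where GPUs are zero, the strict check rejects the divisor while the relaxed check accepts it, so a usage percentage can still be computed from the non-zero resource types.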

> Percentage of queue and cluster is zero in WebUI 
> -
>
> Key: YARN-10588
> URL: https://issues.apache.org/jira/browse/YARN-10588
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10588.001.patch, YARN-10588.002.patch, 
> YARN-10588.003.patch
>
>
> Steps to reproduce:
> Configure the property below in resource-types.xml:
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> {code}
> Submit a job.
> In the UI, % Of Queue and % Of Cluster are zero for the submitted
> application.
>
> This is because SchedulerApplicationAttempt has the check below when
> calculating queueUsagePerc and clusterUsagePerc:
> {code:java}
> if (!calc.isInvalidDivisor(cluster)) {
>   float queueCapacityPerc = queue.getQueueInfo(false, false)
>       .getCapacity();
>   queueUsagePerc = calc.divide(cluster, usedResourceClone,
>       Resources.multiply(cluster, queueCapacityPerc)) * 100;
>   if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) {
>     queueUsagePerc = 0.0f;
>   }
>   clusterUsagePerc =
>       calc.divide(cluster, usedResourceClone, cluster) * 100;
> }
> {code}
> calc.isInvalidDivisor(cluster) always returns true because the cluster's GPU
> resource is 0.






[jira] [Commented] (YARN-10500) TestDelegationTokenRenewer fails intermittently

2021-02-10 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282670#comment-17282670
 ] 

Jim Brennan commented on YARN-10500:


Actually, I missed that there were some checkstyle issues. [~iwasakims],
can you please fix those? And while you are at it, there is an unneeded
{{throws Exception}} on {{testShutdown()}}. Can you remove that as well?


> TestDelegationTokenRenewer fails intermittently
> ---
>
> Key: YARN-10500
> URL: https://issues.apache.org/jira/browse/YARN-10500
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: flaky-test, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TestDelegationTokenRenewer sometimes times out.
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/334/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
> {noformat}
> [INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
> [ERROR] Tests run: 23, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 83.675 s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
> [ERROR] 
> testTokenThreadTimeout(org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer)
>   Time elapsed: 30.065 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 3 
> milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:394)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer.testTokenThreadTimeout(TestDelegationTokenRenewer.java:1769)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}






[jira] [Commented] (YARN-10500) TestDelegationTokenRenewer fails intermittently

2021-02-10 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282664#comment-17282664
 ] 

Jim Brennan commented on YARN-10500:


+1 Thanks for fixing this [~iwasakims]!   I will commit shortly.








[jira] [Commented] (YARN-10618) RM UI2 Application page shows the AM preempted containers instead of the nonAM ones

2021-02-10 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282642#comment-17282642
 ] 

Gergely Pollak commented on YARN-10618:
---

[~bteke] thank you for the patch. It is quite straightforward, LGTM +1
(non-binding).

> RM UI2 Application page shows the AM preempted containers instead of the 
> nonAM ones
> ---
>
> Key: YARN-10618
> URL: https://issues.apache.org/jira/browse/YARN-10618
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Minor
> Attachments: YARN-10618.001.patch
>
>
> YARN RM UIv2 application page shows the AM preempted containers under both 
> the _Num Non-AM container preempted_ and _Num AM container preempted_.






[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-02-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282594#comment-17282594
 ] 

Hadoop QA commented on YARN-9615:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
15s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
43s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
47s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
13s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
38s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 33s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
56s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
13s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m  
9s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; 
considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
22s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
43s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
36s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 
36s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
11s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
11s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 35s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/611/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt{color}
 | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 17 new + 
53 unchanged - 0 fixed = 70 total (was 53) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace 

[jira] [Assigned] (YARN-9927) RM multi-thread event processing mechanism

2021-02-10 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned YARN-9927:
---

Assignee: Bilwa S T

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Bilwa S T
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher
> queue. After analyzing RM event monitoring data and the RM event processing
> logic, we found that:
> 1) Environment: a cluster with thousands of nodes.
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event
> scheduler.
> 3) Meanwhile, RM event processing runs in single-thread mode, which leaves
> the RM event scheduler with little headroom and thus limits RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM
> performance.






[jira] [Commented] (YARN-10622) Fix preemption policy to exclude childless ParentQueues

2021-02-10 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282473#comment-17282473
 ] 

Qi Zhu commented on YARN-10622:
---

Thanks [~gandras] for the good finding.

 

> Fix preemption policy to exclude childless ParentQueues
> ---
>
> Key: YARN-10622
> URL: https://issues.apache.org/jira/browse/YARN-10622
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>
> ProportionalCapacityPreemptionPolicy selects the potential LeafQueues to be
> preempted by this logic:
> {code:java}
> private Set<String> getLeafQueueNames(TempQueuePerPartition q) {
>   // If it's a ManagedParentQueue, it might not have any children
>   if ((q.children == null || q.children.isEmpty())
>       && !(q.parentQueue instanceof ManagedParentQueue)) {
>     return ImmutableSet.of(q.queueName);
>   }
>   Set<String> leafQueueNames = new HashSet<>();
>   for (TempQueuePerPartition child : q.children) {
>     leafQueueNames.addAll(getLeafQueueNames(child));
>   }
>   return leafQueueNames;
> }
> {code}
> This, however, does not take childless ParentQueues (which were introduced in
> YARN-10596) into account.
> A childless ParentQueue will cause an NPE in
> FifoCandidatesSelector#selectCandidates:
> {code:java}
> LeafQueue leafQueue = preemptionContext.getQueueByPartition(queueName,
>     RMNodeLabelsManager.NO_LABEL).leafQueue;
> {code}
> TempQueuePerPartition has a leafQueue member variable, which is null if the
> queue is not a LeafQueue. For a childless ParentQueue it is null, but the
> queue's name is still present in leafQueueNames, as described above.
>  






[jira] [Commented] (YARN-10546) Limit application resource reservation on nodes for non-node/rack specific requests shoud be supported in CS.

2021-02-10 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282467#comment-17282467
 ] 

Qi Zhu commented on YARN-10546:
---

cc [~wangda] [~ztang] [~epayne]  [~Jim_Brennan] [~ebadger]

Could you take a look at this?

Thanks.

> Limit application resource reservation on nodes for non-node/rack specific 
> requests shoud be supported in CS.
> -
>
> Key: YARN-10546
> URL: https://issues.apache.org/jira/browse/YARN-10546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This was fixed for the FairScheduler in YARN-4270.
> The CapacityScheduler should be fixed as well.
> It is a big problem in a production cluster when it happens.
> The FS-to-CS conversion should also support it.






[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2021-02-10 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282458#comment-17282458
 ] 

Qi Zhu commented on YARN-9615:
--

[~jhung] [~bibinchundatt]
I would like to take this. I have attached a patch for review and will add
tests later.

Thanks.

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-9615.001.patch, YARN-9615.poc.patch, 
> screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.






[jira] [Assigned] (YARN-9615) Add dispatcher metrics to RM

2021-02-10 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu reassigned YARN-9615:


Assignee: Qi Zhu  (was: Jonathan Hung)

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-9615.001.patch, YARN-9615.poc.patch, 
> screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.






[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM

2021-02-10 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9615:
-
Attachment: (was: YARN-9618.001.patch)

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9615.001.patch, YARN-9615.poc.patch, 
> screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.






[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM

2021-02-10 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9615:
-
Attachment: YARN-9615.001.patch

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9615.001.patch, YARN-9615.poc.patch, 
> screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.






[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM

2021-02-10 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-9615:
-
Attachment: YARN-9618.001.patch

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9615.001.patch, YARN-9615.poc.patch, 
> screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.






[jira] [Updated] (YARN-10622) Fix preemption policy to exclude childless ParentQueues

2021-02-10 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10622:

Summary: Fix preemption policy to exclude childless ParentQueues  (was: Fix 
preemption policy to exclude childless ParentQueus)

> Fix preemption policy to exclude childless ParentQueues
> ---
>
> Key: YARN-10622
> URL: https://issues.apache.org/jira/browse/YARN-10622
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>
> ProportionalCapacityPreemptionPolicy selects the potential LeafQueues to be
> preempted by this logic:
> {code:java}
> private Set<String> getLeafQueueNames(TempQueuePerPartition q) {
>   // If it's a ManagedParentQueue, it might not have any children
>   if ((q.children == null || q.children.isEmpty())
>       && !(q.parentQueue instanceof ManagedParentQueue)) {
>     return ImmutableSet.of(q.queueName);
>   }
>   Set<String> leafQueueNames = new HashSet<>();
>   for (TempQueuePerPartition child : q.children) {
>     leafQueueNames.addAll(getLeafQueueNames(child));
>   }
>   return leafQueueNames;
> }
> {code}
> This, however, does not take childless ParentQueues (which were introduced in
> YARN-10596) into account.
> A childless ParentQueue will cause an NPE in
> FifoCandidatesSelector#selectCandidates:
> {code:java}
> LeafQueue leafQueue = preemptionContext.getQueueByPartition(queueName,
>     RMNodeLabelsManager.NO_LABEL).leafQueue;
> {code}
> TempQueuePerPartition has a leafQueue member variable, which is null if the
> queue is not a LeafQueue. For a childless ParentQueue it is null, but the
> queue's name is still present in leafQueueNames, as described above.
>  






[jira] [Created] (YARN-10622) Fix preemption policy to exclude childless ParentQueus

2021-02-10 Thread Andras Gyori (Jira)
Andras Gyori created YARN-10622:
---

 Summary: Fix preemption policy to exclude childless ParentQueus
 Key: YARN-10622
 URL: https://issues.apache.org/jira/browse/YARN-10622
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Andras Gyori
Assignee: Andras Gyori


ProportionalCapacityPreemptionPolicy selects the potential LeafQueues to be
preempted by this logic:
{code:java}
private Set<String> getLeafQueueNames(TempQueuePerPartition q) {
  // If it's a ManagedParentQueue, it might not have any children
  if ((q.children == null || q.children.isEmpty())
      && !(q.parentQueue instanceof ManagedParentQueue)) {
    return ImmutableSet.of(q.queueName);
  }

  Set<String> leafQueueNames = new HashSet<>();
  for (TempQueuePerPartition child : q.children) {
    leafQueueNames.addAll(getLeafQueueNames(child));
  }

  return leafQueueNames;
}
{code}
This, however, does not take childless ParentQueues (which were introduced in
YARN-10596) into account.

A childless ParentQueue will cause an NPE in
FifoCandidatesSelector#selectCandidates:

{code:java}
LeafQueue leafQueue = preemptionContext.getQueueByPartition(queueName,
    RMNodeLabelsManager.NO_LABEL).leafQueue;
{code}
TempQueuePerPartition has a leafQueue member variable, which is null if the
queue is not a LeafQueue. For a childless ParentQueue it is null, but the
queue's name is still present in leafQueueNames, as described above.
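
One possible direction, sketched under assumptions rather than taken from any patch, is to decide by queue type instead of by the absence of children. `QueueNode` and `LeafQueueNames` below are hypothetical stand-ins reduced to the fields mentioned in this description; the `isLeaf` flag plays the role of "leafQueue is not null".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for TempQueuePerPartition; the real class has many
// more members.
class QueueNode {
  final String queueName;
  final boolean isLeaf; // stands in for "leafQueue != null"
  final List<QueueNode> children = new ArrayList<>();

  QueueNode(String queueName, boolean isLeaf) {
    this.queueName = queueName;
    this.isLeaf = isLeaf;
  }

  QueueNode addChild(QueueNode child) {
    children.add(child);
    return this;
  }
}

final class LeafQueueNames {
  private LeafQueueNames() {}

  // Collects only real leaf queues; a parent queue without children
  // contributes nothing instead of being mistaken for a leaf.
  static Set<String> collect(QueueNode q) {
    Set<String> names = new HashSet<>();
    if (q.isLeaf) {
      names.add(q.queueName);
      return names;
    }
    for (QueueNode child : q.children) {
      names.addAll(collect(child));
    }
    return names;
  }
}
```

With this shape, a childless parent never reaches the lookup that dereferences leafQueue, so the NPE described above would not be triggered by it. Whether the actual fix filters at collection time or adds a null check at the call site is not specified here.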
 






[jira] [Commented] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion

2021-02-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282426#comment-17282426
 ] 

Hadoop QA commented on YARN-10620:
--

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 22m 12s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 22m 12s | | trunk passed |
| +1 | compile | 1m 0s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 50s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 44s | | trunk passed |
| +1 | mvnsite | 0m 54s | | trunk passed |
| +1 | shadedclient | 16m 42s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 41s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 36s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 1m 50s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 1m 47s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 50s | | the patch passed |
| +1 | compile | 0m 54s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 54s | | the patch passed |
| +1 | compile | 0m 45s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 45s | | the patch passed |
| +1 | checkstyle | 0m 38s | | the patch passed |
| +1 | mvnsite | 0m 49s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 52s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 34s | | the 

[jira] [Commented] (YARN-10593) Fix incorrect string comparison in GpuDiscoverer

2021-02-10 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282424#comment-17282424
 ] 

Szilard Nemeth commented on YARN-10593:
---

Thanks [~pbacsko] for working on this.
Patch LGTM, committed to trunk.
Thanks [~zhuqi] for the review.

> Fix incorrect string comparison in GpuDiscoverer
> 
>
> Key: YARN-10593
> URL: https://issues.apache.org/jira/browse/YARN-10593
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10593-001.patch
>
>
> The following comparison in {{GpuDiscoverer}} is invalid:
> {noformat}
> binaryPath = configuredBinaryFile;
> // If path exists but file name is incorrect don't execute the file
> String fileName = binaryPath.getName();
> if (DEFAULT_BINARY_NAME.equals(fileName)) {  <--- inverse condition needed
>   String msg = String.format("Please check the configuration value of"
>       + " %s. It should point to an %s binary.",
>       YarnConfiguration.NM_GPU_PATH_TO_EXEC,
>       DEFAULT_BINARY_NAME);
>   throwIfNecessary(new YarnException(msg), config);
>   LOG.warn(msg);
> }
> {noformat}
> Obviously it should be the other way around - we should log a warning or throw an 
> exception if the file names *differ*, not when they're equal.
> Consider adding a unit test for this.
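
The inverted condition described above can be illustrated with a small self-contained sketch. It is not the committed patch: the class and the helper hasUnexpectedName are hypothetical, and the value given to DEFAULT_BINARY_NAME is an assumption made for the example.

```java
import java.io.File;

class GpuBinaryNameCheckSketch {
  // Assumed value for illustration; the real constant is defined in GpuDiscoverer.
  static final String DEFAULT_BINARY_NAME = "nvidia-smi";

  /** Returns true when the configured file name differs from the expected binary name. */
  static boolean hasUnexpectedName(File configuredBinaryFile) {
    String fileName = configuredBinaryFile.getName();
    // Negated comparison: warn or throw only when the names do NOT match.
    return !DEFAULT_BINARY_NAME.equals(fileName);
  }

  public static void main(String[] args) {
    System.out.println(hasUnexpectedName(new File("/usr/bin/nvidia-smi"))); // prints false
    System.out.println(hasUnexpectedName(new File("/usr/bin/other-tool"))); // prints true
  }
}
```

A unit test along these lines, one matching and one non-matching file name, would cover both branches of the check.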






[jira] [Updated] (YARN-10593) Fix incorrect string comparison in GpuDiscoverer

2021-02-10 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10593:
--
Fix Version/s: 3.4.0

> Fix incorrect string comparison in GpuDiscoverer
> 
>
> Key: YARN-10593
> URL: https://issues.apache.org/jira/browse/YARN-10593
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10593-001.patch
>
>
> The following comparison in {{GpuDiscoverer}} is invalid:
> {noformat}
> binaryPath = configuredBinaryFile;
> // If path exists but file name is incorrect don't execute the file
> String fileName = binaryPath.getName();
> if (DEFAULT_BINARY_NAME.equals(fileName)) {  <--- inverse condition needed
>   String msg = String.format("Please check the configuration value of"
>       + " %s. It should point to an %s binary.",
>       YarnConfiguration.NM_GPU_PATH_TO_EXEC,
>       DEFAULT_BINARY_NAME);
>   throwIfNecessary(new YarnException(msg), config);
>   LOG.warn(msg);
> }
> {noformat}
> Obviously it should be the other way around - we should log a warning or throw an 
> exception if the file names *differ*, not when they're equal.
> Consider adding a unit test for this.






[jira] [Commented] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion

2021-02-10 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282357#comment-17282357
 ] 

Szilard Nemeth commented on YARN-10620:
---

Hi [~pbacsko],
Thanks for working on this. 
Latest patch LGTM, just committed to trunk.
Thanks [~gandras] for the review.

> fs2cs: parentQueue for certain placement rules are not set during conversion
> 
>
> Key: YARN-10620
> URL: https://issues.apache.org/jira/browse/YARN-10620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
> Fix For: 3.4.0
>
> Attachments: YARN-10620-001.patch, YARN-10620-002.patch
>
>
> There are some placement rules in FS which are currently not handled properly 
> by fs2cs:
> {noformat}
> 
> 
> 
> 
> {noformat}
> The first rule means that if the user queue doesn't exist, it should be 
> created as {{root.}}.
> The second means the same thing, except refers to the primary group instead 
> of the submitting user: {{root.}}.
> The problem is that in order for the create="true" setting to take effect, we 
> must set the parent queue in the generated JSON:
> Current:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "create" : true
>   } ]
> }
> {noformat}
> Expected:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "parentQueue" : "root",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "parentQueue" : "root",
>     "create" : true
>   } ]
> }
> {noformat}
> This is missing right now and it needs to be fixed.
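
The Expected output above differs from the Current output only in the added parentQueue entry. A minimal sketch of a rule builder that emits it is shown below; the class ParentQueueRuleSketch and the method buildRule are hypothetical and do not reflect the actual fs2cs converter code, only the key names are taken from the JSON in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ParentQueueRuleSketch {
  /** Builds one mapping rule as an ordered map, mirroring the JSON keys shown above. */
  static Map<String, Object> buildRule(String policy, boolean create) {
    Map<String, Object> rule = new LinkedHashMap<>();
    rule.put("type", "user");
    rule.put("matches", "*");
    rule.put("policy", policy);
    rule.put("fallbackResult", "skip");
    if (create) {
      // create=true only takes effect when a parent queue is set.
      rule.put("parentQueue", "root");
    }
    rule.put("create", create);
    return rule;
  }

  public static void main(String[] args) {
    System.out.println(buildRule("user", true));
    System.out.println(buildRule("primaryGroup", true));
  }
}
```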






[jira] [Updated] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion

2021-02-10 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10620:
--
Fix Version/s: 3.4.0

> fs2cs: parentQueue for certain placement rules are not set during conversion
> 
>
> Key: YARN-10620
> URL: https://issues.apache.org/jira/browse/YARN-10620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
> Fix For: 3.4.0
>
> Attachments: YARN-10620-001.patch, YARN-10620-002.patch
>
>
> There are some placement rules in FS which are currently not handled properly 
> by fs2cs:
> {noformat}
> 
> 
> 
> 
> {noformat}
> The first rule means that if the user queue doesn't exist, it should be 
> created as {{root.}}.
> The second means the same thing, except refers to the primary group instead 
> of the submitting user: {{root.}}.
> The problem is that in order for the create="true" setting to take effect, we 
> must set the parent queue in the generated JSON:
> Current:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "create" : true
>   } ]
> }
> {noformat}
> Expected:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "parentQueue" : "root",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "parentQueue" : "root",
>     "create" : true
>   } ]
> }
> {noformat}
> This is missing right now and it needs to be fixed.






[jira] [Updated] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion

2021-02-10 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10620:

Description: 
There are some placement rules in FS which are currently not handled properly 
by fs2cs:

{noformat}




{noformat}

The first rule means that if the user queue doesn't exist, it should be created 
as {{root.}}.
The second means the same thing, except refers to the primary group instead of 
the submitting user: {{root.}}.

The problem is that in order for the create="true" setting to take effect, we 
must set the parent queue in the generated JSON:

Current:
{noformat}
{
  "rules" : [ {
    "type" : "user",
    "matches" : "*",
    "policy" : "user",
    "fallbackResult" : "skip",
    "create" : true
  }, {
    "type" : "user",
    "matches" : "*",
    "policy" : "primaryGroup",
    "fallbackResult" : "skip",
    "create" : true
  } ]
}
{noformat}

Expected:
{noformat}
{
  "rules" : [ {
    "type" : "user",
    "matches" : "*",
    "policy" : "user",
    "fallbackResult" : "skip",
    "parentQueue" : "root",
    "create" : true
  }, {
    "type" : "user",
    "matches" : "*",
    "policy" : "primaryGroup",
    "fallbackResult" : "skip",
    "parentQueue" : "root",
    "create" : true
  } ]
}
{noformat}

This is missing right now and it needs to be fixed.

  was:
There are some placement rules in FS which are currently not handled properly 
by fs2cs:

{noformat}




{noformat}

The first rule means that if the user queue doesn't exist, it should be created 
as {{root.}}.
The second means the same thing, except refers to the primary group instead of 
the submitting user: {{root.}}.

The problem is that in order for the create="true" setting to take effect, we 
must set the parent queue in the generated JSON:

{noformat}
{
  "rules" : [ {
    "type" : "user",
    "matches" : "*",
    "policy" : "user",
    "fallbackResult" : "skip",
    "create" : true
  }, {
    "type" : "user",
    "matches" : "*",
    "policy" : "primaryGroup",
    "fallbackResult" : "skip",
    "create" : true
  } ]
}
{noformat}

This is missing right now and it needs to be fixed.


> fs2cs: parentQueue for certain placement rules are not set during conversion
> 
>
> Key: YARN-10620
> URL: https://issues.apache.org/jira/browse/YARN-10620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10620-001.patch, YARN-10620-002.patch
>
>
> There are some placement rules in FS which are currently not handled properly 
> by fs2cs:
> {noformat}
> 
> 
> 
> 
> {noformat}
> The first rule means that if the user queue doesn't exist, it should be 
> created as {{root.}}.
> The second means the same thing, except refers to the primary group instead 
> of the submitting user: {{root.}}.
> The problem is that in order for the create="true" setting to take effect, we 
> must set the parent queue in the generated JSON:
> Current:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "create" : true
>   } ]
> }
> {noformat}
> Expected:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "parentQueue" : "root",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "parentQueue" : "root",
>     "create" : true
>   } ]
> }
> {noformat}
> This is missing right now and it needs to be fixed.






[jira] [Commented] (YARN-10588) Percentage of queue and cluster is zero in WebUI

2021-02-10 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282328#comment-17282328
 ] 

Bilwa S T commented on YARN-10588:
--

Hi [~epayne]

I added the change in FiCaSchedulerApp.java because the same issue can occur 
there, i.e. cluster and queue usage will not be calculated if one of the 
resources is zero. I added the instanceof check because that method is 
applicable only to CapacityScheduler. Many test cases were failing once I 
removed the DominantResourceCalculator.isInvalidDivisor() check, as those test 
cases had FifoScheduler configured.

> Percentage of queue and cluster is zero in WebUI 
> -
>
> Key: YARN-10588
> URL: https://issues.apache.org/jira/browse/YARN-10588
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10588.001.patch, YARN-10588.002.patch, 
> YARN-10588.003.patch
>
>
> Steps to reproduce:
> Configure below property in resource-types.xml
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> {code}
> Submit a job
> In UI you can see % Of Queue and % Of Cluster is zero for the submitted 
> application
>  
> This is because SchedulerApplicationAttempt has the check below for 
> calculating queueUsagePerc and clusterUsagePerc:
> {code:java}
> if (!calc.isInvalidDivisor(cluster)) {
>   float queueCapacityPerc = queue.getQueueInfo(false, false)
>       .getCapacity();
>   queueUsagePerc = calc.divide(cluster, usedResourceClone,
>       Resources.multiply(cluster, queueCapacityPerc)) * 100;
>   if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) {
>     queueUsagePerc = 0.0f;
>   }
>   clusterUsagePerc =
>       calc.divide(cluster, usedResourceClone, cluster) * 100;
> }
> {code}
> calc.isInvalidDivisor(cluster) always returns true because the gpu resource is 0.
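
The effect of the guard can be reproduced with a simplified model. The sketch below is an assumption-laden illustration rather than the Hadoop implementation: resources are plain long arrays (memory, vcores, GPUs), isInvalidDivisor is modelled as "any dimension is zero", and the usage calculation is reduced to a dominant-share ratio.

```java
class InvalidDivisorSketch {
  /** Simplified model: the divisor is invalid as soon as ANY resource dimension is zero. */
  static boolean isInvalidDivisor(long[] resources) {
    for (long value : resources) {
      if (value == 0L) {
        return true;
      }
    }
    return false;
  }

  /** Dominant-share usage in percent, or 0 when the guard rejects the cluster resource. */
  static float usagePercent(long[] used, long[] cluster) {
    if (isInvalidDivisor(cluster)) {
      return 0.0f; // the guarded block is skipped, so the percentage keeps its default
    }
    float dominantShare = 0.0f;
    for (int i = 0; i < cluster.length; i++) {
      dominantShare = Math.max(dominantShare, (float) used[i] / cluster[i]);
    }
    return dominantShare * 100;
  }

  public static void main(String[] args) {
    long[] used = {4096, 2, 0};              // memory MB, vcores, GPUs
    long[] clusterWithoutGpu = {8192, 8, 0}; // GPU type configured, but no GPUs present
    long[] clusterWithGpu = {8192, 8, 4};
    System.out.println(usagePercent(used, clusterWithoutGpu)); // prints 0.0
    System.out.println(usagePercent(used, clusterWithGpu));    // prints 50.0
  }
}
```

In this model, configuring a resource type that the cluster has none of is enough to zero out the reported percentage, which matches the symptom in the steps to reproduce.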






[jira] [Comment Edited] (YARN-10588) Percentage of queue and cluster is zero in WebUI

2021-02-10 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282328#comment-17282328
 ] 

Bilwa S T edited comment on YARN-10588 at 2/10/21, 9:19 AM:


Thanks  [~epayne] [~Jim_Brennan] for taking a look at this issue.
 
I added the change in FiCaSchedulerApp.java because the same issue can occur 
there, i.e. cluster and queue usage will not be calculated if one of the 
resources is zero. I added the instanceof check because that method is 
applicable only to CapacityScheduler. Many test cases were failing once I 
removed the DominantResourceCalculator.isInvalidDivisor() check, as those test 
cases had FifoScheduler configured.


was (Author: bilwast):
Hi [~epayne]

I added the change in FiCaSchedulerApp.java because the same issue can occur 
there, i.e. cluster and queue usage will not be calculated if one of the 
resources is zero. I added the instanceof check because that method is 
applicable only to CapacityScheduler. Many test cases were failing once I 
removed the DominantResourceCalculator.isInvalidDivisor() check, as those test 
cases had FifoScheduler configured.

> Percentage of queue and cluster is zero in WebUI 
> -
>
> Key: YARN-10588
> URL: https://issues.apache.org/jira/browse/YARN-10588
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10588.001.patch, YARN-10588.002.patch, 
> YARN-10588.003.patch
>
>
> Steps to reproduce:
> Configure below property in resource-types.xml
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu</value>
> </property>
> {code}
> Submit a job
> In UI you can see % Of Queue and % Of Cluster is zero for the submitted 
> application
>  
> This is because SchedulerApplicationAttempt has the check below for 
> calculating queueUsagePerc and clusterUsagePerc:
> {code:java}
> if (!calc.isInvalidDivisor(cluster)) {
>   float queueCapacityPerc = queue.getQueueInfo(false, false)
>       .getCapacity();
>   queueUsagePerc = calc.divide(cluster, usedResourceClone,
>       Resources.multiply(cluster, queueCapacityPerc)) * 100;
>   if (Float.isNaN(queueUsagePerc) || Float.isInfinite(queueUsagePerc)) {
>     queueUsagePerc = 0.0f;
>   }
>   clusterUsagePerc =
>       calc.divide(cluster, usedResourceClone, cluster) * 100;
> }
> {code}
> calc.isInvalidDivisor(cluster) always returns true because the gpu resource is 0.






[jira] [Commented] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion

2021-02-10 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282323#comment-17282323
 ] 

Andras Gyori commented on YARN-10620:
-

[~pbacsko] I agree with your solution; a set is more appropriate and 
efficient here. +1.

> fs2cs: parentQueue for certain placement rules are not set during conversion
> 
>
> Key: YARN-10620
> URL: https://issues.apache.org/jira/browse/YARN-10620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10620-001.patch, YARN-10620-002.patch
>
>
> There are some placement rules in FS which are currently not handled properly 
> by fs2cs:
> {noformat}
> 
> 
> 
> 
> {noformat}
> The first rule means that if the user queue doesn't exist, it should be 
> created as {{root.}}.
> The second means the same thing, except refers to the primary group instead 
> of the submitting user: {{root.}}.
> The problem is that in order for the create="true" setting to take effect, we 
> must set the parent queue in the generated JSON:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "create" : true
>   } ]
> }
> {noformat}
> This is missing right now and it needs to be fixed.






[jira] [Commented] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion

2021-02-10 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282322#comment-17282322
 ] 

Peter Bacsko commented on YARN-10620:
-

[~gandras] thanks, I modified the code a little bit and used a set instead of 
an array.
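
A set-based membership check of this kind might look like the sketch below. It is a guess at the general shape, not the content of the attached patch: the class name, the constant NEED_ROOT_PARENT, the method needsRootParent, and the two policy names (taken from the JSON in the issue description) are assumptions, while usePercentages comes from the review comment on this issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class PolicySetCheckSketch {
  // Assumed policy names for illustration only.
  static final Set<String> NEED_ROOT_PARENT =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("user", "primaryGroup")));

  /** True when a rule with this policy should get an explicit parent queue. */
  static boolean needsRootParent(boolean usePercentages, String policy) {
    return !usePercentages && NEED_ROOT_PARENT.contains(policy);
  }

  public static void main(String[] args) {
    System.out.println(needsRootParent(false, "user"));         // prints true
    System.out.println(needsRootParent(false, "specified"));    // prints false
    System.out.println(needsRootParent(true, "primaryGroup"));  // prints false
  }
}
```

Compared with an array scan, a hash set gives constant-time lookups on average and states the "is one of" intent directly.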

> fs2cs: parentQueue for certain placement rules are not set during conversion
> 
>
> Key: YARN-10620
> URL: https://issues.apache.org/jira/browse/YARN-10620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10620-001.patch, YARN-10620-002.patch
>
>
> There are some placement rules in FS which are currently not handled properly 
> by fs2cs:
> {noformat}
> 
> 
> 
> 
> {noformat}
> The first rule means that if the user queue doesn't exist, it should be 
> created as {{root.}}.
> The second means the same thing, except refers to the primary group instead 
> of the submitting user: {{root.}}.
> The problem is that in order for the create="true" setting to take effect, we 
> must set the parent queue in the generated JSON:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "create" : true
>   } ]
> }
> {noformat}
> This is missing right now and it needs to be fixed.






[jira] [Updated] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion

2021-02-10 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10620:

Attachment: YARN-10620-002.patch

> fs2cs: parentQueue for certain placement rules are not set during conversion
> 
>
> Key: YARN-10620
> URL: https://issues.apache.org/jira/browse/YARN-10620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10620-001.patch, YARN-10620-002.patch
>
>
> There are some placement rules in FS which are currently not handled properly 
> by fs2cs:
> {noformat}
> 
> 
> 
> 
> {noformat}
> The first rule means that if the user queue doesn't exist, it should be 
> created as {{root.}}.
> The second means the same thing, except refers to the primary group instead 
> of the submitting user: {{root.}}.
> The problem is that in order for the create="true" setting to take effect, we 
> must set the parent queue in the generated JSON:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "create" : true
>   } ]
> }
> {noformat}
> This is missing right now and it needs to be fixed.






[jira] [Commented] (YARN-10620) fs2cs: parentQueue for certain placement rules are not set during conversion

2021-02-10 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282314#comment-17282314
 ] 

Andras Gyori commented on YARN-10620:
-

Thank you [~pbacsko] for the patch. It looks good to me; I have one minor 
suggestion:
 * Checking the policy would be more readable if the policies were stored in a 
constant array and the code checked whether the policy is contained in that 
array. This would reduce the expression to 
!usePercentages && ArrayUtils.contains(policies, policy).

> fs2cs: parentQueue for certain placement rules are not set during conversion
> 
>
> Key: YARN-10620
> URL: https://issues.apache.org/jira/browse/YARN-10620
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10620-001.patch
>
>
> There are some placement rules in FS which are currently not handled properly 
> by fs2cs:
> {noformat}
> 
> 
> 
> 
> {noformat}
> The first rule means that if the user queue doesn't exist, it should be 
> created as {{root.}}.
> The second means the same thing, except refers to the primary group instead 
> of the submitting user: {{root.}}.
> The problem is that in order for the create="true" setting to take effect, we 
> must set the parent queue in the generated JSON:
> {noformat}
> {
>   "rules" : [ {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "user",
>     "fallbackResult" : "skip",
>     "create" : true
>   }, {
>     "type" : "user",
>     "matches" : "*",
>     "policy" : "primaryGroup",
>     "fallbackResult" : "skip",
>     "create" : true
>   } ]
> }
> {noformat}
> This is missing right now and it needs to be fixed.


