[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity

2021-03-18 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304629#comment-17304629
 ] 

Bilwa S T commented on YARN-10697:
--

Thanks [~Jim_Brennan] [~jhung] for your comments.

I basically added the changes in Resource#toString so that it's easier for users 
to read. I agree it's not correct to add them there, as it's called from many 
other places. So can we introduce a new method in Resource.java that prints it 
in MB|GB|TB?
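
A minimal sketch of what such a separate formatting method could look like, 
leaving Resource#toString untouched. The class name MemoryFormatter and the 
method format are hypothetical and not part of Hadoop; the input is assumed to 
be a memory size already expressed in MB.

```java
import java.util.Locale;

/** Hypothetical helper (not a Hadoop API): formats a memory size given in MB. */
final class MemoryFormatter {
  private static final String[] UNITS = {"MB", "GB", "TB"};

  private MemoryFormatter() {}

  /** Picks the largest of MB, GB, TB that keeps the value at 1 or above. */
  static String format(long memoryMb) {
    if (memoryMb < 0) {
      throw new IllegalArgumentException("negative memory: " + memoryMb);
    }
    double value = memoryMb;
    int unit = 0;
    // Step up one unit per factor of 1024, stopping at TB
    while (value >= 1024 && unit < UNITS.length - 1) {
      value /= 1024;
      unit++;
    }
    // Whole values are printed without a fraction, others with one decimal
    if (value == Math.floor(value)) {
      return String.format(Locale.ROOT, "%d %s", (long) value, UNITS[unit]);
    }
    return String.format(Locale.ROOT, "%.1f %s", value, UNITS[unit]);
  }
}
```

Because the helper is separate from toString, existing callers and any log 
parsing that depends on the current toString output would not be affected.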

> Resources are displayed in bytes in UI for schedulers other than capacity
> -
>
> Key: YARN-10697
> URL: https://issues.apache.org/jira/browse/YARN-10697
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Major
> Attachments: YARN-10697.001.patch, image-2021-03-17-11-30-57-216.png
>
>
> Resources.newInstance expects memory in MB, whereas MetricsOverviewTable 
> passes resources in bytes. Also, we should display memory in GB for better 
> readability for users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU.

2021-03-18 Thread Qi Zhu (Jira)
Qi Zhu created YARN-10704:
-

 Summary: The CS effective capacity for absolute mode in UI should 
support GPU.
 Key: YARN-10704
 URL: https://issues.apache.org/jira/browse/YARN-10704
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacity scheduler
Reporter: Qi Zhu
Assignee: Qi Zhu
 Attachments: image-2021-03-19-12-05-28-412.png, 
image-2021-03-19-12-08-35-273.png

Currently there is no information about the effective GPU capacity in the UI 
for absolute resource mode.

!image-2021-03-19-12-05-28-412.png|width=873,height=136!

But we have this information in QueueMetrics:

!image-2021-03-19-12-08-35-273.png|width=613,height=268!

 

This is very important for our GPU users who use absolute mode: there is still 
no way to see absolute GPU information in the CS Queue UI.






[jira] [Commented] (YARN-10616) Nodemanagers cannot detect GPU failures

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304596#comment-17304596
 ] 

Qi Zhu commented on YARN-10616:
---

Thanks [~ebadger] for clarifying.

It makes sense to me now. If we implement that, then when we use 
-updateNodeResource we can check whether a node's original resource has been 
changed by the NM-RM heartbeat, using either a cached value or a flag. If it 
has changed, we should return that node's key information to the client.

We can also add the unhealthy nodes whose GPU resources were reduced to the UI 
and metrics, so that we are informed, without affecting scheduling.

Thanks.

> Nodemanagers cannot detect GPU failures
> ---
>
> Key: YARN-10616
> URL: https://issues.apache.org/jira/browse/YARN-10616
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
>
> As stated above, the bug is that GPUs can fail, but the NM doesn't notice the 
> failure. The NM will continue to schedule tasks onto the failed GPU, but the 
> GPU won't actually work and so the container will likely fail or run very 
> slowly on the CPU. 
> My initial thought on solving this is to add NM resource capabilities to the 
> NM-RM heartbeat and have the RM update its view of the NM's resource 
> capabilities on each heartbeat. This would be a fairly trivial change, but 
> comes with the unfortunate side effect that it completely undermines {{yarn 
> rmadmin -updateNodeResource}}. When you run {{-updateNodeResource}} the 
> assumption is that the node will retain these new resource capabilities until 
> either the NM or RM is restarted. But with a heartbeat interaction constantly 
> updating those resource capabilities from the NM perspective, the explicit 
> changes via {{-updateNodeResource}} would be lost on the next heartbeat. We 
> could potentially add a flag to ignore the heartbeat updates for any node who 
> has had {{-updateNodeResource}} called on it (until a re-registration). But 
> in this case, the node would no longer get resource capability updates until 
> the NM or RM restarted. If {{-updateNodeResource}} is used a decent amount, 
> then that would give potentially unexpected behavior in relation to nodes 
> properly auto-detecting failures.
> Another idea is to add a GPU monitor thread on the NM to periodically run 
> {{nvidia-smi}} and detect changes in the number of healthy GPUs. If that 
> number decreased, the node would hook into the health check status and mark 
> itself as unhealthy. The downside of this approach is that a single failed 
> GPU would mean taking out an entire node (e.g. 8 GPUs).
> I would really like to go with the NM-RM heartbeat approach, but the 
> {{-updateNodeResource}} issue bothers me. The second approach is ok I guess, 
> but I also don't like taking down whole GPU nodes when only a single GPU is 
> bad. Would like to hear thoughts of others on how best to approach this
> [~jhung], [~leftnoteasy], [~sunilg], [~epayne], [~Jim_Brennan]
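
The second idea in the description above (a monitor that compares the number of 
working GPUs with the expected number) can be sketched as follows. This is only 
an illustration under stated assumptions: the class GpuHealthCheck and its 
methods are hypothetical, and the input is assumed to be captured 
{{nvidia-smi}} output that contains exactly one non-blank line per working GPU. 
How a failed GPU actually shows up in that output is not covered by the issue 
text.

```java
import java.util.List;

/** Hypothetical sketch (not Hadoop code) of the GPU-count health check idea. */
final class GpuHealthCheck {
  private GpuHealthCheck() {}

  /** Counts non-blank lines, assuming one output line per working GPU. */
  static int countReportedGpus(List<String> smiOutputLines) {
    int count = 0;
    for (String line : smiOutputLines) {
      if (!line.trim().isEmpty()) {
        count++;
      }
    }
    return count;
  }

  /** The node counts as healthy only if no expected GPU is missing. */
  static boolean isNodeHealthy(int expectedGpus, List<String> smiOutputLines) {
    return countReportedGpus(smiOutputLines) >= expectedGpus;
  }
}
```

This also makes the stated downside visible: with 8 expected GPUs, a report of 
7 already returns false for the whole node.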






[jira] [Updated] (YARN-10701) The yarn.resource-types should support multi types without trimmed.

2021-03-18 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-10701:
---
Fix Version/s: 3.3.1
   3.4.0

+1. Thanks for the patch, [~zhuqi]. I've committed this to trunk (3.4) and 
branch-3.3

> The yarn.resource-types should support multi types without trimmed.
> ---
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10701.001.patch, YARN-10701.002.patch
>
>
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu, yarn.io/fpga</value>
> </property>
> {code}
>  When I configured the resource types above with gpu and fpga, the following 
> error occurred:
>  
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is 
> not a valid resource name. A valid resource name must begin with a letter and 
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
> name may also be optionally preceded by a name space followed by a slash. A 
> valid name space consists of period-separated groups of letters, numbers, and 
> dashes.{code}
>   
>  The resource type names should be trimmed.
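
The failure above comes from the leading space in ' yarn.io/fpga' after the 
value is split on commas. A minimal plain-Java sketch of trimming each entry 
before validation follows. ResourceTypeParser and parse are hypothetical names 
used only for illustration; this is not the committed patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not the committed patch): split and trim type names. */
final class ResourceTypeParser {
  private ResourceTypeParser() {}

  /** Splits a comma-separated value and trims whitespace around each name. */
  static List<String> parse(String raw) {
    List<String> names = new ArrayList<>();
    if (raw == null) {
      return names;
    }
    for (String part : raw.split(",")) {
      String name = part.trim();
      // Skip empty entries such as those produced by a trailing comma
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names;
  }
}
```

With this, "yarn.io/gpu, yarn.io/fpga" yields the two names without the 
leading space that triggered the YarnRuntimeException.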






[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity

2021-03-18 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304467#comment-17304467
 ] 

Jonathan Hung commented on YARN-10697:
--

[~Jim_Brennan] [~BilwaST] I agree, I don't think we should make the 
Resource#toString change. IMO users expect this to be bytes and making this 
change could have some unintended consequences e.g. breaking log parsing 
tooling.







[jira] [Comment Edited] (YARN-10616) Nodemanagers cannot detect GPU failures

2021-03-18 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304456#comment-17304456
 ] 

Eric Badger edited comment on YARN-10616 at 3/18/21, 9:22 PM:
--

The issue with graceful decommissioning is that you have to edit a file on the 
RM. It would be nice to be able to run a {{yarn rmadmin}} command from a remote 
host to tell the RM to graceful decom a node. AFAIK that functionality doesn't 
exist. 

I still don't like the idea of completely undermining {{-updateNodeResource}}. 
I think I would be more on board with a feature that is disabled by default, 
but can be enabled. That way we won't break any existing ways of doing things, 
but will give more flexibility to those who want to detect these types of 
failures. They will just have to understand that it isn't compatible with 
{{-updateNodeResource}}


was (Author: ebadger):
The issue with graceful decommissioning is that you have to edit a file on the 
RM. It would be nice to be able to run a `yarn rmadmin` command from a remote 
host to tell the RM to graceful decom a node. AFAIK that functionality doesn't 
exist. 

I still don't like the idea of completely undermining {{-updateNodeResource}}. 
I think I would be more on board with a feature that is disabled by default, 
but can be enabled. That way we won't break any existing ways of doing things, 
but will give more flexibility to those who want to detect these types of 
failures. They will just have to understand that it isn't compatible with 
{{-updateNodeResource}}
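
The trade-off described in this comment can be sketched as a switch that is 
off by default. This is a hedged illustration only: NodeCapabilityTracker, its 
methods, and the boolean switch are hypothetical and do not correspond to any 
existing YARN class or configuration key.

```java
/** Hypothetical model (not YARN code) of opt-in heartbeat capability updates. */
final class NodeCapabilityTracker {
  // Assumed opt-in switch; false models today's behaviour
  private final boolean heartbeatUpdatesEnabled;
  private int gpus;

  NodeCapabilityTracker(boolean heartbeatUpdatesEnabled, int registeredGpus) {
    this.heartbeatUpdatesEnabled = heartbeatUpdatesEnabled;
    this.gpus = registeredGpus;
  }

  /** Models an explicit admin change such as -updateNodeResource. */
  void adminUpdate(int newGpus) {
    gpus = newGpus;
  }

  /** Models an NM heartbeat that reports the node's current GPU count. */
  void onHeartbeat(int reportedGpus) {
    if (heartbeatUpdatesEnabled) {
      // Overwrites whatever the admin set, which is the incompatibility
      gpus = reportedGpus;
    }
  }

  int gpus() {
    return gpus;
  }
}
```

With the switch off, an admin value of 4 survives a heartbeat that reports 7. 
With it on, the next heartbeat replaces the admin value, so whoever enables the 
feature gives up persistent -updateNodeResource changes.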







[jira] [Commented] (YARN-10616) Nodemanagers cannot detect GPU failures

2021-03-18 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304456#comment-17304456
 ] 

Eric Badger commented on YARN-10616:


The issue with graceful decommissioning is that you have to edit a file on the 
RM. It would be nice to be able to run a `yarn rmadmin` command from a remote 
host to tell the RM to graceful decom a node. AFAIK that functionality doesn't 
exist. 

I still don't like the idea of completely undermining {{-updateNodeResource}}. 
I think I would be more on board with a feature that is disabled by default, 
but can be enabled. That way we won't break any existing ways of doing things, 
but will give more flexibility to those who want to detect these types of 
failures. They will just have to understand that it isn't compatible with 
{{-updateNodeResource}}







[jira] [Commented] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups

2021-03-18 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304445#comment-17304445
 ] 

Ahmed Hussein commented on YARN-10597:
--

Thanks [~shuzirra] for the patch.
It is fine to ignore the errors in the init tests. It should be enough to 
verify against the tests affected by YARN-10425.
I am +1 (non-binding).

> CSMappingPlacementRule should not create new instance of Groups
> ---
>
> Key: YARN-10597
> URL: https://issues.apache.org/jira/browse/YARN-10597
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Gergely Pollak
> Assignee: Gergely Pollak
> Priority: Major
> Attachments: YARN-10597.001.patch
>
>
> As [~ahussein] pointed out in YARN-10425, no new Groups instance should be 
> created.






[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-03-18 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304408#comment-17304408
 ] 

Hadoop QA commented on YARN-10702:
--

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 25s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 0m 24s | | Maven dependency ordering for branch |
| +1 | mvninstall | 22m 52s | | trunk passed |
| +1 | compile | 10m 16s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 8m 21s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 39s | | trunk passed |
| +1 | mvnsite | 1m 53s | | trunk passed |
| +1 | shadedclient | 19m 2s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 33s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 1m 48s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 26m 15s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 3m 54s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 22s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 28s | | the patch passed |
| +1 | compile | 9m 9s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 9m 9s | | the patch passed |
| +1 | compile | 8m 9s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 8m 9s | | the patch passed |
| -0 | checkstyle | 1m 36s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/826/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 84 unchanged - 0 fixed = 87 total (was 84) |
| +1 | mvnsite | 1m 47s | | the patch passed |
| +1 | whitespace | 0m 1s | | The patch has no whitespace issues. |
| 

[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-03-18 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304387#comment-17304387
 ] 

Hadoop QA commented on YARN-10702:
--

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 54s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 0m 30s | | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 23s | | trunk passed |
| +1 | compile | 9m 15s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 8m 32s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 34s | | trunk passed |
| +1 | mvnsite | 2m 2s | | trunk passed |
| +1 | shadedclient | 17m 10s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 40s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 1m 56s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 24m 51s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 4m 6s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 26s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 31s | | the patch passed |
| +1 | compile | 8m 58s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 8m 58s | | the patch passed |
| +1 | compile | 8m 4s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 8m 4s | | the patch passed |
| -0 | checkstyle | 1m 32s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/825/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | hadoop-yarn-project/hadoop-yarn: The patch generated 6 new + 84 unchanged - 0 fixed = 90 total (was 84) |
| +1 | mvnsite | 1m 53s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| 

[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304337#comment-17304337
 ] 

Hadoop QA commented on YARN-10674:
--

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 14m 34s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 20m 53s | | trunk passed |
| +1 | compile | 0m 59s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 52s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 45s | | trunk passed |
| +1 | mvnsite | 0m 56s | | trunk passed |
| +1 | shadedclient | 15m 0s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 41s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 42s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 18m 21s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 58s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 51s | | the patch passed |
| +1 | compile | 0m 53s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 53s | | the patch passed |
| +1 | compile | 0m 45s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 45s | | the patch passed |
| +1 | checkstyle | 0m 38s | | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 13 unchanged - 7 fixed = 13 total (was 20) |
| +1 | mvnsite | 0m 52s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 53s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 39s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| 

[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable

2021-03-18 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304333#comment-17304333
 ] 

Eric Badger commented on YARN-10495:


I would suggest using a Dockerfile with the same OS version as the one you plan 
to run on.

> make the rpath of container-executor configurable
> -
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10495.001.patch, YARN-10495.002.patch
>
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on 
> crypto to container-executor. We hit a case where our jenkins machine has 
> libcrypto.so.1.0.0 in its shared lib env, but our nodemanager machine does not 
> have libcrypto.so.1.0.0 and instead has *libcrypto.so.1.1.*
> We use an internal custom dynamic link library environment, 
> /usr/lib/x86_64-linux-gnu, 
> and we build hadoop with the parameters below:
> {code:java}
>  -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>  
> On the jenkins machine, the shared library path /usr/lib/x86_64-linux-gnu 
> (where libcrypto lives) contains:
> {code:java}
> -rw-r--r-- 1 root root   240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root    54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root  4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root       18 Sep 26  2019 libcrypto.so -> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root  2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root       35 Jun 18  2017 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root      298 Jun 18  2017 libc.so
> {code}
>  
> On the nodemanager machine, the shared library path /usr/lib/x86_64-linux-gnu 
> (where libcrypto lives) contains:
> {code:java}
> -rw-r--r--  1 root root    55852 Feb  7  2019 libcrypt.a
> -rw-r--r--  1 root root  4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx  1 root root       16 Sep 28  2019 libcrypto.so -> libcrypto.so.1.1
> -rw-r--r--  1 root root  2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r--  1 root root  2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx  1 root root       35 Feb  7  2019 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r--  1 root root      298 Feb  7  2019 libc.so
> {code}
>  We build container-executor with 
> The libcrypto.so versions are not the same, which causes an error when we start the nodemanager:
>  
> {code:java}
> .. 3 more
> Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>   at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>   ... 4 more
> Caused by: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
>   at org.apache.hadoop.util.Shell.run(Shell.java:901)
>   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>   ... 6 more
> {code}
>  
> We should make RPATH of container-executor configurable to solve this problem 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.

2021-03-18 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-10703:
---
Fix Version/s: 3.3.1

I've also committed this to branch-3.3. This has now been committed to trunk 
(3.4) and branch-3.3.

> Fix potential null pointer error of gpuNodeResourceUpdateHandler in 
> NodeResourceMonitorImpl.
> 
>
> Key: YARN-10703
> URL: https://issues.apache.org/jira/browse/YARN-10703
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10703.001.patch
>
>







[jira] [Updated] (YARN-10692) Add Node GPU Utilization and apply to NodeMetrics.

2021-03-18 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-10692:
---
Fix Version/s: 3.3.1

I cherry-picked this to branch-3.3. I would like all of the GPU stuff to go back 
to 3.3 if the cherry-picks are clean. 

This has now been committed to trunk (3.4) and branch-3.3

> Add Node GPU Utilization and apply to NodeMetrics.
> --
>
> Key: YARN-10692
> URL: https://issues.apache.org/jira/browse/YARN-10692
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10692.001.patch, YARN-10692.002.patch, 
> YARN-10692.003.patch
>
>
> Currently there is no node-level GPU utilization. This issue will add it, 
> starting with NodeMetrics.
> cc [~pbacsko]  [~Jim_Brennan]  [~ebadger]  [~gandras]  






[jira] [Updated] (YARN-10641) Refactor the max app related update, and fix maxApplications update error when add new queues.

2021-03-18 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10641:
--
Summary: Refactor the max app related update, and fix maxApplications 
update error when add new queues.  (was: Refactor the max app related update, 
and fix maxApllications update error when add new queues.)

> Refactor the max app related update, and fix maxApplications update error 
> when add new queues.
> --
>
> Key: YARN-10641
> URL: https://issues.apache.org/jira/browse/YARN-10641
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Fix For: 3.4.0
>
> Attachments: YARN-10641.001.patch, YARN-10641.002.patch, 
> YARN-10641.003.patch, YARN-10641.004.patch, YARN-10641.005.patch, 
> YARN-10641.006.patch, image-2021-02-20-15-49-58-677.png, 
> image-2021-02-20-15-53-51-099.png, image-2021-02-20-15-55-44-780.png, 
> image-2021-02-20-16-29-18-519.png, image-2021-02-20-16-31-13-714.png
>
>
> Since the update logic was refactored in YARN-10504, the max applications 
> update based on abs/cap is wrong. This should be fixed, because max 
> applications is a key part of limiting applications in CS.
> For example: 
> When adding a dynamic queue, the max apps of the parent queue's other children 
> are not updated correctly:
> !image-2021-02-20-15-53-51-099.png|width=639,height=509!  
> The newly added queue's max app is updated correctly:
> !image-2021-02-20-15-55-44-780.png|width=542,height=426!
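
To make the problem concrete, here is a minimal sketch. It assumes the rule that a queue's max applications is the system-wide limit scaled by the queue's absolute capacity; the class name, method names, and capacity values are hypothetical and are not taken from the attached patches. It only illustrates why every sibling needs to be recomputed once a dynamic queue changes the capacity split.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration, not code from the YARN-10641 patches. */
class MaxAppsSketch {

  /** Assumed rule: per-queue limit = system-wide limit * absolute capacity (0.0 to 1.0). */
  static int maxApplications(int maxSystemApps, double absoluteCapacity) {
    return (int) (maxSystemApps * absoluteCapacity);
  }

  /** Recomputes the limit for every child queue, not only for the newly added one. */
  static Map<String, Integer> recomputeAll(int maxSystemApps,
      Map<String, Double> absoluteCapacities) {
    Map<String, Integer> result = new LinkedHashMap<>();
    for (Map.Entry<String, Double> entry : absoluteCapacities.entrySet()) {
      result.put(entry.getKey(),
          maxApplications(maxSystemApps, entry.getValue()));
    }
    return result;
  }
}
```

If only the new queue were recomputed, its siblings would keep limits derived from their old capacity share, which matches the symptom described above.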






[jira] [Commented] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.

2021-03-18 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304313#comment-17304313
 ] 

Eric Badger commented on YARN-10703:


+1 I've committed this to trunk (3.4)

> Fix potential null pointer error of gpuNodeResourceUpdateHandler in 
> NodeResourceMonitorImpl.
> 
>
> Key: YARN-10703
> URL: https://issues.apache.org/jira/browse/YARN-10703
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10703.001.patch
>
>







[jira] [Updated] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.

2021-03-18 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-10703:
---
Fix Version/s: 3.4.0

> Fix potential null pointer error of gpuNodeResourceUpdateHandler in 
> NodeResourceMonitorImpl.
> 
>
> Key: YARN-10703
> URL: https://issues.apache.org/jira/browse/YARN-10703
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10703.001.patch
>
>







[jira] [Commented] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.

2021-03-18 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304282#comment-17304282
 ] 

Hadoop QA commented on YARN-10703:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 21s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 40s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 31s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 25s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 31s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 42s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 39s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 33s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 29s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 19m  3s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 22s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 38s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 29s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 29s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 23s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 23s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 25s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 37s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 37s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 34s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| 

[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304253#comment-17304253
 ] 

Hadoop QA commented on YARN-10674:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 24m 50s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 36s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  2s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 53s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 42s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 19m 49s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 52s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 45s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 45s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 39s{color} | {color:green}{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 13 unchanged - 7 fixed = 13 total (was 20) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m  9s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 38s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| 

[jira] [Updated] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-03-18 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-10702:
---
Attachment: YARN-10702.004.patch

> Add cluster metric for amount of CPU used by RM Event Processor
> ---
>
> Key: YARN-10702
> URL: https://issues.apache.org/jira/browse/YARN-10702
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: Scheduler-Busy.png, YARN-10702.001.patch, 
> YARN-10702.002.patch, YARN-10702.003.patch, YARN-10702.004.patch, 
> simon-scheduler-busy.png
>
>
> Add a cluster metric to track the cpu usage of the ResourceManager Event 
> Processing thread.   This lets us know when the critical path of the RM is 
> running out of headroom.
> This feature was originally added for us internally by [~nroberts] and we've 
> been running with it on production clusters for nearly four years.
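
As a rough illustration of how the CPU usage of a single thread can be tracked in Java, the sketch below samples a thread's CPU time through the standard ThreadMXBean API and turns the delta between two samples into a busy percentage. This is not the code from the attached patches; the class name EventProcessorCpuSampler and its methods are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper, not taken from the YARN-10702 patches. */
class EventProcessorCpuSampler {
  private final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
  private final long threadId;
  private long lastCpuNanos = -1;
  private long lastWallNanos = -1;

  EventProcessorCpuSampler(Thread eventThread) {
    this.threadId = eventThread.getId();
  }

  /** Share of wall-clock time spent on CPU, clamped to the range 0..100. */
  static long busyPercent(long cpuDeltaNanos, long wallDeltaNanos) {
    if (wallDeltaNanos <= 0 || cpuDeltaNanos <= 0) {
      return 0;
    }
    return Math.min(100, cpuDeltaNanos * 100 / wallDeltaNanos);
  }

  /** Returns the busy % since the previous call, or -1 if none is available yet. */
  synchronized long sample() {
    if (!threadBean.isThreadCpuTimeSupported()) {
      return -1;
    }
    // getThreadCpuTime returns -1 if the thread is dead or measurement is disabled.
    long cpuNanos = threadBean.getThreadCpuTime(threadId);
    long wallNanos = System.nanoTime();
    if (cpuNanos < 0) {
      return -1;
    }
    long result = -1;
    if (lastCpuNanos >= 0) {
      result = busyPercent(cpuNanos - lastCpuNanos, wallNanos - lastWallNanos);
    }
    lastCpuNanos = cpuNanos;
    lastWallNanos = wallNanos;
    return result;
  }
}
```

A periodic task could call sample() and publish the result as a gauge. A value that stays near 100 would indicate that the monitored thread has little headroom left.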






[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-03-18 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304250#comment-17304250
 ] 

Jim Brennan commented on YARN-10702:


Jumped the gun.  Patch 004 has fixes for the other checkstyle issues.

> Add cluster metric for amount of CPU used by RM Event Processor
> ---
>
> Key: YARN-10702
> URL: https://issues.apache.org/jira/browse/YARN-10702
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: Scheduler-Busy.png, YARN-10702.001.patch, 
> YARN-10702.002.patch, YARN-10702.003.patch, YARN-10702.004.patch, 
> simon-scheduler-busy.png
>
>
> Add a cluster metric to track the cpu usage of the ResourceManager Event 
> Processing thread.   This lets us know when the critical path of the RM is 
> running out of headroom.
> This feature was originally added for us internally by [~nroberts] and we've 
> been running with it on production clusters for nearly four years.






[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-03-18 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304240#comment-17304240
 ] 

Jim Brennan commented on YARN-10702:


Thanks for the review [~zhuqi]!  patch 003 fixes the method names as suggested.


> Add cluster metric for amount of CPU used by RM Event Processor
> ---
>
> Key: YARN-10702
> URL: https://issues.apache.org/jira/browse/YARN-10702
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: Scheduler-Busy.png, YARN-10702.001.patch, 
> YARN-10702.002.patch, YARN-10702.003.patch, simon-scheduler-busy.png
>
>
> Add a cluster metric to track the cpu usage of the ResourceManager Event 
> Processing thread.   This lets us know when the critical path of the RM is 
> running out of headroom.
> This feature was originally added for us internally by [~nroberts] and we've 
> been running with it on production clusters for nearly four years.






[jira] [Updated] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor

2021-03-18 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-10702:
---
Attachment: YARN-10702.003.patch

> Add cluster metric for amount of CPU used by RM Event Processor
> ---
>
> Key: YARN-10702
> URL: https://issues.apache.org/jira/browse/YARN-10702
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.10.1, 3.4.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: Scheduler-Busy.png, YARN-10702.001.patch, 
> YARN-10702.002.patch, YARN-10702.003.patch, simon-scheduler-busy.png
>
>
> Add a cluster metric to track the cpu usage of the ResourceManager Event 
> Processing thread.   This lets us know when the critical path of the RM is 
> running out of headroom.
> This feature was originally added for us internally by [~nroberts] and we've 
> been running with it on production clusters for nearly four years.






[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304228#comment-17304228
 ] 

Qi Zhu commented on YARN-10674:
---

[~gandras]

Now I understand you; we can just use this code:
{code:java}
checkDisablePreemption(preemptionMode,
    !cliParser.hasOption(CliOption.DISABLE_PREEMPTION.shortSwitch));
{code}
 
{code:java}
private static void checkDisablePreemption(
    FSConfigToCSConfigConverterParams.PreemptionMode preemptionMode,
    boolean enabled) {
  if (preemptionMode == null && !enabled) {
    throw new PreconditionException(
        "Specified disable-preemption mode is illegal, " +
        " use nopolicy or observeonly.");
  }
}
{code}
PreemptionMode.ENABLED is not necessary; I updated this in the latest patch. 

I am glad that it makes sense now.

[~pbacsko]

Do you have any other advice?

Thanks.

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch, YARN-10674.016.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}






[jira] [Updated] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10674:
--
Attachment: YARN-10674.016.patch

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch, YARN-10674.016.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}






[jira] [Commented] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304214#comment-17304214
 ] 

Qi Zhu commented on YARN-10703:
---

[~pbacsko] [~gandras] [~ebadger] 

Sorry for the potential null pointer introduced in YARN-10692.

I fixed it in this jira.
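
The patch itself is not shown in this thread. As a generic illustration of guarding an optional handler, the sketch below assumes the GPU handler is only created when a GPU plugin is configured and is null otherwise. All names here are hypothetical stand-ins and not the actual NodeResourceMonitorImpl code.

```java
/** Hypothetical illustration, not code from the YARN-10703 patch. */
class GpuMonitorSketch {

  /** Stand-in for a handler that may never have been initialized. */
  interface GpuUtilizationSource {
    float getAverageGpuUtilization();
  }

  // Assumption: null when no GPU plugin is configured on the node.
  private final GpuUtilizationSource gpuHandler;

  GpuMonitorSketch(GpuUtilizationSource gpuHandler) {
    this.gpuHandler = gpuHandler;
  }

  /** Reports 0 instead of throwing when no GPU handler is available. */
  float currentGpuUtilization() {
    if (gpuHandler == null) {
      return 0.0f;
    }
    return gpuHandler.getAverageGpuUtilization();
  }
}
```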

Could you help review this?

Thanks.

> Fix potential null pointer error of gpuNodeResourceUpdateHandler in 
> NodeResourceMonitorImpl.
> 
>
> Key: YARN-10703
> URL: https://issues.apache.org/jira/browse/YARN-10703
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10703.001.patch
>
>







[jira] [Created] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.

2021-03-18 Thread Qi Zhu (Jira)
Qi Zhu created YARN-10703:
-

 Summary: Fix potential null pointer error of 
gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
 Key: YARN-10703
 URL: https://issues.apache.org/jira/browse/YARN-10703
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Qi Zhu
Assignee: Qi Zhu









[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304142#comment-17304142
 ] 

Qi Zhu edited comment on YARN-10674 at 3/18/21, 1:29 PM:
-

Thanks [~gandras] for the reply.

If we don't have PreemptionMode.ENABLED, we can check whether the option is 
present to know whether preemption is enabled, and pass the result to the 
PreemptionMode enabled field.
{code:java}
public static PreemptionMode fromString(String cliOption,
    boolean enabled) {
  if (enabled) {
    return PreemptionMode.ENABLED;
  } else {
    if (StringUtils.isEmpty(cliOption)) {
      return PreemptionMode.NO_POLICY;
    } else {
      if (cliOption.trim().
          equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) {
        return PreemptionMode.OBSERVE_ONLY;
      } else if (cliOption.trim().
          equals(PreemptionMode.NO_POLICY.getCliOption())) {
        return PreemptionMode.NO_POLICY;
      } else {
        return null;
      }
    }
  }
}
{code}
If it returns null:
{code:java}
private static void checkDisablePreemption(FSConfigToCSConfigConverterParams.
    PreemptionMode preemptionMode) {
  if (preemptionMode == null) {
    throw new PreconditionException(
        "Specified disable-preemption mode is illegal, " +
            "use nopolicy or observeonly.");
  }
}
{code}
 

But fromString should return a value that can be used later. If it returns 
null, that can be confused with the case where preemption is disabled but the 
given value is neither nopolicy nor observeonly. I think the flag makes this 
clear, since there are then four possible return values:
 # null means that an illegal value was used 
 # PreemptionMode.ENABLED 
 # PreemptionMode.OBSERVE_ONLY
 # PreemptionMode.NO_POLICY

What's your opinion about this?

 


was (Author: zhuqi):
Thanks [~gandras] for the reply.

If we don't have PreemptionMode.ENABLED, we can check whether the option is 
present to know whether preemption is enabled, and pass the result to the 
PreemptionMode enabled field.
{code:java}
public static PreemptionMode fromString(String cliOption,
    boolean enabled) {
  if (enabled) {
    return PreemptionMode.ENABLED;
  } else {
    if (StringUtils.isEmpty(cliOption)) {
      return PreemptionMode.NO_POLICY;
    } else {
      if (cliOption.trim().
          equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) {
        return PreemptionMode.OBSERVE_ONLY;
      } else if (cliOption.trim().
          equals(PreemptionMode.NO_POLICY.getCliOption())) {
        return PreemptionMode.NO_POLICY;
      } else {
        return null;
      }
    }
  }
}
{code}
But fromString should return a value that can be used later. If it returns 
null, that can be confused with the case where preemption is disabled but the 
given value is neither nopolicy nor observeonly. I think the flag makes this 
clear, since there are then four possible return values:
 # null means that an illegal value was used 
 # PreemptionMode.ENABLED 
 # PreemptionMode.OBSERVE_ONLY
 # PreemptionMode.NO_POLICY

What's your opinion about this?

 

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}






[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304142#comment-17304142
 ] 

Qi Zhu edited comment on YARN-10674 at 3/18/21, 1:26 PM:
-

Thanks [~gandras] for the reply.

If we don't have PreemptionMode.ENABLED, we can check whether the option is 
present to know whether preemption is enabled, and pass the result to the 
PreemptionMode enabled field.
{code:java}
public static PreemptionMode fromString(String cliOption,
    boolean enabled) {
  if (enabled) {
    return PreemptionMode.ENABLED;
  } else {
    if (StringUtils.isEmpty(cliOption)) {
      return PreemptionMode.NO_POLICY;
    } else {
      if (cliOption.trim().
          equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) {
        return PreemptionMode.OBSERVE_ONLY;
      } else if (cliOption.trim().
          equals(PreemptionMode.NO_POLICY.getCliOption())) {
        return PreemptionMode.NO_POLICY;
      } else {
        return null;
      }
    }
  }
}
{code}
But fromString should return a value that can be used later. If it returns 
null, that can be confused with the case where preemption is disabled but the 
given value is neither nopolicy nor observeonly. I think the flag makes this 
clear, since there are then four possible return values:
 # null means that an illegal value was used 
 # PreemptionMode.ENABLED 
 # PreemptionMode.OBSERVE_ONLY
 # PreemptionMode.NO_POLICY

What's your opinion about this?

 


was (Author: zhuqi):
Thanks [~gandras] for the reply.

If we don't have PreemptionMode.ENABLED, we can check whether the option is 
present to know whether preemption is enabled, and pass the result to the 
PreemptionMode enabled field.
{code:java}
public static PreemptionMode fromString(String cliOption,
    boolean enabled) {
  if (enabled) {
    return PreemptionMode.ENABLED;
  } else {
    if (StringUtils.isEmpty(cliOption)) {
      return PreemptionMode.NO_POLICY;
    } else {
      if (cliOption.trim().
          equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) {
        return PreemptionMode.OBSERVE_ONLY;
      } else if (cliOption.trim().
          equals(PreemptionMode.NO_POLICY.getCliOption())) {
        return PreemptionMode.NO_POLICY;
      } else {
        return null;
      }
    }
  }
}
{code}
But fromString should return a value that can be used later. If it returns 
null, that can be confused with the case where preemption is disabled but the 
given value is neither nopolicy nor observeonly. I think the flag will make 
this clear.

What's your opinion about this?

 

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}






[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304142#comment-17304142
 ] 

Qi Zhu edited comment on YARN-10674 at 3/18/21, 1:24 PM:
-

Thanks [~gandras] for the reply.

If we don't have PreemptionMode.ENABLED, we can check whether the option is 
present to know whether preemption is enabled, and pass the result to the 
PreemptionMode enabled field.
{code:java}
public static PreemptionMode fromString(String cliOption,
    boolean enabled) {
  if (enabled) {
    return PreemptionMode.ENABLED;
  } else {
    if (StringUtils.isEmpty(cliOption)) {
      return PreemptionMode.NO_POLICY;
    } else {
      if (cliOption.trim().
          equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) {
        return PreemptionMode.OBSERVE_ONLY;
      } else if (cliOption.trim().
          equals(PreemptionMode.NO_POLICY.getCliOption())) {
        return PreemptionMode.NO_POLICY;
      } else {
        return null;
      }
    }
  }
}
{code}
But fromString should return a value that can be used later. If it returns 
null, that can be confused with the case where preemption is disabled but the 
given value is neither nopolicy nor observeonly. I think the flag will make 
this clear.

What's your opinion about this?

 


was (Author: zhuqi):
Thanks [~gandras] for the reply.

If we don't have PreemptionMode.ENABLED, we can check whether the option is 
present to know whether preemption is enabled, and pass the result to the 
PreemptionMode enabled field.

But fromString should return a value that can be used later. If it returns 
null, that can be confused with the case where preemption is disabled but the 
given value is neither nopolicy nor observeonly. I think the flag will make 
this clear.

What's your opinion about this?

 

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}






[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304142#comment-17304142
 ] 

Qi Zhu commented on YARN-10674:
---

Thanks [~gandras] for the reply.

If we don't have PreemptionMode.ENABLED, we can check whether the option is 
present to know whether preemption is enabled, and pass the result to the 
PreemptionMode enabled field.

But fromString should return a value that can be used later. If it returns 
null, that can be confused with the case where preemption is disabled but the 
given value is neither nopolicy nor observeonly. I think the flag will make 
this clear.

What's your opinion about this?

 

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}






[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304130#comment-17304130
 ] 

Andras Gyori commented on YARN-10674:
-

These are valid suggestions [~pbacsko], and this was my idea as well. However, 
I think the enable flag is not necessary. PreemptionMode.ENABLED is essentially 
equal to a true flag, while PreemptionMode.NO_POLICY inherently means that 
preemption is disabled. If you check:
{code:java}
preemptionMode == FSConfigToCSConfigConverterParams.PreemptionMode.NO_POLICY
{code}
you do not need to check whether preemption is disabled, because the enum 
values are mutually exclusive (you cannot have both PreemptionMode.ENABLED and 
PreemptionMode.NO_POLICY).

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}






[jira] [Commented] (YARN-10701) The yarn.resource-types should support multi types without trimmed.

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304124#comment-17304124
 ] 

Qi Zhu commented on YARN-10701:
---

Thanks [~gandras] for confirming.

[~pbacsko] Could you help review this?

Thanks.

> The yarn.resource-types should support multi types without trimmed.
> ---
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10701.001.patch, YARN-10701.002.patch
>
>
> {code:java}
> <property>
>  <name>yarn.resource-types</name>
>  <value>yarn.io/gpu, yarn.io/fpga</value>
> </property>
>  {code}
>  When I configured the resource types above with gpu and fpga, the following 
> error happened:
>  
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is 
> not a valid resource name. A valid resource name must begin with a letter and 
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource 
> name may also be optionally preceded by a name space followed by a slash. A 
> valid name space consists of period-separated groups of letters, numbers, and 
> dashes.{code}
>   
>  The resource type names should be trimmed.
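
A minimal sketch of the trimming asked for above, not the attached patch: the class and method names (`ResourceTypeListSketch`, `parseResourceTypes`) are made up for illustration, and only the configured value comes from the report. Hadoop's `Configuration#getTrimmedStrings` does a similar job for real configuration values; whether the patch uses it is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class ResourceTypeListSketch {

  // Splits a comma-separated value and trims each entry, so that
  // "yarn.io/gpu, yarn.io/fpga" yields names without a leading space.
  static List<String> parseResourceTypes(String raw) {
    List<String> names = new ArrayList<>();
    if (raw == null) {
      return names;
    }
    for (String part : raw.split(",")) {
      String name = part.trim();
      // Skip empty entries such as the one produced by a trailing comma.
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names;
  }
}
```

With this, the second entry becomes `yarn.io/fpga` rather than ` yarn.io/fpga`, which is the name the validation error above complains about.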






[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304122#comment-17304122
 ] 

Qi Zhu commented on YARN-10674:
---

Thanks a lot [~pbacsko] for the patient review.

Very good suggestion, it makes sense to me now. I have updated this in the 
latest patch.

Thanks.:D

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}






[jira] [Updated] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Qi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qi Zhu updated YARN-10674:
--
Attachment: YARN-10674.015.patch

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, 
> YARN-10674.015.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}






[jira] [Commented] (YARN-10641) Refactor the max app related update, and fix maxApllications update error when add new queues.

2021-03-18 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304117#comment-17304117
 ] 

Peter Bacsko commented on YARN-10641:
-

+1

Thanks for the patch [~zhuqi] and [~gandras] for the review. Committed to trunk.

> Refactor the max app related update, and fix maxApllications update error 
> when add new queues.
> --
>
> Key: YARN-10641
> URL: https://issues.apache.org/jira/browse/YARN-10641
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10641.001.patch, YARN-10641.002.patch, 
> YARN-10641.003.patch, YARN-10641.004.patch, YARN-10641.005.patch, 
> YARN-10641.006.patch, image-2021-02-20-15-49-58-677.png, 
> image-2021-02-20-15-53-51-099.png, image-2021-02-20-15-55-44-780.png, 
> image-2021-02-20-16-29-18-519.png, image-2021-02-20-16-31-13-714.png
>
>
> When refactoring the update logic in YARN-10504:
> The max applications update based on abs/cap is wrong. This should be fixed, 
> because max applications is a key part of limiting applications in CS.
> For example: 
> When adding a dynamic queue, the max apps of the parent queue's other 
> children are not updated correctly:
> !image-2021-02-20-15-53-51-099.png|width=639,height=509!
> The newly added queue's max app is updated correctly:
> !image-2021-02-20-15-55-44-780.png|width=542,height=426!






[jira] [Commented] (YARN-10692) Add Node GPU Utilization and apply to NodeMetrics.

2021-03-18 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304089#comment-17304089
 ] 

Peter Bacsko commented on YARN-10692:
-

Thanks [~zhuqi] for the patch, committed to trunk.

> Add Node GPU Utilization and apply to NodeMetrics.
> --
>
> Key: YARN-10692
> URL: https://issues.apache.org/jira/browse/YARN-10692
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10692.001.patch, YARN-10692.002.patch, 
> YARN-10692.003.patch
>
>
> Currently there is no node-level GPU utilization metric. This issue will add 
> it, and apply it to NodeMetrics first.
> cc [~pbacsko]  [~Jim_Brennan]  [~ebadger]  [~gandras]  






[jira] [Commented] (YARN-10692) Add Node GPU Utilization and apply to NodeMetrics.

2021-03-18 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304078#comment-17304078
 ] 

Peter Bacsko commented on YARN-10692:
-

+1 LGTM.

Committing this soon.

> Add Node GPU Utilization and apply to NodeMetrics.
> --
>
> Key: YARN-10692
> URL: https://issues.apache.org/jira/browse/YARN-10692
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10692.001.patch, YARN-10692.002.patch, 
> YARN-10692.003.patch
>
>
> Currently there is no node-level GPU utilization metric. This issue will add 
> it, and apply it to NodeMetrics first.
> cc [~pbacsko]  [~Jim_Brennan]  [~ebadger]  [~gandras]  






[jira] [Commented] (YARN-10659) Improve CS MappingRule %secondary_group evaluation

2021-03-18 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304077#comment-17304077
 ] 

Szilard Nemeth commented on YARN-10659:
---

Thanks [~shuzirra] for working on this.
Latest patch LGTM, committed to trunk.
Checkstyle issue is not important and javadoc issue was not related.
Thanks [~gandras] for the review.

> Improve CS MappingRule %secondary_group evaluation
> --
>
> Key: YARN-10659
> URL: https://issues.apache.org/jira/browse/YARN-10659
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10659.001.patch, YARN-10659.002.patch, 
> YARN-10659.003.patch
>
>
> Since leaf queue names are not unique, there are a lot of use cases where 
> %secondary_group evaluation fails or behaves inconsistently.
> We should extend its behavior: when it's under a defined parent, 
> %secondary_group evaluation should only check for queue existence under that 
> queue. E.g. root.group.%secondary_group should only evaluate to groups which 
> exist under root.group, while the legacy %secondary_group.%user should still 
> look for groups by their leaf name globally.
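
The two lookup modes described above could be sketched roughly as follows. This is an illustration under assumptions, not the committed patch: `SecondaryGroupSketch`, `resolve`, and the representation of queues as a set of full path strings are all hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration of scoped vs. global %secondary_group lookup.
class SecondaryGroupSketch {

  // Returns the first secondary group that has a matching queue.
  // parentPath != null: only a queue directly under that parent counts.
  // parentPath == null: legacy behavior, match any queue by leaf name.
  static Optional<String> resolve(List<String> secondaryGroups,
      Set<String> queuePaths, String parentPath) {
    for (String group : secondaryGroups) {
      if (parentPath != null) {
        if (queuePaths.contains(parentPath + "." + group)) {
          return Optional.of(group);
        }
      } else {
        for (String path : queuePaths) {
          String leaf = path.substring(path.lastIndexOf('.') + 1);
          if (leaf.equals(group)) {
            return Optional.of(group);
          }
        }
      }
    }
    return Optional.empty();
  }
}
```

For a user with secondary groups `ops` and `dev` and queues `root.group.dev` and `root.other.ops`, the scoped lookup under `root.group` picks `dev`, while the global lookup picks `ops` because it comes first in the group list.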






[jira] [Updated] (YARN-10659) Improve CS MappingRule %secondary_group evaluation

2021-03-18 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10659:
--
Fix Version/s: 3.4.0

> Improve CS MappingRule %secondary_group evaluation
> --
>
> Key: YARN-10659
> URL: https://issues.apache.org/jira/browse/YARN-10659
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10659.001.patch, YARN-10659.002.patch, 
> YARN-10659.003.patch
>
>
> Since leaf queue names are not unique, there are a lot of use cases where 
> %secondary_group evaluation fails or behaves inconsistently.
> We should extend its behavior: when it's under a defined parent, 
> %secondary_group evaluation should only check for queue existence under that 
> queue. E.g. root.group.%secondary_group should only evaluate to groups which 
> exist under root.group, while the legacy %secondary_group.%user should still 
> look for groups by their leaf name globally.






[jira] [Commented] (YARN-10685) Fix typos in AbstractCSQueue

2021-03-18 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304041#comment-17304041
 ] 

Peter Bacsko commented on YARN-10685:
-

+1 thanks [~zhuqi] for the patch, committed to trunk.

> Fix typos in AbstractCSQueue
> 
>
> Key: YARN-10685
> URL: https://issues.apache.org/jira/browse/YARN-10685
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10685.001.patch, YARN-10685.002.patch, 
> YARN-10685.003.patch
>
>







[jira] [Updated] (YARN-10685) Fix typos in AbstractCSQueue

2021-03-18 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10685:

Summary: Fix typos in AbstractCSQueue  (was: Fixed some Typo  in 
AbstractCSQueue.)

> Fix typos in AbstractCSQueue
> 
>
> Key: YARN-10685
> URL: https://issues.apache.org/jira/browse/YARN-10685
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10685.001.patch, YARN-10685.002.patch, 
> YARN-10685.003.patch
>
>







[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.

2021-03-18 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304027#comment-17304027
 ] 

Peter Bacsko commented on YARN-10674:
-

Thanks [~zhuqi] for the patch. I think we are very close.

I still have some comments:
 1.
{noformat}
  private FSConfigToCSConfigConverterParams.
      PreemptionMode disablePreemption;
  private FSConfigToCSConfigConverterParams.
      PreemptionMode preemptionMode;
{noformat}
We don't need two enums. We need only one which covers all states (enabled / 
observeonly / nopolicy).

You can extend {{PreemptionMode}} with a new variable which says whether it's 
enabled or disabled:
{noformat}
  public enum PreemptionMode {
    ENABLE("enable", true),
    NO_POLICY("nopolicy", false),
    OBSERVE_ONLY("observeonly", false);

    private String cliOption;
    private boolean enabled;

    PreemptionMode(String cliOption, boolean enabled) {
      this.cliOption = cliOption;
      this.enabled = enabled;
    }

    public String getCliOption() {
      return cliOption;
    }

    public boolean isEnabled() {
      return enabled;
    }
  }
{noformat}
So you just call {{preemptionMode.isEnabled()}} and don't need two variables 
just to hold the information whether it's enabled or not.

2. {{public static PreemptionMode fromString(String cliOption)}} --> this 
method never returns ENABLED, which is important (also, pls change "ENABLE" to 
"ENABLED", note the "D" at the end).

cc [~gandras] please review patch v14.
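
For reference, one possible shape of a `fromString` that can return every value, including ENABLED, is sketched below. It is not the code from any attached patch: the enum is renamed `PreemptionModeExample` to make clear it is a stand-in, and returning `Optional.empty()` for unknown input is an assumption made for this sketch.

```java
import java.util.Optional;

// Hypothetical stand-in for FSConfigToCSConfigConverterParams.PreemptionMode.
enum PreemptionModeExample {
  ENABLED("enable", true),
  NO_POLICY("nopolicy", false),
  OBSERVE_ONLY("observeonly", false);

  private final String cliOption;
  private final boolean enabled;

  PreemptionModeExample(String cliOption, boolean enabled) {
    this.cliOption = cliOption;
    this.enabled = enabled;
  }

  public String getCliOption() {
    return cliOption;
  }

  public boolean isEnabled() {
    return enabled;
  }

  // Assumption: unknown or missing input yields Optional.empty(), leaving
  // it to the caller to report an illegal value.
  public static Optional<PreemptionModeExample> fromString(String cliOption) {
    if (cliOption == null) {
      return Optional.empty();
    }
    String trimmed = cliOption.trim();
    // Iterating over values() means no constant can be forgotten.
    for (PreemptionModeExample mode : values()) {
      if (mode.cliOption.equals(trimmed)) {
        return Optional.of(mode);
      }
    }
    return Optional.empty();
  }
}
```

Because the lookup loops over `values()`, adding a constant later does not require touching `fromString`, and `isEnabled()` remains the single place that says whether preemption is on.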

> fs2cs: should support auto created queue deletion.
> --
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch, 
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, 
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, 
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, 
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
> synchronized (this) {
>   reloadListener.onCheck();
> }
> ...
> Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}






[jira] [Comment Edited] (YARN-10641) Refactor the max app related update, and fix maxApllications update error when add new queues.

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297081#comment-17297081
 ] 

Qi Zhu edited comment on YARN-10641 at 3/18/21, 9:57 AM:
-

[~pbacsko] [~gandras]

The logic here is not changed, the label support should be handled in 
YARN-10657.

I fixed the remaining checkstyle issue. Do you have any other advice about 
this? :D

I think we should fix this Jira first; if it is not fixed, max applications 
without node labels will also be wrong.

Thanks.


was (Author: zhuqi):
[~pbacsko] [~gandras]

The logic here is not changed, the label support should be handled in 
YARN-10657.

I fixed the remaining checkstyle issue. Do you have any other advice about 
this? :D

I think we should fix this Jira first; if it is not fixed, max applications 
without node labels will be wrong.

Thanks.

> Refactor the max app related update, and fix maxApllications update error 
> when add new queues.
> --
>
> Key: YARN-10641
> URL: https://issues.apache.org/jira/browse/YARN-10641
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10641.001.patch, YARN-10641.002.patch, 
> YARN-10641.003.patch, YARN-10641.004.patch, YARN-10641.005.patch, 
> YARN-10641.006.patch, image-2021-02-20-15-49-58-677.png, 
> image-2021-02-20-15-53-51-099.png, image-2021-02-20-15-55-44-780.png, 
> image-2021-02-20-16-29-18-519.png, image-2021-02-20-16-31-13-714.png
>
>
> After the update logic was refactored in YARN-10504, the max applications 
> update based on abs/cap is wrong. This should be fixed, because max 
> applications is a key part of limiting applications in CS.
> For example: 
> When a dynamic queue is added, the max app values of the parent queue's other 
> children are not updated correctly:
> !image-2021-02-20-15-53-51-099.png|width=639,height=509!  
> The newly added queue's max app is updated correctly:
> !image-2021-02-20-15-55-44-780.png|width=542,height=426!
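
To illustrate why the sibling queues matter here, a minimal sketch follows. It assumes that a queue's max applications is the system-wide limit scaled by the queue's absolute capacity when no per-queue value is configured, and that adding a dynamic queue changes the absolute capacities of its siblings. The class MaxAppsSketch, its methods, the queue names and the numbers are hypothetical and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the CapacityScheduler implementation.
class MaxAppsSketch {

  // Assumption: max apps = system-wide limit * absolute capacity, truncated.
  static int maxApps(int systemMaxApps, double absoluteCapacity) {
    return (int) (systemMaxApps * absoluteCapacity);
  }

  // Recompute the limit for every child, not only the newly added one.
  static Map<String, Integer> recomputeAll(
      int systemMaxApps, Map<String, Double> absoluteCapacities) {
    Map<String, Integer> result = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : absoluteCapacities.entrySet()) {
      result.put(e.getKey(), maxApps(systemMaxApps, e.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    int systemMaxApps = 10000;

    // Made-up capacities after dynamic queue "root.c" joins "root.a" and "root.b".
    Map<String, Double> afterAdd = new LinkedHashMap<>();
    afterAdd.put("root.a", 0.5);
    afterAdd.put("root.b", 0.25);
    afterAdd.put("root.c", 0.25);

    System.out.println(recomputeAll(systemMaxApps, afterAdd));
  }
}
```

If only the new queue were recomputed, root.b would keep the limit derived from its old capacity, which is the kind of stale sibling value the screenshots describe.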






[jira] [Comment Edited] (YARN-10641) Refactor the max app related update, and fix maxApllications update error when add new queues.

2021-03-18 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297081#comment-17297081
 ] 

Qi Zhu edited comment on YARN-10641 at 3/18/21, 9:57 AM:
-

[~pbacsko] [~gandras]

The logic here is not changed; the label support should be handled in 
YARN-10657.

I fixed the remaining checkstyle issues. Do you have any other advice about 
this? :D

I think we should fix this Jira first. If this is not fixed, max applications 
without node labels will be wrong.

Thanks.


was (Author: zhuqi):
[~pbacsko]

The logic here is not changed; the label support should be handled in 
YARN-10657.

I fixed the remaining checkstyle issues. Do you have any other advice about 
this? :D

Thanks.

> Refactor the max app related update, and fix maxApllications update error 
> when add new queues.
> --
>
> Key: YARN-10641
> URL: https://issues.apache.org/jira/browse/YARN-10641
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Attachments: YARN-10641.001.patch, YARN-10641.002.patch, 
> YARN-10641.003.patch, YARN-10641.004.patch, YARN-10641.005.patch, 
> YARN-10641.006.patch, image-2021-02-20-15-49-58-677.png, 
> image-2021-02-20-15-53-51-099.png, image-2021-02-20-15-55-44-780.png, 
> image-2021-02-20-16-29-18-519.png, image-2021-02-20-16-31-13-714.png
>
>
> After the update logic was refactored in YARN-10504, the max applications 
> update based on abs/cap is wrong. This should be fixed, because max 
> applications is a key part of limiting applications in CS.
> For example: 
> When a dynamic queue is added, the max app values of the parent queue's other 
> children are not updated correctly:
> !image-2021-02-20-15-53-51-099.png|width=639,height=509!  
> The newly added queue's max app is updated correctly:
> !image-2021-02-20-15-55-44-780.png|width=542,height=426!






[jira] [Commented] (YARN-10659) Improve CS MappingRule %secondary_group evaluation

2021-03-18 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303996#comment-17303996
 ] 

Andras Gyori commented on YARN-10659:
-

Thanks [~shuzirra], the patch looks good to me now, +1 (non-binding). If no 
other revisions are expected, [~snemeth] could review it and commit to trunk.

> Improve CS MappingRule %secondary_group evaluation
> --
>
> Key: YARN-10659
> URL: https://issues.apache.org/jira/browse/YARN-10659
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10659.001.patch, YARN-10659.002.patch, 
> YARN-10659.003.patch
>
>
> Since the leaf queue names are not unique, there are a lot of use cases where 
> %secondary_group evaluation fails or behaves inconsistently.
> We should extend its behavior: when it is under a defined parent, 
> %secondary_group evaluation should only check for queue existence under that 
> queue. E.g. root.group.%secondary_group should only evaluate to groups which 
> exist under root.group, while the legacy %secondary_group.%user should still 
> look for groups by their leaf name globally.
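
The two lookup modes described above could be sketched roughly as follows. This is only an illustration under simple assumptions: queues are modelled as a set of full path strings, and the class SecondaryGroupSketch with its methods underParent and byLeafName is hypothetical, not the MappingRule code from the patch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; not the CS MappingRule implementation.
class SecondaryGroupSketch {

  // Parent-scoped mode: pick the first secondary group for which
  // "<parentPath>.<group>" is an existing queue path.
  static Optional<String> underParent(
      Set<String> queuePaths, String parentPath, List<String> secondaryGroups) {
    for (String group : secondaryGroups) {
      if (queuePaths.contains(parentPath + "." + group)) {
        return Optional.of(group);
      }
    }
    return Optional.empty();
  }

  // Legacy mode: pick the first secondary group that matches the leaf
  // name of any queue, wherever that queue sits in the hierarchy.
  static Optional<String> byLeafName(
      Set<String> queuePaths, List<String> secondaryGroups) {
    for (String group : secondaryGroups) {
      for (String path : queuePaths) {
        String leaf = path.substring(path.lastIndexOf('.') + 1);
        if (leaf.equals(group)) {
          return Optional.of(group);
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Made-up queue hierarchy and group list for the demo.
    Set<String> queues = new LinkedHashSet<>(Arrays.asList(
        "root", "root.group", "root.group.dev", "root.other", "root.other.ops"));
    List<String> secondaryGroups = Arrays.asList("ops", "dev");

    System.out.println(underParent(queues, "root.group", secondaryGroups));
    System.out.println(byLeafName(queues, secondaryGroups));
  }
}
```

In this made-up hierarchy the user's first secondary group, ops, only exists under root.other, so the parent-scoped lookup for root.group skips it and settles on dev, while the legacy lookup still matches ops by leaf name.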


