[jira] [Updated] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-11-30 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-11018:
---
Attachment: YARN-11018.001.patch

> RM rest api show error resources in capacity scheduler with nodelabels
> --
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
> Attachments: YARN-11018.001.patch
>
>
> Because resource metrics are updated only for the "default" partition, allocatedMB, 
> allocatedVCores, totalMB and totalVirtualCores are wrong in the capacity scheduler 
> when node labels are used. 
> When we fetch the cluster metrics with 'curl 
> http://rm:8088/ws/v1/cluster/metrics', we get incorrect totalMB and 
> totalVirtualCores.
> They should instead be computed from resources across all partitions.
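As a quick way to observe the reported values, here is a minimal, hedged sketch (not part of the attached patch; "rm" is the placeholder host from the report) that fetches the same metrics endpoint and prints the raw JSON so totalMB and totalVirtualCores can be compared against the per-partition resources:

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ClusterMetricsCheck {
  public static void main(String[] args) throws Exception {
    // Same endpoint as the curl command in the report; "rm" is a placeholder host name.
    HttpRequest request = HttpRequest.newBuilder(
        URI.create("http://rm:8088/ws/v1/cluster/metrics")).GET().build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    // The body contains clusterMetrics.allocatedMB, allocatedVCores, totalMB and
    // totalVirtualCores; per this report, with node labels these currently reflect
    // only the "default" partition instead of all partitions.
    System.out.println(response.body());
  }
}
{code}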



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-11-30 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-11018:
---
Attachment: (was: YARN-11018.001.patch)

> RM rest api show error resources in capacity scheduler with nodelabels
> --
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
> Attachments: YARN-11018.001.patch
>
>
> Because resource metrics are updated only for the "default" partition, allocatedMB, 
> allocatedVCores, totalMB and totalVirtualCores are wrong in the capacity scheduler 
> when node labels are used. 
> When we fetch the cluster metrics with 'curl 
> http://rm:8088/ws/v1/cluster/metrics', we get incorrect totalMB and 
> totalVirtualCores.
> They should instead be computed from resources across all partitions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-11026) Make default AppPlacementAllocator configurable

2021-11-30 Thread Minni Mittal (Jira)
Minni Mittal created YARN-11026:
---

 Summary: Make default AppPlacementAllocator configurable
 Key: YARN-11026
 URL: https://issues.apache.org/jira/browse/YARN-11026
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Minni Mittal
Assignee: Minni Mittal






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-11-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451547#comment-17451547
 ] 

Hadoop QA commented on YARN-11018:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
51s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 43s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 
13s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
55s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1250/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 34s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | 

[jira] [Updated] (YARN-10863) CGroupElasticMemoryController is not work

2021-11-30 Thread LuoGe (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LuoGe updated YARN-10863:
-
Attachment: YARN-10863.004.patch

> CGroupElasticMemoryController is not work
> -
>
> Key: YARN-10863
> URL: https://issues.apache.org/jira/browse/YARN-10863
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.3.1
>Reporter: LuoGe
>Priority: Major
> Attachments: YARN-10863.001-1.patch, YARN-10863.002.patch, 
> YARN-10863.004.patch
>
>
> When following the 
> [documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCGroupsMemory.html]
>  to configure elastic memory resource control, with 
> yarn.nodemanager.elastic-memory-control.enabled set to true, 
> yarn.nodemanager.resource.memory.enforced set to false, 
> yarn.nodemanager.pmem-check-enabled set to true, and 
> yarn.nodemanager.resource.memory.enabled set to true so that cgroups control 
> memory, elastic memory control does not work.
> Looking at the checkLimit function in ContainersMonitorImpl.java, the skip 
> logic has a problem: it only returns early when strictMemoryEnforcement is 
> true and elasticMemoryEnforcement is false. So with the documented elastic 
> memory control settings the check continues, and a container whose memory 
> usage exceeds its limit is killed by checkLimit. 
> {code:java}
> if (strictMemoryEnforcement && !elasticMemoryEnforcement) {
>   // When cgroup-based strict memory enforcement is used alone without
>   // elastic memory control, the oom-kill would take care of it.
>   // However, when elastic memory control is also enabled, the oom killer
>   // would be disabled at the root yarn container cgroup level (all child
>   // cgroups would inherit that setting). Hence, we fall back to the
>   // polling-based mechanism.
>   return;
> }
> {code}
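For illustration only (an assumption about the behaviour the reporter appears to expect, not the attached patch): if over-limit handling is meant to be delegated to the cgroup OOM killer in strict mode and to CGroupElasticMemoryController in elastic mode, the early return in the fragment above would cover both flags rather than only the strict-without-elastic case:

{code:java}
// Hypothetical sketch mirroring the fragment quoted above, not the real patch.
if (strictMemoryEnforcement || elasticMemoryEnforcement) {
  // cgroup-based enforcement (strict OOM killing or the elastic memory
  // controller) is responsible for over-limit containers, so skip the
  // polling-based kill performed later in checkLimit().
  return;
}
{code}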



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10863) CGroupElasticMemoryController is not work

2021-11-30 Thread LuoGe (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LuoGe updated YARN-10863:
-
Attachment: (was: YARN-10863.003.patch)

> CGroupElasticMemoryController is not work
> -
>
> Key: YARN-10863
> URL: https://issues.apache.org/jira/browse/YARN-10863
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.3.1
>Reporter: LuoGe
>Priority: Major
> Attachments: YARN-10863.001-1.patch, YARN-10863.002.patch
>
>
> When following the 
> [documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCGroupsMemory.html]
>  to configure elastic memory resource control, with 
> yarn.nodemanager.elastic-memory-control.enabled set to true, 
> yarn.nodemanager.resource.memory.enforced set to false, 
> yarn.nodemanager.pmem-check-enabled set to true, and 
> yarn.nodemanager.resource.memory.enabled set to true so that cgroups control 
> memory, elastic memory control does not work.
> Looking at the checkLimit function in ContainersMonitorImpl.java, the skip 
> logic has a problem: it only returns early when strictMemoryEnforcement is 
> true and elasticMemoryEnforcement is false. So with the documented elastic 
> memory control settings the check continues, and a container whose memory 
> usage exceeds its limit is killed by checkLimit. 
> {code:java}
> if (strictMemoryEnforcement && !elasticMemoryEnforcement) {
>   // When cgroup-based strict memory enforcement is used alone without
>   // elastic memory control, the oom-kill would take care of it.
>   // However, when elastic memory control is also enabled, the oom killer
>   // would be disabled at the root yarn container cgroup level (all child
>   // cgroups would inherit that setting). Hence, we fall back to the
>   // polling-based mechanism.
>   return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11025) Implement distributed decommissioning

2021-11-30 Thread Minni Mittal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Minni Mittal updated YARN-11025:

Summary: Implement distributed decommissioning  (was: Implement distributed 
maintenance )

> Implement distributed decommissioning
> -
>
> Key: YARN-11025
> URL: https://issues.apache.org/jira/browse/YARN-11025
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-11025) Implement distributed maintenance

2021-11-30 Thread Minni Mittal (Jira)
Minni Mittal created YARN-11025:
---

 Summary: Implement distributed maintenance 
 Key: YARN-11025
 URL: https://issues.apache.org/jira/browse/YARN-11025
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Minni Mittal
Assignee: Minni Mittal






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10863) CGroupElasticMemoryController is not work

2021-11-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451491#comment-17451491
 ] 

Hadoop QA commented on YARN-10863:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
14s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
28s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
33s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
31s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 22s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 18m  
0s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
22s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
35s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1249/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt{color}
 | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 
23s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1249/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt{color}
 | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 23s{color} 
| 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1249/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt{color}
 | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 
19s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1249/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt{color}
 | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 19s{color} 
| 

[jira] [Updated] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-11-30 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-11018:
---
Attachment: YARN-11018.001.patch

> RM rest api show error resources in capacity scheduler with nodelabels
> --
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
> Attachments: YARN-11018.001.patch
>
>
> Because resource metrics are updated only for the "default" partition, allocatedMB, 
> allocatedVCores, totalMB and totalVirtualCores are wrong in the capacity scheduler 
> when node labels are used. 
> When we fetch the cluster metrics with 'curl 
> http://rm:8088/ws/v1/cluster/metrics', we get incorrect totalMB and 
> totalVirtualCores.
> They should instead be computed from resources across all partitions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-11-30 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-11018:
---
Attachment: (was: YARN-11018.001.patch)

> RM rest api show error resources in capacity scheduler with nodelabels
> --
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>
> Because resource metrics are updated only for the "default" partition, allocatedMB, 
> allocatedVCores, totalMB and totalVirtualCores are wrong in the capacity scheduler 
> when node labels are used. 
> When we fetch the cluster metrics with 'curl 
> http://rm:8088/ws/v1/cluster/metrics', we get incorrect totalMB and 
> totalVirtualCores.
> They should instead be computed from resources across all partitions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10863) CGroupElasticMemoryController is not work

2021-11-30 Thread LuoGe (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LuoGe updated YARN-10863:
-
Attachment: YARN-10863.003.patch

> CGroupElasticMemoryController is not work
> -
>
> Key: YARN-10863
> URL: https://issues.apache.org/jira/browse/YARN-10863
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.3.1
>Reporter: LuoGe
>Priority: Major
> Attachments: YARN-10863.001-1.patch, YARN-10863.002.patch, 
> YARN-10863.003.patch
>
>
> When following the 
> [documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCGroupsMemory.html]
>  to configure elastic memory resource control, with 
> yarn.nodemanager.elastic-memory-control.enabled set to true, 
> yarn.nodemanager.resource.memory.enforced set to false, 
> yarn.nodemanager.pmem-check-enabled set to true, and 
> yarn.nodemanager.resource.memory.enabled set to true so that cgroups control 
> memory, elastic memory control does not work.
> Looking at the checkLimit function in ContainersMonitorImpl.java, the skip 
> logic has a problem: it only returns early when strictMemoryEnforcement is 
> true and elasticMemoryEnforcement is false. So with the documented elastic 
> memory control settings the check continues, and a container whose memory 
> usage exceeds its limit is killed by checkLimit. 
> {code:java}
> if (strictMemoryEnforcement && !elasticMemoryEnforcement) {
>   // When cgroup-based strict memory enforcement is used alone without
>   // elastic memory control, the oom-kill would take care of it.
>   // However, when elastic memory control is also enabled, the oom killer
>   // would be disabled at the root yarn container cgroup level (all child
>   // cgroups would inherit that setting). Hence, we fall back to the
>   // polling-based mechanism.
>   return;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-11-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451294#comment-17451294
 ] 

Hadoop QA commented on YARN-11018:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
52s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
24s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 17s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 19m 
45s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  2m  
1s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 38s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1248/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 27 new + 2 unchanged - 0 fixed = 29 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 51s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | 

[jira] [Updated] (YARN-11024) Create an AbstractLeafQueue to store the common LeafQueue + AutoCreatedLeafQueue functionality

2021-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11024:
--
Labels: pull-request-available  (was: )

> Create an AbstractLeafQueue to store the common LeafQueue + 
> AutoCreatedLeafQueue functionality
> --
>
> Key: YARN-11024
> URL: https://issues.apache.org/jira/browse/YARN-11024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> AbstractAutoCreatedLeafQueue extends the LeafQueue class, which is an 
> instantiable class, so every time an AutoCreatedLeafQueue is created a normal 
> LeafQueue is configured as well. This setup results in some strange behaviour, 
> such as having to pass the template configs of an auto-created queue to a leaf 
> queue. To make the whole structure more flexible, an AbstractLeafQueue should 
> be created to hold the common methods.
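A minimal, self-contained sketch of the intended shape (an assumption based on the description above, not the actual change):

{code:java}
// Hypothetical illustration: shared leaf-queue behaviour moves into an abstract
// base class, so AutoCreatedLeafQueue no longer has to extend the instantiable
// LeafQueue and inherit its static configuration handling.
abstract class AbstractLeafQueue {
  // common leaf-queue logic: user limits, app activation, ordering policy, ...
  abstract void submitApplication(String applicationId, String user);
}

class LeafQueue extends AbstractLeafQueue {
  @Override
  void submitApplication(String applicationId, String user) {
    // statically configured leaf queue
  }
}

class AutoCreatedLeafQueue extends AbstractLeafQueue {
  @Override
  void submitApplication(String applicationId, String user) {
    // queue created from a template at runtime
  }
}
{code}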



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-11024) Create an AbstractLeafQueue to store the common LeafQueue + AutoCreatedLeafQueue functionality

2021-11-30 Thread Benjamin Teke (Jira)
Benjamin Teke created YARN-11024:


 Summary: Create an AbstractLeafQueue to store the common LeafQueue 
+ AutoCreatedLeafQueue functionality
 Key: YARN-11024
 URL: https://issues.apache.org/jira/browse/YARN-11024
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Benjamin Teke
Assignee: Benjamin Teke


AbstractAutoCreatedLeafQueue extends the LeafQueue class, which is an 
instantiable class, so every time an AutoCreatedLeafQueue is created a normal 
LeafQueue is configured as well. This setup results in some strange behaviour, 
such as having to pass the template configs of an auto-created queue to a leaf 
queue. To make the whole structure more flexible, an AbstractLeafQueue should be 
created to hold the common methods.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11023) Extend the root QueueInfo with max-parallel-apps in CapacityScheduler

2021-11-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11023:
--
Labels: pull-request-available  (was: )

> Extend the root QueueInfo with max-parallel-apps in CapacityScheduler
> -
>
> Key: YARN-11023
> URL: https://issues.apache.org/jira/browse/YARN-11023
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> YARN-10891 extended the QueueInfo with the maxParallelApps property, but for 
> the root queue this property is missing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-11-30 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-11018:
---
Attachment: YARN-11018.001.patch

> RM rest api show error resources in capacity scheduler with nodelabels
> --
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
> Attachments: YARN-11018.001.patch
>
>
> Because resource metrics are updated only for the "default" partition, allocatedMB, 
> allocatedVCores, totalMB and totalVirtualCores are wrong in the capacity scheduler 
> when node labels are used. 
> When we fetch the cluster metrics with 'curl 
> http://rm:8088/ws/v1/cluster/metrics', we get incorrect totalMB and 
> totalVirtualCores.
> They should instead be computed from resources across all partitions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-11-30 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-11018:
---
Attachment: (was: YARN-11018.001.patch)

> RM rest api show error resources in capacity scheduler with nodelabels
> --
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>
> Because resource metrics are updated only for the "default" partition, allocatedMB, 
> allocatedVCores, totalMB and totalVirtualCores are wrong in the capacity scheduler 
> when node labels are used. 
> When we fetch the cluster metrics with 'curl 
> http://rm:8088/ws/v1/cluster/metrics', we get incorrect totalMB and 
> totalVirtualCores.
> They should instead be computed from resources across all partitions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-11-30 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-11018:
---
Description: 
Because resource metrics are updated only for the "default" partition, allocatedMB, 
allocatedVCores, totalMB and totalVirtualCores are wrong in the capacity scheduler 
when node labels are used.

When we fetch the cluster metrics with 'curl http://rm:8088/ws/v1/cluster/metrics', 
we get incorrect totalMB and totalVirtualCores.

They should instead be computed from resources across all partitions.

  was:
Because resource metrics updated only for "default" partition, allocatedMB, 
allocatedVCores, totalMB and other resources are error in capacity scheduler 
with nodelabels. And in RM UI 1, the cluster metrics is uncorrect.

We can see the memory total and vcores total is unequal to resources in the all 
partitions.

It should use resources across partition to replace.


> RM rest api show error resources in capacity scheduler with nodelabels
> --
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>
> Because resource metrics are updated only for the "default" partition, allocatedMB, 
> allocatedVCores, totalMB and totalVirtualCores are wrong in the capacity scheduler 
> when node labels are used. 
> When we fetch the cluster metrics with 'curl 
> http://rm:8088/ws/v1/cluster/metrics', we get incorrect totalMB and 
> totalVirtualCores.
> They should instead be computed from resources across all partitions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11018) RM rest api show error resources in capacity scheduler with nodelabels

2021-11-30 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-11018:
---
Summary: RM rest api show error resources in capacity scheduler with 
nodelabels  (was: RM UI 1 show error resources in capacity scheduler with multi 
nodelabels)

> RM rest api show error resources in capacity scheduler with nodelabels
> --
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
> Attachments: YARN-11018.001.patch
>
>
> Because resource metrics are updated only for the "default" partition, allocatedMB, 
> allocatedVCores, totalMB and other resources are wrong in the capacity scheduler 
> with node labels, and the cluster metrics shown in RM UI 1 are incorrect.
> We can see that the memory total and vcores total do not match the resources across 
> all partitions.
> They should instead be computed from resources across all partitions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11022) Fix the documentation for max-parallel-apps in CS

2021-11-30 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-11022:
--
Description: 
The documentation does not mention that the max-parallel-apps property is 
inherited. The property can be overridden on a per queue basis, but the 
parent(s) can also restrict how many parallel apps can be run.

 

{*}yarn.scheduler.capacity.max-parallel-apps / 
yarn.scheduler.capacity.<queue-path>.max-parallel-apps{*}: 
{quote}
Maximum number of applications that can run at the same time. Unlike to 
maximum-applications, application submissions are not rejected when this limit 
is reached. Instead they stay in ACCEPTED state until they are eligible to run. 
This can be set for all queues with yarn.scheduler.capacity.max-parallel-apps 
and can also be overridden on a per queue basis by setting 
yarn.scheduler.capacity.<queue-path>.max-parallel-apps. Integer value is 
expected. By default, there is no limit.
{quote}
 

[https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSMaxRunningAppsEnforcer.java#L99]
{code}
  private boolean exceedQueueMaxParallelApps(AbstractCSQueue queue) {
    // Check queue and all parent queues
    while (queue != null) {
      if (queue.getNumRunnableApps() >= queue.getMaxParallelApps()) {
        LOG.info("Maximum runnable apps exceeded for queue {}",
            queue.getQueuePath());
        return true;
      }
      queue = (AbstractCSQueue) queue.getParent();
    }
    return false;
  }
{code}

Example:

Let's say the user configured the *yarn.scheduler.capacity.max-parallel-apps* 
to 250; that will be the default for queues that don't override the setting. 
([https://github.com/apache/hadoop/blob/32ecaed9c3c06a48ef01d0437e62e8faccd3e9f3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java#L1688])

Given this queue hierarchy:
||root||.a||.a1||.a2||.a3||.a4||
|500|default|50|10|default|15|
||root||.a||.b||
|500|default|50|
 - a maximum of 250 apps can run in parallel under the *root.a* queues.
 - a maximum of 50 apps can run in parallel under the *root.a.a1* queues.
 - a maximum of 10 apps can run in parallel under the *root.a.a1.a2* queues.
 - a maximum of *10* apps can run in parallel under the *root.a.a1.a2.a3* queues (even 
though max-parallel-apps is not set for .a3, so the default 250 would apply to that 
queue, its parent has a lower value and children cannot exceed it).
 - a maximum of *10* apps can run in parallel under the *root.a.a1.a2.a3.a4* queue (even 
though it is configured for 15, the parents restrict this limit to 10).
 - a maximum of 50 apps can run in parallel under the *root.a.b* queue.

  was:
The documentation does not mention that the max-parallel-apps property is 
inherited. The property can be overridden on a per queue basis, but the 
parent(s) can also restrict how many parallel apps can be run.

 

{*}yarn.scheduler.capacity.max-parallel-apps / 
yarn.scheduler.capacity..max-parallel-apps{*}: Maximum number of 
applications that can run at the same time. Unlike to maximum-applications, 
application submissions are not rejected when this limit is reached. Instead 
they stay in ACCEPTED state until they are eligible to run. This can be set for 
all queues with yarn.scheduler.capacity.max-parallel-apps and can also be 
overridden on a per queue basis by setting 
yarn.scheduler.capacity..max-parallel-apps. Integer value is 
expected. By default, there is no limit.

 

[https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSMaxRunningAppsEnforcer.java#L99]
  private boolean exceedQueueMaxParallelApps(AbstractCSQueue queue) {
    // Check queue and all parent queues    while (queue != null) {
      if (queue.getNumRunnableApps() >= queue.getMaxParallelApps()) {
        LOG.info("Maximum runnable apps exceeded for queue {}",
            queue.getQueuePath());
        return true;
      }
      queue = (AbstractCSQueue) queue.getParent();
    }    return false;
  } 

Example:



Let's say the user configured the *yarn.scheduler.capacity.max-parallel-apps* 
to 250, that will be the default for queues that doesn't override the setting. 
([https://github.com/apache/hadoop/blob/32ecaed9c3c06a48ef01d0437e62e8faccd3e9f3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java#L1688])

Given this queue hierarchy:
||root||.a||.a1||.a2||.a3||.a4||

[jira] [Updated] (YARN-11022) Fix the documentation for max-parallel-apps in CS

2021-11-30 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-11022:
--
Description: 
The documentation does not mention that the max-parallel-apps property is 
inherited. The property can be overridden on a per queue basis, but the 
parent(s) can also restrict how many parallel apps can be run.

 

{*}yarn.scheduler.capacity.max-parallel-apps / 
yarn.scheduler.capacity.<queue-path>.max-parallel-apps{*}: 

{quote}
Maximum number of applications that can run at the same time. Unlike to 
maximum-applications, application submissions are not rejected when this limit 
is reached. Instead they stay in ACCEPTED state until they are eligible to run. 
This can be set for all queues with yarn.scheduler.capacity.max-parallel-apps 
and can also be overridden on a per queue basis by setting 
yarn.scheduler.capacity.<queue-path>.max-parallel-apps. Integer value is 
expected. By default, there is no limit.
{quote}
 

[https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSMaxRunningAppsEnforcer.java#L99]
{code}
  private boolean exceedQueueMaxParallelApps(AbstractCSQueue queue) {
    // Check queue and all parent queues
    while (queue != null) {
      if (queue.getNumRunnableApps() >= queue.getMaxParallelApps()) {
        LOG.info("Maximum runnable apps exceeded for queue {}",
            queue.getQueuePath());
        return true;
      }
      queue = (AbstractCSQueue) queue.getParent();
    }
    return false;
  }
{code}

Example:

Let's say the user configured the *yarn.scheduler.capacity.max-parallel-apps* 
to 250; that will be the default for queues that don't override the setting. 
([https://github.com/apache/hadoop/blob/32ecaed9c3c06a48ef01d0437e62e8faccd3e9f3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java#L1688])

Given this queue hierarchy:
||root||.a||.a1||.a2||.a3||.a4||
|500|default|50|10|default|15|
||root||.a||.b||
|500|default|50|
 - a maximum of 250 apps can run in parallel under the *root.a* queues.
 - a maximum of 50 apps can run in parallel under the *root.a.a1* queues.
 - a maximum of 10 apps can run in parallel under the *root.a.a1.a2* queues.
 - a maximum of *10* apps can run in parallel under the *root.a.a1.a2.a3* queues (even 
though max-parallel-apps is not set for .a3, so the default 250 would apply to that 
queue, its parent has a lower value and children cannot exceed it).
 - a maximum of *10* apps can run in parallel under the *root.a.a1.a2.a3.a4* queue (even 
though it is configured for 15, the parents restrict this limit to 10).
 - a maximum of 50 apps can run in parallel under the *root.a.b* queue.

  was:
The documentation does not mention that the max-parallel-apps property is 
inherited. The property can be overridden on a per queue basis, but the 
parent(s) can also restrict how many parallel apps can be run.

 

{*}yarn.scheduler.capacity.max-parallel-apps / 
yarn.scheduler.capacity..max-parallel-apps{*}: 
{quote}
Maximum number of applications that can run at the same time. Unlike to 
maximum-applications, application submissions are not rejected when this limit 
is reached. Instead they stay in ACCEPTED state until they are eligible to run. 
This can be set for all queues with yarn.scheduler.capacity.max-parallel-apps 
and can also be overridden on a per queue basis by setting 
yarn.scheduler.capacity..max-parallel-apps. Integer value is 
expected. By default, there is no limit.
{quote}
 

[https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSMaxRunningAppsEnforcer.java#L99]
{code}
  private boolean exceedQueueMaxParallelApps(AbstractCSQueue queue) {
    // Check queue and all parent queues    while (queue != null) {
      if (queue.getNumRunnableApps() >= queue.getMaxParallelApps()) {
        LOG.info("Maximum runnable apps exceeded for queue {}",
            queue.getQueuePath());
        return true;
      }
      queue = (AbstractCSQueue) queue.getParent();
    }    return false;
  } 
{code}

Example:

Let's say the user configured the *yarn.scheduler.capacity.max-parallel-apps* 
to 250, that will be the default for queues that doesn't override the setting. 
([https://github.com/apache/hadoop/blob/32ecaed9c3c06a48ef01d0437e62e8faccd3e9f3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java#L1688])

Given this queue hierarchy:

[jira] [Created] (YARN-11023) Extend the root QueueInfo with max-parallel-apps in CapacityScheduler

2021-11-30 Thread Tamas Domok (Jira)
Tamas Domok created YARN-11023:
--

 Summary: Extend the root QueueInfo with max-parallel-apps in 
CapacityScheduler
 Key: YARN-11023
 URL: https://issues.apache.org/jira/browse/YARN-11023
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Affects Versions: 3.4.0
Reporter: Tamas Domok
Assignee: Tamas Domok


YARN-10891 extended the QueueInfo with the maxParallelApps property, but for 
the root queue this property is missing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-11022) Fix the documentation for max-parallel-apps in CS

2021-11-30 Thread Tamas Domok (Jira)
Tamas Domok created YARN-11022:
--

 Summary: Fix the documentation for max-parallel-apps in CS
 Key: YARN-11022
 URL: https://issues.apache.org/jira/browse/YARN-11022
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Affects Versions: 3.4.0
Reporter: Tamas Domok
Assignee: Tamas Domok


The documentation does not mention that the max-parallel-apps property is 
inherited. The property can be overridden on a per queue basis, but the 
parent(s) can also restrict how many parallel apps can be run.

 

{*}yarn.scheduler.capacity.max-parallel-apps / 
yarn.scheduler.capacity.<queue-path>.max-parallel-apps{*}: Maximum number of 
applications that can run at the same time. Unlike to maximum-applications, 
application submissions are not rejected when this limit is reached. Instead 
they stay in ACCEPTED state until they are eligible to run. This can be set for 
all queues with yarn.scheduler.capacity.max-parallel-apps and can also be 
overridden on a per queue basis by setting 
yarn.scheduler.capacity.<queue-path>.max-parallel-apps. Integer value is 
expected. By default, there is no limit.

 

[https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSMaxRunningAppsEnforcer.java#L99]
  private boolean exceedQueueMaxParallelApps(AbstractCSQueue queue) {
    // Check queue and all parent queues
    while (queue != null) {
      if (queue.getNumRunnableApps() >= queue.getMaxParallelApps()) {
        LOG.info("Maximum runnable apps exceeded for queue {}",
            queue.getQueuePath());
        return true;
      }
      queue = (AbstractCSQueue) queue.getParent();
    }
    return false;
  }

Example:



Let's say the user configured the *yarn.scheduler.capacity.max-parallel-apps* 
to 250; that will be the default for queues that don't override the setting. 
([https://github.com/apache/hadoop/blob/32ecaed9c3c06a48ef01d0437e62e8faccd3e9f3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java#L1688])

Given this queue hierarchy:
||root||.a||.a1||.a2||.a3||.a4||
|500|default|50|10|default|15|
||root||.a||.b||
|500|default|50|
 - a maximum of 250 apps can run in parallel under the *root.a* queues.
 - a maximum of 50 apps can run in parallel under the *root.a.a1* queues.
 - a maximum of 10 apps can run in parallel under the *root.a.a1.a2* queues.
 - a maximum of *10* apps can run in parallel under the *root.a.a1.a2.a3* queues (even 
though max-parallel-apps is not set for .a3, so the default 250 would apply to that 
queue, its parent has a lower value and children cannot exceed it).
 - a maximum of *10* apps can run in parallel under the *root.a.a1.a2.a3.a4* queue (even 
though it is configured for 15, the parents restrict this limit to 10).
 - a maximum of 50 apps can run in parallel under the *root.a.b* queue.
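A small, self-contained sketch of the inheritance rule the example above describes (illustration only; the class name, helper and hard-coded limits are assumptions): the effective cap of a queue is the smallest max-parallel-apps value along its path from the root, which is how root.a.a1.a2.a3 ends up capped at 10.

{code:java}
import java.util.Arrays;
import java.util.List;

public class MaxParallelAppsExample {
  // Effective limit = minimum of the limits configured on the queue and all of its
  // ancestors (queues without an explicit value fall back to the 250 default here).
  static int effectiveLimit(List<Integer> limitsFromRootToQueue) {
    return limitsFromRootToQueue.stream().min(Integer::compare).orElse(Integer.MAX_VALUE);
  }

  public static void main(String[] args) {
    // root=500, a=250 (default), a1=50, a2=10, a3=250 (default)
    System.out.println(effectiveLimit(Arrays.asList(500, 250, 50, 10, 250)));     // 10
    // root=500, a=250 (default), a1=50, a2=10, a3=250 (default), a4=15
    System.out.println(effectiveLimit(Arrays.asList(500, 250, 50, 10, 250, 15))); // 10
  }
}
{code}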



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11020) [UI2] No container is found for an application attempt with a single AM container

2021-11-30 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451027#comment-17451027
 ] 

Andras Gyori commented on YARN-11020:
-

Thanks [~adam.antal] for chiming in. I agree with you that it is a bug on YARN 
RM services side, as making a distinction between multiple and single entries 
in responses is a really bad practice. However, there is a slight, but non-zero 
possibility, that someone is already using this endpoint aside from UI2.

> [UI2] No container is found for an application attempt with a single AM 
> container
> -
>
> Key: YARN-11020
> URL: https://issues.apache.org/jira/browse/YARN-11020
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In UI2, under the Logs tab of an application, a "No container data available" 
> message is shown if the application attempt only submitted a single container 
> (which is the AM container). 
> The culprit is that the response from YARN is not consistent: for a single 
> container it looks like:
> {noformat}
> {
>     "containerLogsInfo": {
>         "containerLogInfo": [
>             {
>                 "fileName": "prelaunch.out",
>                 "fileSize": "100",
>                 "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021"
>             },
>             {
>                 "fileName": "directory.info",
>                 "fileSize": "2296",
>                 "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021"
>             },
>             {
>                 "fileName": "stderr",
>                 "fileSize": "1722",
>                 "lastModifiedTime": "Mon Nov 29 09:28:28 + 2021"
>             },
>             {
>                 "fileName": "prelaunch.err",
>                 "fileSize": "0",
>                 "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021"
>             },
>             {
>                 "fileName": "stdout",
>                 "fileSize": "0",
>                 "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021"
>             },
>             {
>                 "fileName": "syslog",
>                 "fileSize": "38551",
>                 "lastModifiedTime": "Mon Nov 29 09:28:28 + 2021"
>             },
>             {
>                 "fileName": "launch_container.sh",
>                 "fileSize": "5013",
>                 "lastModifiedTime": "Mon Nov 29 09:28:16 + 2021"
>             }
>         ],
>         "logAggregationType": "AGGREGATED",
>         "containerId": "container_1638174027957_0008_01_01",
>         "nodeId": "da175178c179:43977"
>     }
> }{noformat}
> As for applications with multiple containers it looks like:
> {noformat}
> {
>     "containerLogsInfo": [{
>         
>     }, {  }]
> }{noformat}
> We cannot change the response of the endpoint due to backward compatibility, 
> so we need to make UI2 able to handle both scenarios.
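UI2 itself is JavaScript, but the normalisation the last sentence asks for can be sketched in Java with Jackson (illustration only; the class and method names are hypothetical):

{code:java}
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.ArrayList;
import java.util.List;

public class ContainerLogsNormalizer {
  // "containerLogsInfo" is a single object when the attempt has one container and
  // an array when it has several; wrap the single-object case in a one-element list
  // so callers always iterate over a list.
  static List<JsonNode> normalize(String json) throws Exception {
    JsonNode info = new ObjectMapper().readTree(json).path("containerLogsInfo");
    List<JsonNode> containers = new ArrayList<>();
    if (info.isArray()) {
      info.forEach(containers::add);
    } else if (!info.isMissingNode()) {
      containers.add(info);
    }
    return containers;
  }
}
{code}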



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10863) CGroupElasticMemoryController is not work

2021-11-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451026#comment-17451026
 ] 

Hadoop QA commented on YARN-10863:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
49s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
27s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
29s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
24s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 37s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 18m 
22s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
30s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
22s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
22s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
28s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
28s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 25s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1247/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 2 new + 1 unchanged - 0 fixed = 3 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} shellcheck {color} | {color:red}  0m  
0s{color} | 
{color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1247/artifact/out/diff-patch-shellcheck.txt{color}
 | {color:red} The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 
0) {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
18s{color} | {color:green}{color} | {color:green} 

[jira] [Updated] (YARN-10863) CGroupElasticMemoryController does not work

2021-11-30 Thread LuoGe (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LuoGe updated YARN-10863:
-
Attachment: YARN-10863.002.patch

> CGroupElasticMemoryController does not work
> -
>
> Key: YARN-10863
> URL: https://issues.apache.org/jira/browse/YARN-10863
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.3.1
>Reporter: LuoGe
>Priority: Major
> Attachments: YARN-10863.001-1.patch, YARN-10863.002.patch
>
>
> When following the 
> [documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerCGroupsMemory.html]
>  to configure elastic memory resource control, with 
> yarn.nodemanager.elastic-memory-control.enabled set to true, 
> yarn.nodemanager.resource.memory.enforced set to false, 
> yarn.nodemanager.pmem-check-enabled set to true, and 
> yarn.nodemanager.resource.memory.enabled set to true so that cgroups control 
> memory, elastic memory control does not work.
> Looking at the checkLimit function in ContainersMonitorImpl.java, the skip 
> logic has a problem. The early return only happens when strictMemoryEnforcement 
> is true and elasticMemoryEnforcement is false. So, with elastic memory control 
> configured as the documentation describes, the check continues, and a container 
> whose memory usage exceeds its limit will be killed by checkLimit. 
> {code:java}
> if (strictMemoryEnforcement && !elasticMemoryEnforcement) {
>   // When cgroup-based strict memory enforcement is used alone without
>   // elastic memory control, the oom-kill would take care of it.
>   // However, when elastic memory control is also enabled, the oom killer
>   // would be disabled at the root yarn container cgroup level (all child
>   // cgroups would inherit that setting). Hence, we fall back to the
>   // polling-based mechanism.
>   return;
> }
> {code}
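
A minimal sketch of the kind of change the report seems to point at: relax the
guard so that the polling-based kill is also skipped whenever elastic memory
control is enabled. This is an assumption about the intended fix, not the
actual YARN-10863 patch.

{code:java}
// Hypothetical adjustment, for illustration only.
if (strictMemoryEnforcement || elasticMemoryEnforcement) {
  // Strict enforcement: the cgroup OOM killer already handles the limit.
  // Elastic control: CGroupElasticMemoryController reacts to OOM events at the
  // root YARN cgroup, so the per-container polling check should not kill a
  // container merely for exceeding its own limit here.
  return;
}
{code}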



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11018) RM UI 1 shows incorrect resources in capacity scheduler with multi nodelabels

2021-11-30 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-11018:
---
Environment: (was: !resource1.png|width=774,height=185!)

> RM UI 1 shows incorrect resources in capacity scheduler with multi nodelabels
> 
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
> Attachments: YARN-11018.001.patch
>
>
> Because resource metrics are updated only for the "default" partition, 
> allocatedMB, allocatedVCores, totalMB and other resources are wrong in the 
> capacity scheduler with nodelabels, and the cluster metrics in RM UI 1 are 
> incorrect.
> We can see that the total memory and total vcores do not equal the resources 
> across all partitions.
> The metrics should instead be computed from the resources of all partitions.
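
A minimal, self-contained sketch of that idea: build the cluster totals by
summing the resources of every partition rather than reading only the
"default" one. The PartitionMetricsSketch class and the partitionResources map
are hypothetical stand-ins for whatever per-label view the RM exposes; this is
not the attached patch.

{code:java}
import java.util.Map;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class PartitionMetricsSketch {

  /** Sum memory and vcores over all partitions, not just "default". */
  public static Resource totalAcrossPartitions(
      Map<String, Resource> partitionResources) {
    Resource total = Resources.createResource(0, 0);
    for (Resource partition : partitionResources.values()) {
      Resources.addTo(total, partition);
    }
    return total;
  }
}
{code}

Reporting totalMB and totalVirtualCores from such an aggregate would make the
UI and REST metrics match the sum of the per-partition resources.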



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] (YARN-11018) RM UI 1 shows incorrect resources in capacity scheduler with multi nodelabels

2021-11-30 Thread caozhiqiang (Jira)


[ https://issues.apache.org/jira/browse/YARN-11018 ]


caozhiqiang deleted comment on YARN-11018:


was (Author: caozhiqiang):
It is related to [YARN-10343|https://issues.apache.org/jira/browse/YARN-10343], 
so this patch is cancelled.

> RM UI 1 shows incorrect resources in capacity scheduler with multi nodelabels
> 
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
> Environment: !resource1.png|width=774,height=185!
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
> Attachments: YARN-11018.001.patch
>
>
> Because resource metrics are updated only for the "default" partition, 
> allocatedMB, allocatedVCores, totalMB and other resources are wrong in the 
> capacity scheduler with nodelabels, and the cluster metrics in RM UI 1 are 
> incorrect.
> We can see that the total memory and total vcores do not equal the resources 
> across all partitions.
> The metrics should instead be computed from the resources of all partitions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11018) RM UI 1 shows incorrect resources in capacity scheduler with multi nodelabels

2021-11-30 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-11018:
---
Attachment: (was: resource1.png)

> RM UI 1 shows incorrect resources in capacity scheduler with multi nodelabels
> 
>
> Key: YARN-11018
> URL: https://issues.apache.org/jira/browse/YARN-11018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.4.0
> Environment: !resource1.png|width=774,height=185!
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
> Attachments: YARN-11018.001.patch
>
>
> Because resource metrics are updated only for the "default" partition, 
> allocatedMB, allocatedVCores, totalMB and other resources are wrong in the 
> capacity scheduler with nodelabels, and the cluster metrics in RM UI 1 are 
> incorrect.
> We can see that the total memory and total vcores do not equal the resources 
> across all partitions.
> The metrics should instead be computed from the resources of all partitions.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org