[jira] [Commented] (YARN-8842) Expose metrics for custom resource types in QueueMetrics

2019-11-14 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974877#comment-16974877
 ] 

Jonathan Hung commented on YARN-8842:
-

The unit test failure is related to YARN-7589; I'll port that after this Jira 
goes into branch-2.

[^YARN-8842-branch-2.002.patch] fixes the javac/javadoc/compile error.

> Expose metrics for custom resource types in QueueMetrics
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-8842-branch-2.001.patch, 
> YARN-8842-branch-2.002.patch, YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch, 
> YARN-8842.006.patch, YARN-8842.007.patch, YARN-8842.008.patch, 
> YARN-8842.009.patch, YARN-8842.010.patch, YARN-8842.011.patch, 
> YARN-8842.012.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted
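
For a sense of what the update involves, here is a schematic sketch of keeping 
per-queue running totals for custom resource types (illustrative only; the 
class and method names are not from the actual patch, which wires these totals 
into Hadoop's metrics2 registry):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative: running totals per custom resource type for one queue,
// incremented on allocation and decremented on release.
class CustomResourceTracker {
  private final Map<String, Long> allocated = new HashMap<>();

  void allocate(Map<String, Long> containerResources) {
    for (Map.Entry<String, Long> e : containerResources.entrySet()) {
      allocated.merge(e.getKey(), e.getValue(), Long::sum);
    }
  }

  void release(Map<String, Long> containerResources) {
    for (Map.Entry<String, Long> e : containerResources.entrySet()) {
      allocated.merge(e.getKey(), -e.getValue(), Long::sum);
    }
  }

  long getAllocated(String resourceName) {
    return allocated.getOrDefault(resourceName, 0L);
  }
}
{code}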






[jira] [Updated] (YARN-8842) Expose metrics for custom resource types in QueueMetrics

2019-11-14 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8842:

Attachment: YARN-8842-branch-2.002.patch

> Expose metrics for custom resource types in QueueMetrics
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-8842-branch-2.001.patch, 
> YARN-8842-branch-2.002.patch, YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch, 
> YARN-8842.006.patch, YARN-8842.007.patch, YARN-8842.008.patch, 
> YARN-8842.009.patch, YARN-8842.010.patch, YARN-8842.011.patch, 
> YARN-8842.012.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted






[jira] [Commented] (YARN-9982) Fix Container API example link in NodeManager REST API doc

2019-11-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974814#comment-16974814
 ] 

Hadoop QA commented on YARN-9982:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
33m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9982 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985908/YARN-9982.001.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux 6afa910532ed 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 92c28c1 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 307 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25175/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Fix Container API example link in NodeManager REST API doc
> --
>
> Key: YARN-9982
> URL: https://issues.apache.org/jira/browse/YARN-9982
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Charan Hebri
>Assignee: Charan Hebri
>Priority: Trivial
> Attachments: YARN-9982.001.patch
>
>
> In 
> [https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html#Container_API]
> {noformat}
> GET 
> http://nm-http-address:port/ws/v1/nodes/containers/container_1326121700862_0007_01_01{noformat}
> should be changed to,
> {noformat}
> GET 
> http://nm-http-address:port/ws/v1/node/containers/container_1326121700862_0007_01_01{noformat}






[jira] [Commented] (YARN-8842) Expose metrics for custom resource types in QueueMetrics

2019-11-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974807#comment-16974807
 ] 

Hadoop QA commented on YARN-8842:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 31m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
34s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
50s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
44s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
33s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
58s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
26s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
59s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  2m 
16s{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  2m 16s{color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
43s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 52s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 32 new + 139 unchanged - 2 fixed = 171 total (was 141) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
34s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95
 with JDK v1.7.0_95 generated 1 new + 4 unchanged - 0 fixed = 5 total (was 4) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m  4s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 65m  
3s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 53s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.api.TestPBImplRecords |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 

[jira] [Updated] (YARN-9982) Fix Container API example link in NodeManager REST API doc

2019-11-14 Thread Charan Hebri (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charan Hebri updated YARN-9982:
---
Attachment: YARN-9982.001.patch

> Fix Container API example link in NodeManager REST API doc
> --
>
> Key: YARN-9982
> URL: https://issues.apache.org/jira/browse/YARN-9982
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Charan Hebri
>Assignee: Charan Hebri
>Priority: Trivial
> Attachments: YARN-9982.001.patch
>
>
> In 
> [https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html#Container_API]
> {noformat}
> GET 
> http://nm-http-address:port/ws/v1/nodes/containers/container_1326121700862_0007_01_01{noformat}
> should be changed to,
> {noformat}
> GET 
> http://nm-http-address:port/ws/v1/node/containers/container_1326121700862_0007_01_01{noformat}






[jira] [Created] (YARN-9982) Fix Container API example link in NodeManager REST API doc

2019-11-14 Thread Charan Hebri (Jira)
Charan Hebri created YARN-9982:
--

 Summary: Fix Container API example link in NodeManager REST API doc
 Key: YARN-9982
 URL: https://issues.apache.org/jira/browse/YARN-9982
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Reporter: Charan Hebri
Assignee: Charan Hebri


In 
[https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html#Container_API]
{noformat}
GET 
http://nm-http-address:port/ws/v1/nodes/containers/container_1326121700862_0007_01_01{noformat}
should be changed to,
{noformat}
GET 
http://nm-http-address:port/ws/v1/node/containers/container_1326121700862_0007_01_01{noformat}






[jira] [Commented] (YARN-8842) Expose metrics for custom resource types in QueueMetrics

2019-11-14 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974747#comment-16974747
 ] 

Jonathan Hung commented on YARN-8842:
-

I've attached [^YARN-8842-branch-2.001.patch]. Main differences are:
 * Replacing {{expected}} resource maps with manually constructed ones, e.g.: 
{{computeExpectedCustomResourceValues(testData.customResourceValues, (k, v) -> 
v * containers)}} becomes the following (a sketch of the replaced helper's 
shape appears after this list):

{noformat}
+    Map<String, Long> expected = new HashMap<>();
+    for (Map.Entry<String, Long> entry : 
testData.customResourceValues.entrySet()) {
+      expected.put(entry.getKey(), entry.getValue() * containers);
+    }{noformat}
 * Duplicating each of {{assertAllMetrics}}, {{assertQueueMetricsOnly}}, 
{{assertAllQueueMetrics}}, {{checkAllQueueMetrics}} four times; each copy looks 
like {{assertAllXMetrics}}, {{assertQueueXMetricsOnly}}, 
{{assertAllXQueueMetrics}}, {{checkAllXQueueMetrics}} where X = {Pending, 
Allocated, Preempted, Reserved}
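
For context, a sketch of the replaced Java 8 helper's probable shape, inferred 
from the call site quoted above (not the exact trunk code; {{BiFunction}} does 
not exist on JDK 1.7, which is why branch-2 needs the manual loop):
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

final class ExpectedValuesSketch {
  // Inferred shape only: applies a (key, value) -> newValue function to
  // each entry of the custom-resource map to build the expected values.
  static Map<String, Long> computeExpectedCustomResourceValues(
      Map<String, Long> values, BiFunction<String, Long, Long> func) {
    Map<String, Long> expected = new HashMap<>();
    for (Map.Entry<String, Long> e : values.entrySet()) {
      expected.put(e.getKey(), func.apply(e.getKey(), e.getValue()));
    }
    return expected;
  }
}
{code}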

[~epayne]/[~snemeth] could you help review? Thanks!

> Expose metrics for custom resource types in QueueMetrics
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-8842-branch-2.001.patch, YARN-8842.001.patch, 
> YARN-8842.002.patch, YARN-8842.003.patch, YARN-8842.004.patch, 
> YARN-8842.005.patch, YARN-8842.006.patch, YARN-8842.007.patch, 
> YARN-8842.008.patch, YARN-8842.009.patch, YARN-8842.010.patch, 
> YARN-8842.011.patch, YARN-8842.012.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted






[jira] [Updated] (YARN-8842) Expose metrics for custom resource types in QueueMetrics

2019-11-14 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8842:

Attachment: YARN-8842-branch-2.001.patch

> Expose metrics for custom resource types in QueueMetrics
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-8842-branch-2.001.patch, YARN-8842.001.patch, 
> YARN-8842.002.patch, YARN-8842.003.patch, YARN-8842.004.patch, 
> YARN-8842.005.patch, YARN-8842.006.patch, YARN-8842.007.patch, 
> YARN-8842.008.patch, YARN-8842.009.patch, YARN-8842.010.patch, 
> YARN-8842.011.patch, YARN-8842.012.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted






[jira] [Reopened] (YARN-8842) Expose metrics for custom resource types in QueueMetrics

2019-11-14 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung reopened YARN-8842:
-

> Expose metrics for custom resource types in QueueMetrics
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-8842-branch-2.001.patch, YARN-8842.001.patch, 
> YARN-8842.002.patch, YARN-8842.003.patch, YARN-8842.004.patch, 
> YARN-8842.005.patch, YARN-8842.006.patch, YARN-8842.007.patch, 
> YARN-8842.008.patch, YARN-8842.009.patch, YARN-8842.010.patch, 
> YARN-8842.011.patch, YARN-8842.012.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted






[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974730#comment-16974730
 ] 

Hadoop QA commented on YARN-9562:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
4s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 22m 
34s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 51s{color} | {color:orange} root: The patch generated 23 new + 688 unchanged 
- 1 fixed = 711 total (was 689) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
5s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
26s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 23m  
8s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
34s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} 

[jira] [Comment Edited] (YARN-8842) Expose metrics for custom resource types in QueueMetrics

2019-11-14 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974701#comment-16974701
 ] 

Jonathan Hung edited comment on YARN-8842 at 11/15/19 12:28 AM:


[~epayne] regarding the tangled web of dependencies...I've ported YARN-7739, 
YARN-7541, YARN-8202 to branch-2. Now YARN-8842 seems to apply (mostly) cleanly.

The 1.7 support is not gonna be pretty though... There's a lot of 1.8 specific 
syntax in TestQueueMetricsForCustomResources and QueueInfo which passes methods 
into refactored functions. I can try to tackle this, let me know if you've 
already started work on this.


was (Author: jhung):
[~epayne] regarding the tangled web of dependencies...I've ported YARN-7739, 
YARN-7541, YARN-8202 to branch-2. Now YARN-8842 seems to apply (mostly) cleanly.

The 1.7 support is not gonna be pretty though... There's a lot of 1.8 specific 
syntax in TestQueueMetricsForCustomResources and QueueInfo which passes methods 
into refactored functions.

> Expose metrics for custom resource types in QueueMetrics
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch, 
> YARN-8842.006.patch, YARN-8842.007.patch, YARN-8842.008.patch, 
> YARN-8842.009.patch, YARN-8842.010.patch, YARN-8842.011.patch, 
> YARN-8842.012.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted






[jira] [Commented] (YARN-8842) Expose metrics for custom resource types in QueueMetrics

2019-11-14 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974701#comment-16974701
 ] 

Jonathan Hung commented on YARN-8842:
-

[~epayne] regarding the tangled web of dependencies...I've ported YARN-7739, 
YARN-7541, YARN-8202 to branch-2. Now YARN-8842 seems to apply (mostly) cleanly.

The 1.7 support is not gonna be pretty though... There's a lot of 1.8 specific 
syntax in TestQueueMetricsForCustomResources and QueueInfo which passes methods 
into refactored functions.

> Expose metrics for custom resource types in QueueMetrics
> 
>
> Key: YARN-8842
> URL: https://issues.apache.org/jira/browse/YARN-8842
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-8842.001.patch, YARN-8842.002.patch, 
> YARN-8842.003.patch, YARN-8842.004.patch, YARN-8842.005.patch, 
> YARN-8842.006.patch, YARN-8842.007.patch, YARN-8842.008.patch, 
> YARN-8842.009.patch, YARN-8842.010.patch, YARN-8842.011.patch, 
> YARN-8842.012.patch
>
>
> This is the 2nd dependent jira of YARN-8059.
> As updating the metrics is an independent step from handling preemption, this 
> jira only deals with the queue metrics update of custom resources.
> The following metrics should be updated: 
> * allocated resources
> * available resources
> * pending resources
> * reserved resources
> * aggregate seconds preempted






[jira] [Updated] (YARN-8202) DefaultAMSProcessor should properly check units of requested custom resource types against minimum/maximum allocation

2019-11-14 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8202:

Fix Version/s: 2.11.0

> DefaultAMSProcessor should properly check units of requested custom resource 
> types against minimum/maximum allocation
> -
>
> Key: YARN-8202
> URL: https://issues.apache.org/jira/browse/YARN-8202
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Blocker
> Fix For: 3.1.1, 2.11.0
>
> Attachments: YARN-8202-001.patch, YARN-8202-002.patch, 
> YARN-8202-003.patch, YARN-8202-004.patch, YARN-8202-005.patch, 
> YARN-8202-006.patch, YARN-8202-007.patch, YARN-8202-008.patch, 
> YARN-8202-009.patch, YARN-8202-010.patch
>
>
>  
> When I execute a pi job with arguments: 
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 
> -Dmapreduce.map.resource.resource1=500M 1 1000{code}
> and I have one node with 5GB of resource1, I get the following exception 
> every second and the job hangs:
> {code:java}
> 2018-04-24 08:42:03,694 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 20 on 8030, call Call#386 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 
> 172.31.119.172:58138
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested resource type=[resource1] < 0 or greater than 
> maximum allowed allocation. Requested resource= resource1: 500M>, maximum allowed allocation= resource1: 5G>, please note that maximum allowed allocation is calculated by 
> scheduler based on maximum resource of registered NodeManagers, which might 
> be less than configured maximum allocation= resource1: 9223372036854775807G>
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:258)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:249)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:230)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>         at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>         at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> {code}
> *This is because 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils#validateResourceRequest
>  does not take resource units into account.*
>  
> However, if I start a job with arguments: 
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=1G 
> 1 1000{code}
> and I still have 5GB of resource1 on one node then the job runs successfully.
>  
> I also tried a third job run: I requested 1GB of resource1 while no node had 
> any amount of resource1, then restarted the node with 5GB of resource1. The 
> job ultimately completed, but only after the node with enough resources 
> registered in the RM, which is the desired behaviour.
>  
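
As a side note on the bolded root cause above, YARN ships a unit converter 
that makes the comparison well-defined; a minimal illustration (not the patch 
itself, which applies this inside {{SchedulerUtils#validateResourceRequest}}):
{code:java}
import org.apache.hadoop.yarn.util.UnitsConversionUtil;

public class UnitAwareCheck {
  public static void main(String[] args) {
    // Requested 500M of resource1 vs. a maximum of 5G: comparing the raw
    // numbers (500 > 5) wrongly rejects the request; converting units
    // first shows that 500M < 5G.
    int cmp = UnitsConversionUtil.compare("M", 500L, "G", 5L);
    System.out.println(cmp <= 0 ? "request fits" : "request too large");
  }
}
{code}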




[jira] [Commented] (YARN-8202) DefaultAMSProcessor should properly check units of requested custom resource types against minimum/maximum allocation

2019-11-14 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974700#comment-16974700
 ] 

Jonathan Hung commented on YARN-8202:
-

ported to branch-2.

 

> DefaultAMSProcessor should properly check units of requested custom resource 
> types against minimum/maximum allocation
> -
>
> Key: YARN-8202
> URL: https://issues.apache.org/jira/browse/YARN-8202
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Blocker
> Fix For: 3.1.1, 2.11.0
>
> Attachments: YARN-8202-001.patch, YARN-8202-002.patch, 
> YARN-8202-003.patch, YARN-8202-004.patch, YARN-8202-005.patch, 
> YARN-8202-006.patch, YARN-8202-007.patch, YARN-8202-008.patch, 
> YARN-8202-009.patch, YARN-8202-010.patch
>
>
>  
> When I execute a pi job with arguments: 
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 
> -Dmapreduce.map.resource.resource1=500M 1 1000{code}
> and I have one node with 5GB of resource1, I get the following exception 
> every second and the job hangs:
> {code:java}
> 2018-04-24 08:42:03,694 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 20 on 8030, call Call#386 Retry#0 
> org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 
> 172.31.119.172:58138
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested resource type=[resource1] < 0 or greater than 
> maximum allowed allocation. Requested resource= resource1: 500M>, maximum allowed allocation= resource1: 5G>, please note that maximum allowed allocation is calculated by 
> scheduler based on maximum resource of registered NodeManagers, which might 
> be less than configured maximum allocation= resource1: 9223372036854775807G>
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:286)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:242)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:258)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:249)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:230)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>         at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>         at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> {code}
> *This is because 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils#validateResourceRequest
>  does not take resource units into account.*
>  
> However, if I start a job with arguments: 
> {code:java}
> -Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=1G 
> 1 1000{code}
> and I still have 5GB of resource1 on one node then the job runs successfully.
>  
> I also tried a third job run: I requested 1GB of resource1 while no node had 
> any amount of resource1, then restarted the node with 5GB of resource1. The 
> job ultimately completed, but only after the node with enough resources 
> registered in the RM, which is the desired behaviour.
>  




[jira] [Commented] (YARN-7541) Node updates don't update the maximum cluster capability for resources other than CPU and memory

2019-11-14 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974699#comment-16974699
 ] 

Jonathan Hung commented on YARN-7541:
-

ported to branch-2.

> Node updates don't update the maximum cluster capability for resources other 
> than CPU and memory
> 
>
> Key: YARN-7541
> URL: https://issues.apache.org/jira/browse/YARN-7541
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0-beta1, 3.1.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Fix For: 3.0.0, 3.1.0, 2.11.0
>
> Attachments: YARN-7541.001.patch, YARN-7541.002.patch, 
> YARN-7541.003.patch, YARN-7541.004.patch, YARN-7541.005.patch, 
> YARN-7541.006.patch, YARN-7541.branch-3.0.001.patch
>
>
> When I submit an MR job that asks for too much memory or CPU for the map or 
> reduce, the AM will fail because it recognizes that the request is too large. 
>  With any other resources, however, the resource requests will instead be 
> made and remain pending forever.  Looks like we forgot to update the code 
> that tracks the maximum container allocation in {{ClusterNodeTracker}}.
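
A schematic of the missing update (illustrative; the real logic lives in 
{{ClusterNodeTracker}}, and unit conversion is elided here for brevity):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceInformation;

final class MaxAllocationSketch {
  // Iterate over *all* registered resource types, not just memory and
  // vcores, when a node update can raise the cluster-wide maximum.
  static void updateMaxAllocation(Resource nodeTotal, Resource currentMax) {
    for (ResourceInformation ri : nodeTotal.getResources()) {
      ResourceInformation max =
          currentMax.getResourceInformation(ri.getName());
      if (ri.getValue() > max.getValue()) {
        currentMax.setResourceValue(ri.getName(), ri.getValue());
      }
    }
  }
}
{code}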






[jira] [Updated] (YARN-7739) DefaultAMSProcessor should properly check customized resource types against minimum/maximum allocation

2019-11-14 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7739:

Fix Version/s: 2.11.0

> DefaultAMSProcessor should properly check customized resource types against 
> minimum/maximum allocation
> --
>
> Key: YARN-7739
> URL: https://issues.apache.org/jira/browse/YARN-7739
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 3.1.0, 2.11.0
>
> Attachments: YARN-7339.002.patch, YARN-7739.001.patch
>
>
> Currently, the YARN RM rejects a requested resource if memory or vcores are 
> less than 0 or greater than the maximum allocation. We should run the check 
> for customized resource types as well.






[jira] [Updated] (YARN-7541) Node updates don't update the maximum cluster capability for resources other than CPU and memory

2019-11-14 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7541:

Fix Version/s: 2.11.0

> Node updates don't update the maximum cluster capability for resources other 
> than CPU and memory
> 
>
> Key: YARN-7541
> URL: https://issues.apache.org/jira/browse/YARN-7541
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0-beta1, 3.1.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Fix For: 3.0.0, 3.1.0, 2.11.0
>
> Attachments: YARN-7541.001.patch, YARN-7541.002.patch, 
> YARN-7541.003.patch, YARN-7541.004.patch, YARN-7541.005.patch, 
> YARN-7541.006.patch, YARN-7541.branch-3.0.001.patch
>
>
> When I submit an MR job that asks for too much memory or CPU for the map or 
> reduce, the AM will fail because it recognizes that the request is too large. 
>  With any other resources, however, the resource requests will instead be 
> made and remain pending forever.  Looks like we forgot to update the code 
> that tracks the maximum container allocation in {{ClusterNodeTracker}}.






[jira] [Commented] (YARN-7739) DefaultAMSProcessor should properly check customized resource types against minimum/maximum allocation

2019-11-14 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974698#comment-16974698
 ] 

Jonathan Hung commented on YARN-7739:
-

ported to branch-2.

> DefaultAMSProcessor should properly check customized resource types against 
> minimum/maximum allocation
> --
>
> Key: YARN-7739
> URL: https://issues.apache.org/jira/browse/YARN-7739
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 3.1.0, 2.11.0
>
> Attachments: YARN-7339.002.patch, YARN-7739.001.patch
>
>
> Currently, the YARN RM rejects a requested resource if memory or vcores are 
> less than 0 or greater than the maximum allocation. We should run the check 
> for customized resource types as well.






[jira] [Commented] (YARN-9923) Introduce HealthReporter interface and implement running Docker daemon checker

2019-11-14 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974676#comment-16974676
 ] 

Eric Yang commented on YARN-9923:
-

[~adam.antal] 

{quote}do you mean that there was no public or hadoop-public API for 
health-checking on purpose?{quote}

I don't know if it was intentionally omitted, but I admit I didn't spend much 
time thinking about this API.  A pluggable health check interface is good.  
There is no doubt that a health check interface is a good feature that lets 
people implement their own health checks.  I only disagree that using Java to 
check Docker is a good pattern, due to missing permissions to access 
privileged operations.

{quote}One improvement I can think of is to enable to set these things on a per 
script basis (allowing multiple scripts to run in parallel).{quote}

Personally, I would prefer to avoid a multi-script approach.  Apache Commons 
Logging is one real lesson I learned from Hadoop: having too many runaway 
threads makes logging expensive and makes it hard to debug where the failure 
is.  We have moved to slf4j to reduce some of that bloat.  A single script 
that runs under 30 seconds at a 15-minute interval is preferable for most 
system administrators.  We don't want to burn too many CPU cycles on 
health-check scripts.  The script itself can be organized into functions to 
keep things tidy, and some of those functions could potentially move into 
Hadoop libexec scripts to keep the parts hackable and tidy.

{quote}For the sake of completeness a use case: In a cluster where Dockerized 
nodes with GPU are running TF jobs and nodes may depend on the availability of 
the Docker daemon as well as the GPU device, as of now we can only be sure that 
the node is working fine, if a container allocation is started on that node. 
{quote}

If the config toggle via environment variable can work, the node manager can 
decide which of the health check functions to run based on its own config.  
This can prevent containers from being scheduled on an unhealthy node in the 
above use case.  I think the outcome could be a better overall solution.  
Wouldn't you agree?

> Introduce HealthReporter interface and implement running Docker daemon checker
> --
>
> Key: YARN-9923
> URL: https://issues.apache.org/jira/browse/YARN-9923
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9923.001.patch, YARN-9923.002.patch, 
> YARN-9923.003.patch, YARN-9923.004.patch
>
>
> Currently if a NodeManager is enabled to allocate Docker containers, but the 
> specified binary (docker.binary in the container-executor.cfg) is missing the 
> container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: : No 
> such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect 
> host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error 
> message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", 
> with the following options:
> - STARTUP: setting this option the NodeManager would not start if Docker 
> binaries are missing or the Docker daemon is not running (the exception is 
> considered FATAL during startup)
> - RUNTIME: would give a more detailed/user-friendly exception on the 
> NodeManager's side (NM logs) if Docker binaries are missing or the daemon is 
> not working. This would also prevent further Docker container allocation as 
> long as the binaries do not exist and the docker daemon is not running.
> - NONE (default): preserving the current behaviour, throwing an exception 
> during container allocation, carrying on using the default retry procedure.
> 
> A new interface called {{HealthChecker}} is introduced which is used in the 
> {{NodeHealthCheckerService}}. Currently existing implementations like 
> {{LocalDirsHandlerService}} are modified to implement this giving a clear 
> abstraction to the node's health. The {{DockerHealthChecker}} implements this 
> new interface.
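
For readers skimming the thread, a minimal sketch of what such an interface 
could look like (method names here are assumptions for illustration, not the 
API from the attached patches):
{code:java}
// Illustrative only: a pluggable node health check of the kind discussed.
public interface HealthChecker {
  /** @return true while the checked subsystem (e.g. Docker daemon) is up. */
  boolean isHealthy();

  /** @return a human-readable reason when the node is unhealthy. */
  String getHealthReport();

  /** @return wall-clock time of the last completed check, in millis. */
  long getLastHealthReportTime();
}
{code}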






[jira] [Updated] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-14 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9562:
--
Attachment: YARN-9562.015.patch

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, 
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, 
> YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch, 
> YARN-9562.015.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 
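
Roughly, the refactor being described looks like this (class and method names 
are a sketch of the idea only, not verbatim from the YARN-9560/YARN-9562 
patches):
{code:java}
// Shared OCI launch logic moves into an abstract base class so the new
// runC runtime can reuse the existing Docker code paths.
abstract class AbstractOCIRuntimeSketch {
  // common pieces: mount handling, env propagation, launch preparation
  abstract String runtimeType();
}

class DockerRuntimeSketch extends AbstractOCIRuntimeSketch {
  @Override String runtimeType() { return "docker"; }
}

class RuncRuntimeSketch extends AbstractOCIRuntimeSketch {
  @Override String runtimeType() { return "runc"; }
}
{code}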






[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-14 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974660#comment-16974660
 ] 

Eric Badger commented on YARN-9562:
---

Huh...that's weird. Must've been an issue with the upload. Uploaded it again.

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, 
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, 
> YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch, 
> YARN-9562.015.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 






[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime

2019-11-14 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974650#comment-16974650
 ] 

Jim Brennan commented on YARN-9562:
---

[~ebadger] I don't see a patch 015...

> Add Java changes for the new RuncContainerRuntime
> -
>
> Key: YARN-9562
> URL: https://issues.apache.org/jira/browse/YARN-9562
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, 
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, 
> YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 






[jira] [Commented] (YARN-9959) Work around hard-coded tmp and /var/tmp bind-mounts in the container's working directory

2019-11-14 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974566#comment-16974566
 ] 

Eric Badger commented on YARN-9959:
---

[~eyang], [~ccondit], [~Jim_Brennan], [~shaneku...@gmail.com], [~billie], I 
looked into a quick fix for this so that we could support some sort of 
{{%WORK_DIR%}} variable in the mount that would get normalized to the 
container's work dir. This should be possible, but is a little more than a few 
lines of code. The patch will need to touch both the Java code and the C code. 

On the Java side, the user mount does an {{if (!srcPath.isAbsolute()}}, which 
will return false and require that the mount be mounted as read-only. The 
read-only check also makes sure that the user mount is a localized resource, 
which fails since the mount is prefixed with {{%WORK_DIR}}.

On the C side, logic will need to be added in mount-utils.c. I think the 
easiest thing to do would be to add the container's working directory to 
allowed.rw-mounts list after it is parsed from the container-executor.cfg file. 
Then we can also pre-parse the passed in mounts and replace all instances of 
{{%WORK_DIR%}} with the container's working directory path.

All that being said, I don't think we should implement this into the original 
commits for YARN-9561 and YARN-9562. I can work on that patch for this JIRA. 
Let me know if you disagree with my assessment or my proposed approach.
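
A minimal sketch of the Java-side substitution described above (illustrative 
only; the real change would also need the matching logic in mount-utils.c and 
the work-dir addition to the allowed rw-mounts list):
{code:java}
final class WorkDirMountSketch {
  // Replace every occurrence of the %WORK_DIR% placeholder with the
  // container's actual working directory before the mount is validated.
  static String resolveMountSource(String mountSrc, String containerWorkDir) {
    return mountSrc.replace("%WORK_DIR%", containerWorkDir);
  }
}
{code}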

> Work around hard-coded tmp and /var/tmp bind-mounts in the container's 
> working directory
> 
>
> Key: YARN-9959
> URL: https://issues.apache.org/jira/browse/YARN-9959
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>
> {noformat}
> addRuncMountLocation(mounts, containerWorkDir.toString() +
> "/private_slash_tmp", "/tmp", true, true);
> addRuncMountLocation(mounts, containerWorkDir.toString() +
> "/private_var_slash_tmp", "/var/tmp", true, true);
> {noformat}
> It would be good to remove the hard-coded tmp mounts from the 
> {{RuncContainerRuntime}} in place of something general or possibly a tmpfs. 






[jira] [Comment Edited] (YARN-9959) Work around hard-coded tmp and /var/tmp bind-mounts in the container's working directory

2019-11-14 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974566#comment-16974566
 ] 

Eric Badger edited comment on YARN-9959 at 11/14/19 7:42 PM:
-

[~eyang], [~ccondit], [~Jim_Brennan], [~shaneku...@gmail.com], [~billie], I 
looked into a quick fix for this so that we could support some sort of 
{{%WORK_DIR%}} variable in the mount that would get normalized to the 
container's work dir. This should be possible, but is a little more than a few 
lines of code. The patch will need to touch both the Java code and the C code. 

On the Java side, the user mount does an {{if (!srcPath.isAbsolute())}}, which 
will return false and require that the mount be mounted as read-only. The 
read-only check also makes sure that the user mount is a localized resource, 
which fails since the mount is prefixed with {{%WORK_DIR}}.

On the C side, logic will need to be added in mount-utils.c. I think the 
easiest thing to do would be to add the container's working directory to 
allowed.rw-mounts list after it is parsed from the container-executor.cfg file. 
Then we can also pre-parse the passed in mounts and replace all instances of 
{{%WORK_DIR%}} with the container's working directory path.

All that being said, I don't think we should fold this into the original 
commits for YARN-9561 and YARN-9562. I can work on the patch for this JIRA. 
Let me know if you disagree with my assessment or my proposed approach.


was (Author: ebadger):
[~eyang], [~ccondit], [~Jim_Brennan], [~shaneku...@gmail.com], [~billie], I 
looked into a quick fix for this so that we could support some sort of 
{{%WORK_DIR%}} variable in the mount that would get normalized to the 
container's work dir. This should be possible, but is a little more than a few 
lines of code. The patch will need to touch both the Java code and the C code. 

On the Java side, the user mount does an {{if (!srcPath.isAbsolute()}}, which 
will return false and require that the mount be mounted as read-only. The 
read-only check also makes sure that the user mount is a localized resource, 
which fails since the mount is prefixed with {{%WORK_DIR}}.

On the C side, logic will need to be added in mount-utils.c. I think the 
easiest thing to do would be to add the container's working directory to 
allowed.rw-mounts list after it is parsed from the container-executor.cfg file. 
Then we can also pre-parse the passed in mounts and replace all instances of 
{{%WORK_DIR%}} with the container's working directory path.

All that being said, I don't think we should implement this into the original 
commits for YARN-9561 and YARN-9562. I can work on that patch for this JIRA. 
Let me know if you disagree with my assessment or my proposed approach.

> Work around hard-coded tmp and /var/tmp bind-mounts in the container's 
> working directory
> 
>
> Key: YARN-9959
> URL: https://issues.apache.org/jira/browse/YARN-9959
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>
> {noformat}
> addRuncMountLocation(mounts, containerWorkDir.toString() +
> "/private_slash_tmp", "/tmp", true, true);
> addRuncMountLocation(mounts, containerWorkDir.toString() +
> "/private_var_slash_tmp", "/var/tmp", true, true);
> {noformat}
> It would be good to remove the hard-coded tmp mounts from the 
> {{RuncContainerRuntime}} in place of something general or possibly a tmpfs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9938) Validate Parent Queue for QueueMapping contains dynamic group as parent queue

2019-11-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974509#comment-16974509
 ] 

Hadoop QA commented on YARN-9938:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 29s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 4 new + 47 unchanged - 1 fixed = 51 total (was 48) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 88m  
2s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}148m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9938 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985867/YARN-9938.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5b53d318d174 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 73a386a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/25170/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25170/testReport/ |
| Max. process+thread count | 818 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Updated] (YARN-9561) Add C changes for the new RuncContainerRuntime

2019-11-14 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9561:
--
Attachment: YARN-9561.014.patch

> Add C changes for the new RuncContainerRuntime
> --
>
> Key: YARN-9561
> URL: https://issues.apache.org/jira/browse/YARN-9561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9561.001.patch, YARN-9561.002.patch, 
> YARN-9561.003.patch, YARN-9561.004.patch, YARN-9561.005.patch, 
> YARN-9561.006.patch, YARN-9561.007.patch, YARN-9561.008.patch, 
> YARN-9561.009.patch, YARN-9561.010.patch, YARN-9561.011.patch, 
> YARN-9561.012.patch, YARN-9561.013.patch, YARN-9561.014.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native 
> binary that are necessary for the new RuncContainerRuntime. There should be 
> no changes to existing code paths. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime

2019-11-14 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974472#comment-16974472
 ] 

Eric Badger commented on YARN-9561:
---

Patch 014 addresses the const issue brought up by cc.

> Add C changes for the new RuncContainerRuntime
> --
>
> Key: YARN-9561
> URL: https://issues.apache.org/jira/browse/YARN-9561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9561.001.patch, YARN-9561.002.patch, 
> YARN-9561.003.patch, YARN-9561.004.patch, YARN-9561.005.patch, 
> YARN-9561.006.patch, YARN-9561.007.patch, YARN-9561.008.patch, 
> YARN-9561.009.patch, YARN-9561.010.patch, YARN-9561.011.patch, 
> YARN-9561.012.patch, YARN-9561.013.patch, YARN-9561.014.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native 
> binary that are necessary for the new RuncContainerRuntime. There should be 
> no changes to existing code paths. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9969) Improve yarn.scheduler.capacity.queue-mappings documentation

2019-11-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974454#comment-16974454
 ] 

Hadoop QA commented on YARN-9969:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
31m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9969 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985869/YARN-9969.001.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux ad124a743848 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 73a386a |
| maven | version: Apache Maven 3.3.9 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/25171/artifact/out/whitespace-eol.txt
 |
| Max. process+thread count | 430 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25171/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Improve yarn.scheduler.capacity.queue-mappings documentation
> 
>
> Key: YARN-9969
> URL: https://issues.apache.org/jira/browse/YARN-9969
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9969.001.patch
>
>
> As discussed in 
> https://issues.apache.org/jira/browse/YARN-9865?focusedCommentId=16971482=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16971482,
>  the scope of this Jira is to improve the yarn.scheduler.capacity.queue-mappings 
> documentation in CapacityScheduler.md.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9959) Work around hard-coded tmp and /var/tmp bind-mounts in the container's working directory

2019-11-14 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974452#comment-16974452
 ] 

Eric Badger commented on YARN-9959:
---

[~shaneku...@gmail.com] pointed out in our latest docker meeting that there is 
a correctness issue with having these hard-coded mounts. If anyone mounts onto 
/tmp or /var/tmp in their container then they will get a circular mount that 
can't be cleaned up and is overall pretty gross. You can see the logs of this 
in [this 
comment|https://issues.apache.org/jira/browse/YARN-9562?focusedCommentId=16971141=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16971141].
 For now, I have added a documentation comment noting that mounting on top of 
/tmp or /var/tmp is not supported. However, this is a limitation that we should 
not need to have. 

> Work around hard-coded tmp and /var/tmp bind-mounts in the container's 
> working directory
> 
>
> Key: YARN-9959
> URL: https://issues.apache.org/jira/browse/YARN-9959
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>
> {noformat}
> addRuncMountLocation(mounts, containerWorkDir.toString() +
> "/private_slash_tmp", "/tmp", true, true);
> addRuncMountLocation(mounts, containerWorkDir.toString() +
> "/private_var_slash_tmp", "/var/tmp", true, true);
> {noformat}
> It would be good to remove the hard-coded tmp mounts from the 
> {{RuncContainerRuntime}} in place of something general or possibly a tmpfs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9981) Fix findbugs warning in timelineservice

2019-11-14 Thread Adam Antal (Jira)
Adam Antal created YARN-9981:


 Summary: Fix findbugs warning in timelineservice
 Key: YARN-9981
 URL: https://issues.apache.org/jira/browse/YARN-9981
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineservice
Affects Versions: 3.3.0
Reporter: Adam Antal


Findbugs are complaining about this:
{noformat}
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335]
{noformat}

Let's fix it!
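
The code at ColumnRWHelper.java line 335 is not quoted here, but this findbugs 
category (boxed value unboxed and then immediately reboxed) usually comes from a 
pattern like the following illustrative Java sketch:

{noformat}
import java.util.Map;
import java.util.NavigableMap;

public class RedundantBoxingExample {
  // Flagged pattern: the boxed Long key is unboxed into a primitive
  // long, then immediately reboxed by Map.put.
  static void flagged(NavigableMap<Long, Object> cells,
                      Map<Long, Object> results, Object value) {
    for (Map.Entry<Long, Object> cell : cells.entrySet()) {
      long timestamp = cell.getKey();   // auto-unboxing
      results.put(timestamp, value);    // auto-reboxing
    }
  }

  // Fix: keep the boxed Long as-is; no unbox/rebox round trip.
  static void fixed(NavigableMap<Long, Object> cells,
                    Map<Long, Object> results, Object value) {
    for (Map.Entry<Long, Object> cell : cells.entrySet()) {
      results.put(cell.getKey(), value);
    }
  }
}
{noformat}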



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9923) Introduce HealthReporter interface and implement running Docker daemon checker

2019-11-14 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974423#comment-16974423
 ] 

Adam Antal edited comment on YARN-9923 at 11/14/19 4:47 PM:


Hi [~eyang],

Thanks for sharing these concerns. I can see the points you brought up; let me 
react to them.
{quote}it would be better to write this as a health checker script, so that 
proper privileges can be obtained to check if the process is alive.
{quote}
Yeah, we can move the logic to a script.
{quote}Worries about future maintenance of DockerHealthCheckerService. [...]
{quote}
So do you mean that there was intentionally no public or hadoop-public API for 
health checking? That was the missing piece the {{HealthChecker}} interface was 
trying to fill, but I can see the reasoning from this perspective. 
I consider scripts a bit less flexible; that was my main reason for starting to 
work on this. Suppose you have complex health checker logic, with multiple 
scripts implementing its parts. Then you must either have one giant script that 
contains all of it and cannot be maintained in the long run, or one that imports 
all the others but still does not give you the right flexibility.

One improvement I can think of is to allow setting these things on a per-script 
basis (allowing multiple scripts to run in parallel). Similarly to the log 
aggregation controller, we could have configs like this:
{noformat}
yarn.health-checker.scripts=script1,script2,script3
yarn.health-checker-script.script1.path=/path/to/script/one
yarn.health-checker-script.script1.opts=...
yarn.health-checker-script.script1.timeout-ms=1000
yarn.health-checker-script.script2.path=/path/to/script/two
...
yarn.health-checker-script.script3.path=/path/to/script/three
{noformat}
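
For illustration, a minimal sketch of how such per-script configs could be read 
with the Hadoop {{Configuration}} API ({{HealthCheckerScriptLoader}} and 
{{ScriptSpec}} are hypothetical names, not existing classes):

{noformat}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

public final class HealthCheckerScriptLoader {
  // Hypothetical holder for one configured health checker script.
  public static final class ScriptSpec {
    final String name;
    final String path;
    final String opts;
    final long timeoutMs;

    ScriptSpec(String name, String path, String opts, long timeoutMs) {
      this.name = name;
      this.path = path;
      this.opts = opts;
      this.timeoutMs = timeoutMs;
    }
  }

  public static List<ScriptSpec> load(Configuration conf) {
    List<ScriptSpec> specs = new ArrayList<>();
    // yarn.health-checker.scripts lists the script names; each name
    // then has its own path/opts/timeout-ms properties.
    for (String name : conf.getTrimmedStrings("yarn.health-checker.scripts")) {
      String prefix = "yarn.health-checker-script." + name + ".";
      specs.add(new ScriptSpec(
          name,
          conf.get(prefix + "path"),
          conf.get(prefix + "opts", ""),
          conf.getLong(prefix + "timeout-ms", 1000L)));
    }
    return specs;
  }
}
{noformat}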
{quote}Future code may have too many timer threads that run at different 
intervals and use more system resources than necessary.
{quote}
If you believe the above approach makes sense, then we can limit this with a 
config - a max-timer-thread option to cap these threads (warning if too many 
health checker scripts are configured).

bq. Could we change DockerHealthCheckerService into a node manager healthcheck 
script? The toggle for enabling the docker check can be based on environment 
variables passed down to the node manager healthcheck script. This approach will 
set a good example of how to pass a config toggle to the healthcheck script, 
while maintaining backward compatibility with the user's healthcheck script.

I like this idea; I will do that in the following patch.

What do you think? Please share your thoughts.

For the sake of completeness, a use case: in a cluster where Dockerized nodes 
with GPUs run TF jobs, the nodes may depend on the availability of the Docker 
daemon as well as the GPU device; as of now, we can only be sure that the node 
is working fine once a container allocation is started on that node. 


was (Author: adam.antal):
Hi [~eyang],

Thanks for sharing these concerns. I can see you point you come up, let me 
react to those.
{quote}it would be better to be written as a health checker script hence, 
proper privileges can be obtained to check if the process is alive.
{quote}
Yeah, we can move the logic to a script.
{quote}Worries about future maintenance of DockerHealthCheckerService. [...]
{quote}
So do you mean that there was no public or hadoop-public API for 
health-checking on purpose? That was the missing piece that {{HealthChecker}} 
interface was trying to fill, but I can see the reason why from this 
perspective. 
 I consider using scripts a bit less flexible, that was my main reason why I 
started to work on this. Suppose you have a complex health checker logic, and 
you have multiple scripts implementing the parts. Then you must either have one 
giant script that contains all of it and could not be supported on the long 
run, or have one that imports all the others but still does not give you the 
right flexibility.

One improvement I can think of is to enable to set these things on a per script 
basis (allowing multiple scripts to run paralel). Similarly like log 
aggregation controller, we could have a configs like this:
{noformat}
yarn.health-checker.scripts=script1,script2,script3
yarn.health-checker-script.script1.path=/path/to/script/one
yarn.health-checker-script.script1.opts=...
yarn.health-checker-script.script1.timeout-ms=1000
yarn.health-checker-script.script2.path=/path/to/script/two
...
yarn.health-checker-script.script3.path=/path/to/script/three
{noformat}
{quote}Future code may have too many timer threads that runs at different 
intervals and use more system resources than necessary.
{quote}
If you believe the above approach can make sense, then we can limit this with a 
config - a max-timer-thread option to limit these threads (warning if you have 
given too much of these health checker scripts).

bq. 

[jira] [Commented] (YARN-9923) Introduce HealthReporter interface and implement running Docker daemon checker

2019-11-14 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974423#comment-16974423
 ] 

Adam Antal commented on YARN-9923:
--

Hi [~eyang],

Thanks for sharing these concerns. I can see the points you brought up; let me 
react to them.
{quote}it would be better to write this as a health checker script, so that 
proper privileges can be obtained to check if the process is alive.
{quote}
Yeah, we can move the logic to a script.
{quote}Worries about future maintenance of DockerHealthCheckerService. [...]
{quote}
So do you mean that there was intentionally no public or hadoop-public API for 
health checking? That was the missing piece the {{HealthChecker}} interface was 
trying to fill, but I can see the reasoning from this perspective. 
I consider scripts a bit less flexible; that was my main reason for starting to 
work on this. Suppose you have complex health checker logic, with multiple 
scripts implementing its parts. Then you must either have one giant script that 
contains all of it and cannot be maintained in the long run, or one that imports 
all the others but still does not give you the right flexibility.

One improvement I can think of is to allow setting these things on a per-script 
basis (allowing multiple scripts to run in parallel). Similarly to the log 
aggregation controller, we could have configs like this:
{noformat}
yarn.health-checker.scripts=script1,script2,script3
yarn.health-checker-script.script1.path=/path/to/script/one
yarn.health-checker-script.script1.opts=...
yarn.health-checker-script.script1.timeout-ms=1000
yarn.health-checker-script.script2.path=/path/to/script/two
...
yarn.health-checker-script.script3.path=/path/to/script/three
{noformat}
{quote}Future code may have too many timer threads that run at different 
intervals and use more system resources than necessary.
{quote}
If you believe the above approach makes sense, then we can limit this with a 
config - a max-timer-thread option to cap these threads (warning if too many 
health checker scripts are configured).

bq. Could we change DockerHealthCheckerService into a node manager healthcheck 
script? The toggle for enabling the docker check can be based on environment 
variables passed down to the node manager healthcheck script. This approach will 
set a good example of how to pass a config toggle to the healthcheck script, 
while maintaining backward compatibility with the user's healthcheck script.

I like this idea; I will do that in the following patch.

What do you think? Please share your thoughts.

For the sake of completeness, a use case: in a cluster where Dockerized nodes 
with GPUs run TF jobs, the nodes may depend on the availability of the Docker 
daemon as well as the GPU device; as of now, we can only be sure that the node 
is working fine once a container allocation is started on that node. 

> Introduce HealthReporter interface and implement running Docker daemon checker
> --
>
> Key: YARN-9923
> URL: https://issues.apache.org/jira/browse/YARN-9923
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9923.001.patch, YARN-9923.002.patch, 
> YARN-9923.003.patch, YARN-9923.004.patch
>
>
> Currently, if a NodeManager is enabled to allocate Docker containers but the 
> specified binary (docker.binary in the container-executor.cfg) is missing, the 
> container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: : No 
> such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect 
> host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error 
> message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", 
> with the following options:
> - STARTUP: with this option set, the NodeManager would not start if Docker 
> binaries are missing or the Docker daemon is not running (the exception is 
> considered FATAL during startup)
> - RUNTIME: would give a more detailed/user-friendly exception on the 
> NodeManager's side (NM logs) if Docker binaries are missing or the daemon is 
> not working. This would also prevent further Docker container allocation as 
> long as the binaries do not exist and the docker daemon is not running.
> - NONE (default): preserves the current behaviour, throwing an exception during 
> container allocation and carrying on with the default retry procedure.
> 

[jira] [Commented] (YARN-9969) Improve yarn.scheduler.capacity.queue-mappings documentation

2019-11-14 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974405#comment-16974405
 ] 

Manikandan R commented on YARN-9969:


[~snemeth] Attached .001.patch. Can you please take a look at the doc changes?

> Improve yarn.scheduler.capacity.queue-mappings documentation
> 
>
> Key: YARN-9969
> URL: https://issues.apache.org/jira/browse/YARN-9969
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9969.001.patch
>
>
> As discussed in 
> https://issues.apache.org/jira/browse/YARN-9865?focusedCommentId=16971482=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16971482,
>  the scope of this Jira is to improve the yarn.scheduler.capacity.queue-mappings 
> documentation in CapacityScheduler.md.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9969) Improve yarn.scheduler.capacity.queue-mappings documentation

2019-11-14 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9969:
---
Attachment: YARN-9969.001.patch

> Improve yarn.scheduler.capacity.queue-mappings documentation
> 
>
> Key: YARN-9969
> URL: https://issues.apache.org/jira/browse/YARN-9969
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9969.001.patch
>
>
> As discussed in 
> https://issues.apache.org/jira/browse/YARN-9865?focusedCommentId=16971482=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16971482,
>  the scope of this Jira is to improve the yarn.scheduler.capacity.queue-mappings 
> documentation in CapacityScheduler.md.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9923) Introduce HealthReporter interface and implement running Docker daemon checker

2019-11-14 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974392#comment-16974392
 ] 

Eric Yang commented on YARN-9923:
-

[~adam.antal], thank you for the patch.  There are a number of people sharing 
the same concerns about the current implementation.  The main issues are:

# it would be better to write this as a health checker script, so that proper 
privileges can be obtained to check if the process is alive.
# Worries about future maintenance of DockerHealthCheckerService.  It sets a 
precedent of writing complex, system-specific logic in Java that would be easier 
to implement in scripts.
# Future code may have too many timer threads that run at different intervals 
and use more system resources than necessary.

Could we change DockerHealthCheckerService into a node manager healthcheck 
script?  The toggle for enabling the docker check can be based on environment 
variables passed down to the node manager healthcheck script.  This approach 
will set a good example of how to pass a config toggle to the healthcheck 
script, while maintaining backward compatibility with the user's healthcheck 
script.
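
For illustration, a minimal sketch of that toggle (the variable name 
NM_DOCKER_CHECK_ENABLED is hypothetical, not an existing YARN variable):

{noformat}
import java.io.IOException;
import java.util.Map;

public final class HealthCheckLauncher {
  // Hypothetical sketch: enable/disable the docker check by passing an
  // environment toggle down to the node manager healthcheck script,
  // instead of running a separate Java checker service.
  public static Process launch(String scriptPath, boolean dockerCheck)
      throws IOException {
    ProcessBuilder pb = new ProcessBuilder(scriptPath);
    Map<String, String> env = pb.environment();
    env.put("NM_DOCKER_CHECK_ENABLED", String.valueOf(dockerCheck));
    return pb.start();
  }
}
{noformat}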

Let us know your thoughts.  Thanks

> Introduce HealthReporter interface and implement running Docker daemon checker
> --
>
> Key: YARN-9923
> URL: https://issues.apache.org/jira/browse/YARN-9923
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, yarn
>Affects Versions: 3.2.1
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9923.001.patch, YARN-9923.002.patch, 
> YARN-9923.003.patch, YARN-9923.004.patch
>
>
> Currently, if a NodeManager is enabled to allocate Docker containers but the 
> specified binary (docker.binary in the container-executor.cfg) is missing, the 
> container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: : No 
> such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect 
> host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error 
> message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", 
> with the following options:
> - STARTUP: with this option set, the NodeManager would not start if Docker 
> binaries are missing or the Docker daemon is not running (the exception is 
> considered FATAL during startup)
> - RUNTIME: would give a more detailed/user-friendly exception on the 
> NodeManager's side (NM logs) if Docker binaries are missing or the daemon is 
> not working. This would also prevent further Docker container allocation as 
> long as the binaries do not exist and the docker daemon is not running.
> - NONE (default): preserves the current behaviour, throwing an exception during 
> container allocation and carrying on with the default retry procedure.
> 
> A new interface called {{HealthChecker}} is introduced, which is used in the 
> {{NodeHealthCheckerService}}. Existing implementations like 
> {{LocalDirsHandlerService}} are modified to implement it, giving a clear 
> abstraction of the node's health. The {{DockerHealthChecker}} implements this 
> new interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9938) Validate Parent Queue for QueueMapping contains dynamic group as parent queue

2019-11-14 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9938:
---
Attachment: YARN-9938.002.patch

> Validate Parent Queue for QueueMapping contains dynamic group as parent queue
> -
>
> Key: YARN-9938
> URL: https://issues.apache.org/jira/browse/YARN-9938
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9938.001.patch, YARN-9938.002.patch
>
>
> Currently {{UserGroupMappingPlacementRule#validateParentQueue}} validates 
> the parent queue using the queue path. With dynamic groups using %primary_group 
> and %secondary_group in place (refer to YARN-9841 and YARN-9865), parent queue 
> validation should also happen for these two queue mappings after resolving the 
> wildcard pattern to the corresponding groups at runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9938) Validate Parent Queue for QueueMapping contains dynamic group as parent queue

2019-11-14 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974374#comment-16974374
 ] 

Manikandan R commented on YARN-9938:


Fixed JUnit failures. Attached .002.patch.

> Validate Parent Queue for QueueMapping contains dynamic group as parent queue
> -
>
> Key: YARN-9938
> URL: https://issues.apache.org/jira/browse/YARN-9938
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9938.001.patch, YARN-9938.002.patch
>
>
> Currently {{UserGroupMappingPlacementRule#validateParentQueue}} validates 
> the parent queue using the queue path. With dynamic groups using %primary_group 
> and %secondary_group in place (refer to YARN-9841 and YARN-9865), parent queue 
> validation should also happen for these two queue mappings after resolving the 
> wildcard pattern to the corresponding groups at runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder

2019-11-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974349#comment-16974349
 ] 

Hadoop QA commented on YARN-9052:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 88 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 
19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m  9s{color} | {color:orange} root: The patch generated 125 new + 1845 
unchanged - 42 fixed = 1970 total (was 1887) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 32s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m  
8s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m  
8s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
51s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}225m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled
 |
|   | hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivities |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9052 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder

2019-11-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974347#comment-16974347
 ] 

Hadoop QA commented on YARN-9052:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 88 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
56s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 
20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m  2s{color} | {color:orange} root: The patch generated 125 new + 1845 
unchanged - 42 fixed = 1970 total (was 1887) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 32s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m  
9s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
10s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}225m 36s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled
 |
|   | hadoop.yarn.server.resourcemanager.TestNodeBlacklistingOnAMFailures |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivities |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9052 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler

2019-11-14 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974285#comment-16974285
 ] 

Peter Bacsko commented on YARN-9930:


[~cane] what will be your approach? I think the best option would be to re-use 
{{MaxRunningAppsEnforcer}} somehow, although that class is tied to Fair 
Scheduler (e.g. it expects a Fair Scheduler instance in the constructor), so it 
has to be refactored. But at least we would avoid code duplication. I think the 
refactoring is doable.
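
For illustration, one possible shape of that refactoring (all names below are 
hypothetical): extract the operations {{MaxRunningAppsEnforcer}} needs into a 
scheduler-agnostic interface that both Fair Scheduler and Capacity Scheduler 
could implement.

{noformat}
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Hypothetical interface: the enforcer would depend on this instead of
// a concrete FairScheduler, so CapacityScheduler could plug in too.
public interface MaxRunningAppsContext {
  // Max number of concurrently running apps allowed in the queue.
  int getMaxRunningApps(String queuePath);

  // Number of apps currently running in the queue.
  int getNumRunningApps(String queuePath);

  // Promote a pending application to runnable once headroom opens up.
  void activateApplication(ApplicationId appId);
}
{noformat}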

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> In FairScheduler, there is a max-running-apps limit that keeps excess 
> applications pending.
> But in CapacityScheduler there is no such feature; it only has a 
> max-applications limit, and jobs beyond it are rejected directly on the client.
> In this jira I want to implement the same semantics for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9980) App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive partition queue

2019-11-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974189#comment-16974189
 ] 

Hadoop QA commented on YARN-9980:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
3s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 40s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 152 unchanged - 0 fixed = 153 total (was 152) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  3s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}112m 21s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}181m 47s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsCustomResourceTypes
 |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAppRunnability |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingUnmanagedAM |
|   | 
hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptImplDiagnostics
 |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppCustomResourceTypes
 |
|   | 
hadoop.yarn.server.resourcemanager.logaggregationstatus.TestRMAppLogAggregationStatus
 |
|   | hadoop.yarn.server.resourcemanager.TestAppManager |
|   | 

[jira] [Updated] (YARN-9052) Replace all MockRM submit method definitions with a builder

2019-11-14 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9052:
-
Attachment: YARN-9052.004.patch

> Replace all MockRM submit method definitions with a builder
> ---
>
> Key: YARN-9052
> URL: https://issues.apache.org/jira/browse/YARN-9052
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-9052.001.patch, YARN-9052.002.patch, 
> YARN-9052.003.patch, YARN-9052.004.patch, YARN-9052.004.withlogs.patch, 
> YARN-9052.testlogs.002.patch, YARN-9052.testlogs.002.patch, 
> YARN-9052.testlogs.003.patch, YARN-9052.testlogs.patch
>
>
> MockRM has 31 definitions of submitApp, most of them having more than an 
> acceptable number of parameters, ranging from 2 to as many as 22, which makes 
> the code completely unreadable.
> On top of the unreadability, it's very hard to follow what RMApp will be 
> produced for tests, as they often pass a lot of empty / null values as 
> parameters.
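
For illustration, a minimal sketch of the builder idea (the class and setter 
names here are hypothetical, not the ones used in the attached patches):

{noformat}
// Hypothetical builder: named, defaulted settings replace the long
// positional parameter lists of the submitApp overloads.
public final class AppSubmission {
  private final int memoryMb;
  private final String name;
  private final String user;
  private final String queue;
  private final int maxAppAttempts;

  private AppSubmission(Builder b) {
    this.memoryMb = b.memoryMb;
    this.name = b.name;
    this.user = b.user;
    this.queue = b.queue;
    this.maxAppAttempts = b.maxAppAttempts;
  }

  public static final class Builder {
    // Sensible defaults replace the empty/null placeholder arguments.
    private int memoryMb = 1024;
    private String name = "app";
    private String user = "user";
    private String queue = "default";
    private int maxAppAttempts = 1;

    public Builder memoryMb(int v) { memoryMb = v; return this; }
    public Builder name(String v) { name = v; return this; }
    public Builder user(String v) { user = v; return this; }
    public Builder queue(String v) { queue = v; return this; }
    public Builder maxAppAttempts(int v) { maxAppAttempts = v; return this; }

    public AppSubmission build() { return new AppSubmission(this); }
  }
}
{noformat}

A test would then read, e.g., new AppSubmission.Builder().user("alice").queue("default").build(), 
making it obvious which values differ from the defaults.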



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9635) Nodes page displayed duplicate nodes

2019-11-14 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974098#comment-16974098
 ] 

Tao Yang commented on YARN-9635:


Hi, [~jiwq]. I think the description of the conf in NodeManager.md is not 
sufficient yet; we should add some details about this change, such as from 
which version it applies and why.
[~sunilg], any thoughts about the new patch?

> Nodes page displayed duplicate nodes
> 
>
> Key: YARN-9635
> URL: https://issues.apache.org/jira/browse/YARN-9635
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.2.0
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: UI2-nodes.jpg, YARN-9635.001.patch, YARN-9635.002.patch
>
>
> Steps:
>  * shutdown nodes
>  * start nodes
> Nodes Page:
> !UI2-nodes.jpg!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9958) Remove the invalid lock in ContainerExecutor

2019-11-14 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974079#comment-16974079
 ] 

Tao Yang commented on YARN-9958:


Thanks [~jiwq] for this improvement. Patch LGTM: the related read/write lock 
only guards ContainerExecutor#pidFiles, which is a ConcurrentHashMap and does 
not need to be protected by an additional lock.
 I will commit this in a few days if there are no further comments.
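
For illustration, a minimal sketch of why the external lock is redundant (types 
simplified from the actual ContainerExecutor fields):

{noformat}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PidFilesExample {
  // ConcurrentHashMap already guarantees thread-safe, atomic get/put,
  // so wrapping single calls in a read/write lock adds no safety,
  // only lock contention.
  private final ConcurrentMap<String, String> pidFiles =
      new ConcurrentHashMap<>();

  public void recordPidFile(String containerId, String pidFilePath) {
    pidFiles.put(containerId, pidFilePath);  // atomic without a lock
  }

  public String getPidFilePath(String containerId) {
    return pidFiles.get(containerId);        // atomic without a lock
  }
}
{noformat}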

> Remove the invalid lock in ContainerExecutor
> 
>
> Key: YARN-9958
> URL: https://issues.apache.org/jira/browse/YARN-9958
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
>
> ContainerExecutor has a ReadLock and a WriteLock. These are used to guard 
> get/put calls on a ConcurrentMap. Since the ConcurrentMap already provides 
> thread safety and atomicity guarantees, we can remove the lock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9980) App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive partition queue

2019-11-14 Thread Wang, Xinglong (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Xinglong updated YARN-9980:
-
Attachment: YARN-9980.001.patch

> App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive 
> partition queue
> -
>
> Key: YARN-9980
> URL: https://issues.apache.org/jira/browse/YARN-9980
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wang, Xinglong
>Assignee: Wang, Xinglong
>Priority: Minor
> Attachments: Screen Shot 2019-11-14 at 5.11.39 PM.png, 
> YARN-9980.001.patch
>
>
> App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive 
> partition queue.
> queue_root
>   queue_a - default_partition
>   queue_b - exclusive partition x, default partition is x
> When an app is submitted to queue_a with AM_LABEL_EXPRESSION unset, the RM 
> assigns default_partition as the AM_LABEL_EXPRESSION for the app, and the app 
> gets an am1 and runs. If the app is later moved to queue_b and am1 is 
> preempted/killed/failed, the RM will schedule another am2 if the AM retry 
> limit allows. But this time the resource request for am2 carries 
> AM_LABEL_EXPRESSION = default_partition; the issue is that queue_b has no 
> resources in default_partition, so the app stays in the accepted state 
> forever in the RM UI.
> My understanding is that, since the app was submitted with no 
> AM_LABEL_EXPRESSION, the code base allows such an app to run with the current 
> queue's default partition.
> For the move-queue scenario, we should likewise let the app run successfully: 
> am2 should get queue_b's default partition x resources to run on instead of 
> pending forever.
> In our production, we have a landing queue with default_partition, and a 
> routing mechanism that routes apps from this queue to other queues, including 
> queues with an exclusive partition.
>  !Screen Shot 2019-11-14 at 5.11.39 PM.png! 
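
A hedged sketch of the fix direction described above (the class and method 
below are hypothetical, not the actual RM scheduler code): on a queue move, an 
AM label that was only inherited from the source queue's default partition 
could be retargeted to the destination queue's default partition.

{code:java}
// Hypothetical sketch of the move-queue fallback; names do not correspond
// to the actual ResourceManager code.
class AmLabelFallback {
  /**
   * If the app never set AM_LABEL_EXPRESSION explicitly, retarget its AM
   * resource request to the destination queue's default partition.
   */
  static String amLabelAfterMove(boolean amLabelExplicitlySet,
                                 String currentAmLabel,
                                 String targetQueueDefaultPartition) {
    if (!amLabelExplicitlySet) {
      // e.g. "" (DEFAULT_PARTITION) becomes "x" when moved into queue_b,
      // so the retried am2 can be scheduled on queue_b's resources.
      return targetQueueDefaultPartition;
    }
    return currentAmLabel; // an explicitly requested label is kept as-is
  }
}
{code}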



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9980) App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive partition queue

2019-11-14 Thread Wang, Xinglong (Jira)
Wang, Xinglong created YARN-9980:


 Summary: App hangs in accepted when moved from DEFAULT_PARTITION 
queue to an exclusive partition queue
 Key: YARN-9980
 URL: https://issues.apache.org/jira/browse/YARN-9980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wang, Xinglong
Assignee: Wang, Xinglong
 Attachments: Screen Shot 2019-11-14 at 5.11.39 PM.png

App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive 
partition queue.

queue_root
  queue_a - default_partition
  queue_b - exclusive partition x, default partition is x

When an app is submitted to queue_a with AM_LABEL_EXPRESSION unset, the RM 
assigns default_partition as the AM_LABEL_EXPRESSION for the app, and the app 
gets an am1 and runs. If the app is later moved to queue_b and am1 is 
preempted/killed/failed, the RM will schedule another am2 if the AM retry 
limit allows. But this time the resource request for am2 carries 
AM_LABEL_EXPRESSION = default_partition; the issue is that queue_b has no 
resources in default_partition, so the app stays in the accepted state forever 
in the RM UI.

My understanding is that, since the app was submitted with no 
AM_LABEL_EXPRESSION, the code base allows such an app to run with the current 
queue's default partition.
For the move-queue scenario, we should likewise let the app run successfully: 
am2 should get queue_b's default partition x resources to run on instead of 
pending forever.

In our production, we have a landing queue with default_partition, and a 
routing mechanism that routes apps from this queue to other queues, including 
queues with an exclusive partition.

 !Screen Shot 2019-11-14 at 5.11.39 PM.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9965) Fix NodeManager failing to start on subsequent times when HDFS Auxiliary Jar is set

2019-11-14 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974033#comment-16974033
 ] 

Abhishek Modi commented on YARN-9965:
-

Thanks [~prabhujoseph], I will review it today.

> Fix NodeManager failing to start on subsequent times when HDFS Auxiliary Jar 
> is set
> ---
>
> Key: YARN-9965
> URL: https://issues.apache.org/jira/browse/YARN-9965
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: auxservices, nodemanager
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9965-001.patch, YARN-9965-addendum-01.patch
>
>
> Loading an auxiliary jar from an HDFS location on a NodeManager works as 
> expected the first time. A subsequent restart fails with a 
> ClassNotFoundException:
> {code:java}
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> classpath: []
> 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: 
> system classes: [java., javax.accessibility., javax.activation., 
> javax.activity., javax.annotation., javax.annotation.processing., 
> javax.crypto., javax.imageio., javax.jws., javax.lang.model., 
> -javax.management.j2ee., javax.management., javax.naming., javax.net., 
> javax.print., javax.rmi., javax.script., -javax.security.auth.message., 
> javax.security.auth., javax.security.cert., javax.security.sasl., 
> javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., 
> -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., 
> org.xml.sax., org.apache.commons.logging., org.apache.log4j., 
> -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, 
> hdfs-default.xml, mapred-default.xml, yarn-default.xml]
> 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed 
> in state INITED
> java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189)
>   at 
> org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:348)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
> {code}
>  
> The issue happens when reusing the previously localized auxiliary service 
> jar. On reuse, the localized jar file path is appended with /*, which causes 
> the ClassNotFoundException.
>  
>  
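
A minimal sketch of the implied fix (the helper below is hypothetical, not the 
actual AuxServices code): append the {{/*}} wildcard only when the localized 
path is a directory, so a reused jar file keeps its exact path on the 
classpath.

{code:java}
import java.io.File;

// Hypothetical helper; not the actual NodeManager code.
public final class AuxClasspathUtil {
  static String toClasspathEntry(File localized) {
    if (localized.isDirectory()) {
      // A directory of jars: expand everything inside it.
      return localized.getAbsolutePath() + "/*";
    }
    // A single jar: use the exact path, so loading it again after a
    // NodeManager restart still resolves the service class.
    return localized.getAbsolutePath();
  }
}
{code}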



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA

2019-11-14 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974020#comment-16974020
 ] 

zhoukang edited comment on YARN-9605 at 11/14/19 8:33 AM:
--

Sorry to bother [~tangzhankun] [~prabhujoseph], but I really cannot figure out 
the cause of the warnings below during the 'cc' phase.
I think the patch I posted has no relation to HDFS?

{code:java}
WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/IpcConnectionContext.pb.cc:129:13:
 warning: 'dynamic_init_dummy_IpcConnectionContext_2eproto' defined but not 
used [-Wunused-variable]
[WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/HAServiceProtocol.pb.cc:404:13:
 warning: 'dynamic_init_dummy_HAServiceProtocol_2eproto' defined but not used 
[-Wunused-variable]
[WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/Security.pb.cc:349:13:
 warning: 'dynamic_init_dummy_Security_2eproto' defined but not used 
[-Wunused-variable]
[WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/acl.pb.cc:533:13:
 warning: 'dynamic_init_dummy_acl_2eproto' defined but not used 
[-Wunused-variable]
{code}



was (Author: cane):
Sorry to bother [~tangzhankun] [~prabhujoseph], but I really cannot figure out 
the cause of the warnings below during the 'cc' phase:

{code:java}
WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/IpcConnectionContext.pb.cc:129:13:
 warning: 'dynamic_init_dummy_IpcConnectionContext_2eproto' defined but not 
used [-Wunused-variable]
[WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/HAServiceProtocol.pb.cc:404:13:
 warning: 'dynamic_init_dummy_HAServiceProtocol_2eproto' defined but not used 
[-Wunused-variable]
[WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/Security.pb.cc:349:13:
 warning: 'dynamic_init_dummy_Security_2eproto' defined but not used 
[-Wunused-variable]
[WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/acl.pb.cc:533:13:
 warning: 'dynamic_init_dummy_acl_2eproto' defined but not used 
[-Wunused-variable]
{code}


> Add ZkConfiguredFailoverProxyProvider for RM HA
> ---
>
> Key: YARN-9605
> URL: https://issues.apache.org/jira/browse/YARN-9605
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-9605.001.patch, YARN-9605.002.patch, 
> YARN-9605.003.patch, YARN-9605.004.patch, YARN-9605.005.patch, 
> YARN-9605.006.patch
>
>
> In this issue, i will track a new feature to support 
> ZkConfiguredFailoverProxyProvider for RM HA
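
For context, a client would presumably opt in to such a provider through the 
standard failover provider key; a minimal sketch, assuming the provider class 
ends up under the YARN client package (the fully qualified name below is an 
assumption, not confirmed by this Jira):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class ZkProviderConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // yarn.client.failover-proxy-provider is the existing YARN client key;
    // the class name below is a hypothetical placement of the new provider.
    conf.set("yarn.client.failover-proxy-provider",
        "org.apache.hadoop.yarn.client.ZkConfiguredFailoverProxyProvider");
  }
}
{code}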



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA

2019-11-14 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974020#comment-16974020
 ] 

zhoukang commented on YARN-9605:


Sorry to bother [~tangzhankun] [~prabhujoseph], but I really cannot figure out 
the cause of the warnings below during the 'cc' phase:

{code:java}
WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/IpcConnectionContext.pb.cc:129:13:
 warning: 'dynamic_init_dummy_IpcConnectionContext_2eproto' defined but not 
used [-Wunused-variable]
[WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/HAServiceProtocol.pb.cc:404:13:
 warning: 'dynamic_init_dummy_HAServiceProtocol_2eproto' defined but not used 
[-Wunused-variable]
[WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/Security.pb.cc:349:13:
 warning: 'dynamic_init_dummy_Security_2eproto' defined but not used 
[-Wunused-variable]
[WARNING] 
/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/acl.pb.cc:533:13:
 warning: 'dynamic_init_dummy_acl_2eproto' defined but not used 
[-Wunused-variable]
{code}


> Add ZkConfiguredFailoverProxyProvider for RM HA
> ---
>
> Key: YARN-9605
> URL: https://issues.apache.org/jira/browse/YARN-9605
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-9605.001.patch, YARN-9605.002.patch, 
> YARN-9605.003.patch, YARN-9605.004.patch, YARN-9605.005.patch, 
> YARN-9605.006.patch
>
>
> In this issue, i will track a new feature to support 
> ZkConfiguredFailoverProxyProvider for RM HA



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA

2019-11-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974014#comment-16974014
 ] 

Hadoop QA commented on YARN-9605:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
22m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m  
1s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 
28s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 19m 28s{color} | 
{color:red} root generated 4 new + 22 unchanged - 4 fixed = 26 total (was 26) 
{color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 19m 
28s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 51s{color} | {color:orange} root: The patch generated 4 new + 22 unchanged - 
0 fixed = 26 total (was 22) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  
1s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
55s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
18s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 81m 
40s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}233m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | YARN-9605 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12985805/YARN-9605.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  

[jira] [Updated] (YARN-9912) Capacity scheduler: support u:user2:%secondary_group queue mapping

2019-11-14 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9912:
---
Summary: Capacity scheduler: support u:user2:%secondary_group queue mapping 
 (was: Support u:user2:%secondary_group queue mapping)

> Capacity scheduler: support u:user2:%secondary_group queue mapping
> --
>
> Key: YARN-9912
> URL: https://issues.apache.org/jira/browse/YARN-9912
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9912.001.patch, YARN-9912.002.patch, 
> YARN-9912.003.patch
>
>
> Similar to u:user2:%primary_group mapping, add support for 
> u:user2:%secondary_group queue mapping as well.
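
For context, these mappings are configured through the capacity scheduler's 
queue-mappings property; a minimal sketch of setting the proposed mapping 
(assuming the standard {{yarn.scheduler.capacity.queue-mappings}} key, with 
{{%secondary_group}} being the new placeholder this Jira adds):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class QueueMappingExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Map user2's submissions to the queue named after the user's secondary
    // group, mirroring the existing %primary_group placeholder.
    conf.set("yarn.scheduler.capacity.queue-mappings",
        "u:user2:%secondary_group");
  }
}
{code}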



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9912) Capacity scheduler: support u:user2:%secondary_group queue mapping

2019-11-14 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9912:
---
Component/s: capacityscheduler
 capacity scheduler

> Capacity scheduler: support u:user2:%secondary_group queue mapping
> --
>
> Key: YARN-9912
> URL: https://issues.apache.org/jira/browse/YARN-9912
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, capacityscheduler
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9912.001.patch, YARN-9912.002.patch, 
> YARN-9912.003.patch
>
>
> Similar to u:user2:%primary_group mapping, add support for 
> u:user2:%secondary_group queue mapping as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org