[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876662#comment-16876662
 ] 

Hadoop QA commented on YARN-9655:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-3.0 Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  8m 
45s{color} | {color:red} root in branch-3.0 failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} branch-3.0 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} branch-3.0 passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m 
15s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
47s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 in branch-3.0 has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} branch-3.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m  
9s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 
36s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 48m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:e402791 |
| JIRA Issue | YARN-9655 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973368/YARN-9655.branch-3.0.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ac5568e1b493 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.0 / 9daa45f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-YARN-Build/24339/artifact/out/branch-mvninstall-root.txt
 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/24339/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html
 |
|  Test Results | 

[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread hunshenshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876635#comment-16876635
 ] 

hunshenshi commented on YARN-9655:
--

I uploaded patches for branch-2.9 and branch-3.0; please review.

Thanks [~cheersyang]

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9655.branch-2.9.patch, YARN-9655.branch-3.0.patch
>
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.
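For context, a minimal sketch of the kind of change involved, assuming hypothetical names (mergeAllocateResponses, homeResponse, secondaryResponses); this is not the actual YARN-9655 patch. The idea is that when the interceptor merges the AllocateResponses from the home and secondary sub-clusters, it must carry the application priority from the home response into the merged response; otherwise the MapReduce AM dereferences a null priority in handleJobPriorityChange().

{code:java}
// Hedged sketch only; method and variable names are illustrative, not the
// real FederationInterceptor internals.
private AllocateResponse mergeAllocateResponses(
    AllocateResponse homeResponse, List<AllocateResponse> secondaryResponses) {
  AllocateResponse merged = Records.newRecord(AllocateResponse.class);
  // ... merge allocated containers, completed containers, updated nodes, etc.

  // Propagate fields that only the home sub-cluster RM provides. Dropping
  // applicationPriority here is what leads to the NPE in the MR AM.
  if (homeResponse.getApplicationPriority() != null) {
    merged.setApplicationPriority(homeResponse.getApplicationPriority());
  }
  return merged;
}
{code}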






[jira] [Updated] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi updated YARN-9655:
-
Attachment: YARN-9655.branch-3.0.patch

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9655.branch-2.9.patch, YARN-9655.branch-3.0.patch
>
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876632#comment-16876632
 ] 

Weiwei Yang commented on YARN-9655:
---

Thanks [~hunhun], I re-opened the issue to trigger the Jenkins job.

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9655.branch-2.9.patch
>
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Reopened] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reopened YARN-9655:
---

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9655.branch-2.9.patch
>
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Updated] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi updated YARN-9655:
-
Attachment: YARN-9655.branch-2.9.patch

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9655.branch-2.9.patch
>
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread hunshenshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876626#comment-16876626
 ] 

hunshenshi commented on YARN-9655:
--

OK, I will check.

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876620#comment-16876620
 ] 

Weiwei Yang commented on YARN-9655:
---

I just pushed this to trunk and cherry-picked it to branch-3.2 and branch-3.1.
Thanks for the contribution [~hunhun].

FederationInterceptor was added in 2.9; does this issue also exist in
branch-2.9 and branch-3.0? If it does, we need to provide a patch for
branch-2.9, branch-2, and branch-3.0. [~hunhun], please let me know, thanks.

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Updated] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9655:
--
Fix Version/s: 3.1.3

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Assigned] (YARN-9601) Potential NPE in ZookeeperFederationStateStore#getPoliciesConfigurations

2019-07-01 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi reassigned YARN-9601:


Assignee: hunshenshi

> Potential NPE in ZookeeperFederationStateStore#getPoliciesConfigurations
> 
>
> Key: YARN-9601
> URL: https://issues.apache.org/jira/browse/YARN-9601
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> Potential NPE in ZookeeperFederationStateStore#getPoliciesConfigurations
> The current code of ZookeeperFederationStateStore#getPoliciesConfigurations:
> {code:java}
> for (String child : zkManager.getChildren(policiesZNode)) {
>   SubClusterPolicyConfiguration policy = getPolicy(child);
>   result.add(policy);
> }
> {code}
> The result of `getPolicy` may be null, so the policy should be checked.
> The new code:
> {code:java}
> for (String child : zkManager.getChildren(policiesZNode)) {
>   SubClusterPolicyConfiguration policy = getPolicy(child);
>   // policy may be null; skip it and log a warning
>   if (policy == null) {
>     LOG.warn("Policy for queue: {} does not exist.", child);
>     continue;
>   }
>   result.add(policy);
> }
> {code}
>  






[jira] [Assigned] (YARN-9643) Federation: Add subClusterID in nodes page of Router web

2019-07-01 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi reassigned YARN-9643:


Assignee: hunshenshi

> Federation: Add subClusterID in nodes page of Router web
> 
>
> Key: YARN-9643
> URL: https://issues.apache.org/jira/browse/YARN-9643
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Attachments: nodes.png
>
>
> On the nodes page of the Router web UI, there is only node information; there
> is no cluster ID corresponding to each node.
> [http://127.0.0.1:8089/cluster/nodes|http://192.168.169.72:8089/cluster/nodes]
> !nodes.png!






[jira] [Updated] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9655:
--
Fix Version/s: 3.2.1

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Resolved] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YARN-9655.
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0
>
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-07-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876596#comment-16876596
 ] 

Hudson commented on YARN-9655:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16849 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16849/])
YARN-9655. AllocateResponse in FederationInterceptor lost (wwei: rev 
570eee30e5ab5cf37b1a758934987cbf61140f6a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestFederationInterceptor.java


> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> In YARN Federation mode using FederationInterceptor, when submitting an
> application, the AM reports an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Updated] (YARN-9662) Preemption not working on NodeLabels

2019-07-01 Thread Amithsha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amithsha updated YARN-9662:
---
Description: 
Preemption on node labels is not working when the utilization is 100%.

Example:
 The queues adhocp0, adhocp1, and adhocp3 are mapped to the node label
 label_adhoc_nm, with 60, 30, and 10 as their actual capacities and 100 as the
 maximum capacity for all.
 When jobA on adhocp3 consumes 100% of its maximum capacity and jobB is
 submitted on adhocp0, no containers running on the adhocp3 queue get preempted.
  
 This was already reported by another user:
 https://issues.apache.org/jira/browse/YARN-7685

Note:

Jobs using more than their actual capacity but less than their maximum capacity
are able to preempt containers.

  was:
Preemption on node labels is not working at 100% utilisation.

Example
adhocp0,adhocp1,adhocp3 mapped to nodelabels of label_adhoc_nm.
With a share of 60,30,10 as actual capacity and 100 as maximum capacity for all.
When a jobA on adhocp3 consumes 100% of its maximum capacity and a jobB 
submitted on adhocp0 no containers running on adhocp3 got preempted.
 
This is already reported by another user
https://issues.apache.org/jira/browse/YARN-7685


> Preemption not working on NodeLabels
> 
>
> Key: YARN-9662
> URL: https://issues.apache.org/jira/browse/YARN-9662
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Amithsha
>Priority: Major
>
> Preemption on node labels is not working when the utilization is 100%.
> Example:
>  The queues adhocp0, adhocp1, and adhocp3 are mapped to the node label
>  label_adhoc_nm, with 60, 30, and 10 as their actual capacities and 100 as the
>  maximum capacity for all.
>  When jobA on adhocp3 consumes 100% of its maximum capacity and jobB is
>  submitted on adhocp0, no containers running on the adhocp3 queue get preempted.
>   
>  This was already reported by another user:
>  https://issues.apache.org/jira/browse/YARN-7685
> Note:
> Jobs using more than their actual capacity but less than their maximum
> capacity are able to preempt containers.
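For readers trying to reproduce the reported scenario, a rough sketch of the queue setup being described, using standard CapacityScheduler properties; the queue names, label, and percentages come from the report above, while everything else is illustrative and not taken from the reporter's configuration.

{noformat}
# capacity-scheduler.xml (values shown as key = value for brevity)
yarn.scheduler.capacity.root.queues = adhocp0,adhocp1,adhocp3
yarn.scheduler.capacity.root.accessible-node-labels.label_adhoc_nm.capacity = 100

# 60 / 30 / 10 guaranteed capacity on the label partition, 100 maximum for all
yarn.scheduler.capacity.root.adhocp0.accessible-node-labels.label_adhoc_nm.capacity = 60
yarn.scheduler.capacity.root.adhocp0.accessible-node-labels.label_adhoc_nm.maximum-capacity = 100
yarn.scheduler.capacity.root.adhocp1.accessible-node-labels.label_adhoc_nm.capacity = 30
yarn.scheduler.capacity.root.adhocp1.accessible-node-labels.label_adhoc_nm.maximum-capacity = 100
yarn.scheduler.capacity.root.adhocp3.accessible-node-labels.label_adhoc_nm.capacity = 10
yarn.scheduler.capacity.root.adhocp3.accessible-node-labels.label_adhoc_nm.maximum-capacity = 100

# yarn-site.xml: preemption monitor enabled
yarn.resourcemanager.scheduler.monitor.enable = true
{noformat}

With a setup like this, the report says that jobB on adhocp0 cannot reclaim resources once jobA on adhocp3 has grown to 100% of the partition, even though adhocp0's guaranteed share is 60.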






[jira] [Created] (YARN-9662) Preemption not working on NodeLabels

2019-07-01 Thread Amithsha (JIRA)
Amithsha created YARN-9662:
--

 Summary: Preemption not working on NodeLabels
 Key: YARN-9662
 URL: https://issues.apache.org/jira/browse/YARN-9662
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Amithsha


Preemption on node labels is not working at 100% utilisation.

Example:
The queues adhocp0, adhocp1, and adhocp3 are mapped to the node label label_adhoc_nm,
with 60, 30, and 10 as their actual capacities and 100 as the maximum capacity for all.
When jobA on adhocp3 consumes 100% of its maximum capacity and jobB is submitted
on adhocp0, no containers running on adhocp3 get preempted.
 
This was already reported by another user:
https://issues.apache.org/jira/browse/YARN-7685






[jira] [Resolved] (YARN-9473) [Umbrella] Support Vector Engine ( a new accelerator hardware) based on pluggable device framework

2019-07-01 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YARN-9473.

   Resolution: Fixed
Fix Version/s: 3.3.0

> [Umbrella] Support Vector Engine ( a new accelerator hardware) based on 
> pluggable device framework
> --
>
> Key: YARN-9473
> URL: https://issues.apache.org/jira/browse/YARN-9473
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Zhankun Tang
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0
>
>
> As the heterogeneous computing trend rises, new acceleration hardware such as
> GPUs and FPGAs is used to satisfy various requirements.
> The Vector Engine (VE), a new accelerator released by NEC, is another example.
> The VE is like a GPU but has different characteristics. It is suitable for
> machine learning and HPC due to better memory bandwidth and no PCIe bottleneck.
> Please check here for more VE details:
> [https://www.nextplatform.com/2017/11/22/deep-dive-necs-aurora-vector-engine/]
> [https://www.hotchips.org/hc30/2conf/2.14_NEC_vector_NEC_SXAurora_TSUBASA_HotChips30_finalb.pdf]
> As we know, YARN-8851 is a pluggable device framework that provides an easy
> way to develop a plugin for such new accelerators. This JIRA proposes to
> develop a new VE plugin based on that framework, implemented in the same way as
> the current GPU "NvidiaGPUPluginForRuntimeV2" plugin.
>  






[jira] [Commented] (YARN-9473) [Umbrella] Support Vector Engine ( a new accelerator hardware) based on pluggable device framework

2019-07-01 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876454#comment-16876454
 ] 

Peter Bacsko commented on YARN-9473:


Subtasks have been committed to trunk - closing this ticket.

> [Umbrella] Support Vector Engine ( a new accelerator hardware) based on 
> pluggable device framework
> --
>
> Key: YARN-9473
> URL: https://issues.apache.org/jira/browse/YARN-9473
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Zhankun Tang
>Assignee: Peter Bacsko
>Priority: Major
>
> As the heterogeneous computing trend rises, new acceleration hardware such as
> GPUs and FPGAs is used to satisfy various requirements.
> The Vector Engine (VE), a new accelerator released by NEC, is another example.
> The VE is like a GPU but has different characteristics. It is suitable for
> machine learning and HPC due to better memory bandwidth and no PCIe bottleneck.
> Please check here for more VE details:
> [https://www.nextplatform.com/2017/11/22/deep-dive-necs-aurora-vector-engine/]
> [https://www.hotchips.org/hc30/2conf/2.14_NEC_vector_NEC_SXAurora_TSUBASA_HotChips30_finalb.pdf]
> As we know, YARN-8851 is a pluggable device framework that provides an easy
> way to develop a plugin for such new accelerators. This JIRA proposes to
> develop a new VE plugin based on that framework, implemented in the same way as
> the current GPU "NvidiaGPUPluginForRuntimeV2" plugin.
>  






[jira] [Assigned] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-01 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko reassigned YARN-9660:
--

Assignee: Peter Bacsko

> Enhance documentation of Docker on YARN support
> ---
>
> Key: YARN-9660
> URL: https://issues.apache.org/jira/browse/YARN-9660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Right now, using Docker on YARN has some hard requirements. If these 
> requirements are not met, then launching the containers will fail and an
> error message will be printed. Depending on how familiar the user is with
> Docker, it might or might not be easy for them to understand what went wrong 
> and how to fix the underlying problem.
> It would be important to explicitly document these requirements along with 
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If the Docker daemon runs with the systemd cgroups handler, we receive the
> following error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: 
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document
> a {{systemctl}} example (see the sketch after this description).
>  
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}.
> This is because all commands under {{/bin}} are linked to {{/bin/busybox}} and
> there is only {{/bin/sh}}.
> If we try to use these kinds of images, we'll see the following error message:
> {noformat}
> Container id: container_1561638268473_0015_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
> runtime error: container_linux.go:235: starting container process caused 
> "exec: \"bash\": executable file not found in $PATH".
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
>  
> *#3: {{find}} command must be available on the {{$PATH}}*
> It seems obvious that we have the {{find}} command, but even very popular
> images like {{fedora}} require that we install it separately.
> If we don't have {{find}} available, then {{launch_container.sh}} fails
> with:
> {noformat}
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> {noformat}
> *#4 Add cmd-line example of how to tag local images*
> This is actually documented under "Privileged Container Security 
> Consideration", but an one-liner would be helpful. I had trouble running a 
> local docker image and tagging it appropriately. Just an example like 
> {{docker tag local_ubuntu local/ubuntu:latest}} is already very informative.
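As a possible starting point for item #1, a hedged example of switching the Docker daemon from the systemd cgroup driver to cgroupfs. The daemon.json key and restart command below are the standard upstream Docker mechanism; distributions shipping docker-current may instead configure this through /etc/sysconfig/docker, so treat this only as an illustration.

{noformat}
# /etc/docker/daemon.json (back up the existing file first)
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}

# Apply the change and verify the active cgroup driver
systemctl restart docker
docker info | grep -i cgroup
{noformat}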






[jira] [Commented] (YARN-9661) Fix typos in LocalityMulticastAMRMProxyPolicy and AbstractConfigurableFederationPolicy

2019-07-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876403#comment-16876403
 ] 

Hudson commented on YARN-9661:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16844 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16844/])
YARN-9661:Fix typo in LocalityMulticastAMRMProxyPolicy.java and (elgoiri: rev 
b1dafc3506de4bb827138493d5cc25da704f5609)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/AbstractConfigurableFederationPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/amrmproxy/LocalityMulticastAMRMProxyPolicy.java


> Fix typos in LocalityMulticastAMRMProxyPolicy and 
> AbstractConfigurableFederationPolicy
> --
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0
>
>
> There are some typos in LocalityMulticastAMRMProxyPolicy.java and
> AbstractConfigurableFederationPolicy.java






[jira] [Commented] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-01 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876401#comment-16876401
 ] 

Eric Yang commented on YARN-9660:
-

1.  +1 for the systemctl documentation addition.  +1 for producing a more
user-friendly error message from container-executor in a separate ticket.
 
2 and 3.  bash and find are required when running Docker without ENTRYPOINT
support.  I think this issue can be resolved if
YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE defaults to true.  However,
I also understand that the current default is meant to make Docker containers
behave more like YARN containers, so that existing big data workloads can run
without modification.  The documentation can probably explain why bash and find
are required for YARN containers but optional when entrypoint mode is activated.

4.  +1 for the current proposal.

> Enhance documentation of Docker on YARN support
> ---
>
> Key: YARN-9660
> URL: https://issues.apache.org/jira/browse/YARN-9660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Priority: Major
>
> Right now, using Docker on YARN has some hard requirements. If these 
> requirements are not met, then launching the containers will fail and an
> error message will be printed. Depending on how familiar the user is with
> Docker, it might or might not be easy for them to understand what went wrong 
> and how to fix the underlying problem.
> It would be important to explicitly document these requirements along with 
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If the Docker daemon runs with the systemd cgroups handler, we receive the
> following error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: 
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document
> a {{systemctl}} example.
>  
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}.
> This is because all commands under {{/bin}} are linked to {{/bin/busybox}} and
> there is only {{/bin/sh}}.
> If we try to use these kinds of images, we'll see the following error message:
> {noformat}
> Container id: container_1561638268473_0015_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
> runtime error: container_linux.go:235: starting container process caused 
> "exec: \"bash\": executable file not found in $PATH".
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
>  
> *#3: {{find}} command must be available on the {{$PATH}}*
> It seems obvious that we have the {{find}} command, but even very popular
> images like {{fedora}} require that we install it separately.
> If we don't have {{find}} available, then {{launch_container.sh}} fails
> with:
> {noformat}
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> {noformat}
> *#4 Add cmd-line example of how to tag local images*
> This is actually documented under "Privileged Container Security 
> Consideration", but an one-liner would be helpful. I had trouble running a 
> local docker image and tagging it appropriately. Just an example like 
> {{docker tag local_ubuntu local/ubuntu:latest}} is already very informative.






[jira] [Commented] (YARN-9661) Fix typos in LocalityMulticastAMRMProxyPolicy and AbstractConfigurableFederationPolicy

2019-07-01 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876384#comment-16876384
 ] 

Íñigo Goiri commented on YARN-9661:
---

Merged the PR.

> Fix typos in LocalityMulticastAMRMProxyPolicy and 
> AbstractConfigurableFederationPolicy
> --
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0
>
>
> There are some typos in LocalityMulticastAMRMProxyPolicy.java and
> AbstractConfigurableFederationPolicy.java






[jira] [Resolved] (YARN-9661) Fix typos in LocalityMulticastAMRMProxyPolicy and AbstractConfigurableFederationPolicy

2019-07-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri resolved YARN-9661.
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0

> Fix typos in LocalityMulticastAMRMProxyPolicy and 
> AbstractConfigurableFederationPolicy
> --
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
> Fix For: 3.3.0
>
>
> There are some typos in LocalityMulticastAMRMProxyPolicy.java and
> AbstractConfigurableFederationPolicy.java






[jira] [Updated] (YARN-9661) Fix typos in LocalityMulticastAMRMProxyPolicy and AbstractConfigurableFederationPolicy

2019-07-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-9661:
--
Summary: Fix typos in LocalityMulticastAMRMProxyPolicy and 
AbstractConfigurableFederationPolicy  (was: Fix typos in 
LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy)

> Fix typos in LocalityMulticastAMRMProxyPolicy and 
> AbstractConfigurableFederationPolicy
> --
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> There are some typos in LocalityMulticastAMRMProxyPolicy.java and
> AbstractConfigurableFederationPolicy.java






[jira] [Updated] (YARN-9661) Fix typos in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy

2019-07-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-9661:
--
Summary: Fix typos in LocalityMulticastAMRMProxyPolicy.java and 
AbstractConfigurableFederationPolicy  (was: Fix typo in 
LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy)

> Fix typos in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy
> ---
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> There are some typos in LocalityMulticastAMRMProxyPolicy.java and
> AbstractConfigurableFederationPolicy.java






[jira] [Updated] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy

2019-07-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-9661:
--
Summary: Fix typo in LocalityMulticastAMRMProxyPolicy.java and 
AbstractConfigurableFederationPolicy  (was: Fix typo in 
LocalityMulticastAMRMProxyPolicy.java and 
AbstractConfigurableFederationPolicy.java)

> Fix typo in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy
> --
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> There are some typos in LocalityMulticastAMRMProxyPolicy.java and
> AbstractConfigurableFederationPolicy.java






[jira] [Commented] (YARN-9480) createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl

2019-07-01 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876210#comment-16876210
 ] 

Zhankun Tang commented on YARN-9480:


[~yoelee], added [~Yunyao Zhang]. Thanks, [~Weiwei Yang]!

> createAppDir() in LogAggregationService shouldn't block dispatcher thread of 
> ContainerManagerImpl
> -
>
> Key: YARN-9480
> URL: https://issues.apache.org/jira/browse/YARN-9480
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: liyakun
>Assignee: liyakun
>Priority: Major
>
> At present, during startContainers(), if the NM does not already contain the
> application, it enters the INIT_APPLICATION step. In the application init step,
> createAppDir() is executed, and it is a blocking operation.
> createAppDir() needs to interact with an external file system, so it is
> affected by that file system's SLA. Once the external file system has high
> latency, the NM dispatcher thread of ContainerManagerImpl gets stuck. (In fact,
> I have seen a case where the NM was stuck here for more than an hour.)
> I think it would be more reasonable to move createAppDir() to the time the logs
> are actually uploaded (in other threads). According to the logRetentionPolicy,
> many containers may never reach that step, which would save a lot of
> interaction with the external file system.
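To make the proposal concrete, a minimal sketch of one way to keep the blocking file-system call off the ContainerManagerImpl dispatcher thread; the names onApplicationInit, createAppLogDirs, and appLogDirInitializer are hypothetical, not the actual LogAggregationService API, and the reporter's preferred variant goes further by deferring the call until logs are actually uploaded.

{code:java}
// Hedged sketch: run the remote-FS directory creation on a separate executor
// so the dispatcher thread returns immediately instead of blocking.
private final ExecutorService appLogDirInitializer =
    Executors.newSingleThreadExecutor();

void onApplicationInit(ApplicationId appId) {
  appLogDirInitializer.submit(() -> {
    try {
      // The slow call that talks to the external file system.
      createAppLogDirs(appId);
    } catch (IOException e) {
      LOG.warn("Failed to create log aggregation dir for " + appId, e);
    }
  });
}
{code}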






[jira] [Commented] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java

2019-07-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876205#comment-16876205
 ] 

Hadoop QA commented on YARN-9661:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m  
8s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
36s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1042/1/artifact/out/Dockerfile
 |
| GITHUB PR | https://github.com/apache/hadoop/pull/1042 |
| JIRA Issue | YARN-9661 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 222a3b80dc9c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 1e727cf |
| Default Java | 1.8.0_212 |
|  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1042/1/testReport/ |
| Max. process+thread count | 412 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-07-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876199#comment-16876199
 ] 

Hadoop QA commented on YARN-9629:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 382 unchanged - 1 fixed = 382 total (was 383) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
57s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
48s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m  
0s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9629 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973335/YARN-9629.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux fbbac63a0bcb 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 

[jira] [Commented] (YARN-9250) hadoop-yarn-server-nodemanager build failed: make failed with error code 2

2019-07-01 Thread hunshenshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876185#comment-16876185
 ] 

hunshenshi commented on YARN-9250:
--

[~linlong] you can use -X to see more error info; maybe it will help you.
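
A minimal command-line sketch of that suggestion (assuming the build uses the native profile, since the failing goal is cmake-compile; adjust goals and flags to match the original invocation):
{noformat}
# Resume the build at the failing module with Maven debug output enabled.
# -rf is taken from the error message below; -Pnative is an assumption here.
mvn clean install -Pnative -DskipTests -X \
    -rf :hadoop-yarn-server-nodemanager 2>&1 | tee nm-build.log

# The underlying cmake/make error is usually a few lines above the
# "make failed with error code 2" message:
grep -i -n -B 5 "error code 2" nm-build.log
{noformat}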

> hadoop-yarn-server-nodemanager build failed: make failed with error code 2
> --
>
> Key: YARN-9250
> URL: https://issues.apache.org/jira/browse/YARN-9250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: charlie mao
>Priority: Blocker
>
> When I compiled the hadoop-3.2.0 release, I encountered the following errors:
> [ERROR] Failed to execute goal 
> org.apache.hadoop:hadoop-maven-plugins:3.2.0:cmake-compile (cmake-compile) on 
> project hadoop-yarn-server-nodemanager: make failed with error code 2 -> 
> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.hadoop:hadoop-maven-plugins:3.2.0:cmake-compile 
> (cmake-compile) on project hadoop-yarn-server-nodemanager: make failed with 
> error code 2
>  at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
>  at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>  at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>  at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
>  at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
>  at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
>  at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
>  at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
>  at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
>  at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
>  at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
>  at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
>  at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
>  at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
>  at 
> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
>  at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
> Caused by: org.apache.maven.plugin.MojoExecutionException: make failed with 
> error code 2
>  at 
> org.apache.hadoop.maven.plugin.cmakebuilder.CompileMojo.runMake(CompileMojo.java:231)
>  at 
> org.apache.hadoop.maven.plugin.cmakebuilder.CompileMojo.execute(CompileMojo.java:98)
>  at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
>  at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
>  ... 20 more
> [ERROR]
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR] mvn  -rf :hadoop-yarn-server-nodemanager
>  
> my compiling environment:
> jdk 1.8.0_181
> maven:3.3.9(/3.6.0)
> cmake version 3.12.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java

2019-07-01 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi reassigned YARN-9661:


Assignee: hunshenshi

> Fix typo in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy.java
> ---
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Assignee: hunshenshi
>Priority: Major
>
> There are some typos in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java

2019-07-01 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi updated YARN-9661:
-
Component/s: federation

> Fix typo in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy.java
> ---
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Priority: Major
>
> There are some typos in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java

2019-07-01 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi updated YARN-9661:
-
Component/s: yarn

> Fix typo in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy.java
> ---
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Priority: Major
>
> There are some typos in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-07-01 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876171#comment-16876171
 ] 

Szilard Nemeth commented on YARN-9629:
--

Hi [~adam.antal]!
+1 for the latest patch!

> Support configurable MIN_LOG_ROLLING_INTERVAL
> -
>
> Key: YARN-9629
> URL: https://issues.apache.org/jira/browse/YARN-9629
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9629.001.patch, YARN-9629.002.patch, 
> YARN-9629.003.patch, YARN-9629.004.patch, YARN-9629.005.patch
>
>
> The minimum valid value for one of the log-aggregation parameters, 
> {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}, is 
> MIN_LOG_ROLLING_INTERVAL - it has been hardcoded since its addition in 
> YARN-2583. 
> It has been empirically set to 1 hour, as lower values would put the 
> NodeManagers under pressure too frequently. For bigger clusters that is indeed 
> a valid limitation, but for smaller clusters it is a valid customer use case 
> to use lower values, such as a not-so-low 30 minutes. At this point this can 
> only be achieved by setting 
> {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be 
> kept for debug purposes only.
> I'm suggesting to make this minimum configurable, although a warning should 
> be logged at NodeManager startup when the value is lower than 1 hour.
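
For reference, a minimal yarn-site.xml sketch of the current workaround described above (property names as referenced in this issue; the new configurable minimum proposed by the patch is intentionally not shown):
{noformat}
<!-- Current workaround: debug mode bypasses the hardcoded 1 hour minimum,
     allowing e.g. a 30 minute roll interval. Meant for debugging only. -->
<property>
  <name>yarn.nodemanager.log-aggregation.debug-enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>1800</value>
</property>
{noformat}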



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java

2019-07-01 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi updated YARN-9661:
-
Description: There are some typos in LocalityMulticastAMRMProxyPolicy.java 
and AbstractConfigurableFederationPolicy.java  (was: There are some typo in )

> Fix typo in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy.java
> ---
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Priority: Major
>
> There are some typos in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java

2019-07-01 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi updated YARN-9661:
-
Description: There are some typo in 

> Fix typo in LocalityMulticastAMRMProxyPolicy.java and 
> AbstractConfigurableFederationPolicy.java
> ---
>
> Key: YARN-9661
> URL: https://issues.apache.org/jira/browse/YARN-9661
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Priority: Major
>
> There are some typo in 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java

2019-07-01 Thread hunshenshi (JIRA)
hunshenshi created YARN-9661:


 Summary: Fix typo in LocalityMulticastAMRMProxyPolicy.java and 
AbstractConfigurableFederationPolicy.java
 Key: YARN-9661
 URL: https://issues.apache.org/jira/browse/YARN-9661
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: hunshenshi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-01 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876165#comment-16876165
 ] 

Szilard Nemeth commented on YARN-9660:
--

Hi [~pbacsko]!
Thanks for these improvement proposals for the documentation!
I think it's obvious that all of these points about docker image requirements 
should be documented properly, with some examples of compatible images.

1. As we discussed offline, the active cgroup handler can easily be printed by 
running "docker info". I guess this is an OS-independent way to detect the 
active handler.
However, if we want to detect it, we need to run docker info before running any 
container, and we would also rely on the output of docker info.
I don't know how likely the output of docker info is to change, but in the end, 
it's a dependency anyway.

2. and 3. I think we could detect these easily by creating some 
"image-validation" phase where we would check the availability of the bash 
commands that we rely on in the container executor script.
If we agree on having such a validation phase, point #1 could also be part of 
the validation process.

All in all, I'm voting for updating the doc and having as many validations as 
possible, as it makes the Docker feature easier and more straightforward to use.
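
A command-line sketch of the detection idea in point 1 (hypothetical, not part of any patch; the format string assumes Docker's CgroupDriver field in "docker info"):
{noformat}
# Print the active cgroup driver before launching any container.
docker info --format '{{.CgroupDriver}}'          # e.g. "systemd" or "cgroupfs"
docker info 2>/dev/null | grep -i 'cgroup driver'

# A hypothetical pre-launch validation could warn early:
if [ "$(docker info --format '{{.CgroupDriver}}')" = "systemd" ]; then
  echo "WARN: docker uses the systemd cgroup driver; cgroupfs is required" >&2
fi
{noformat}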

> Enhance documentation of Docker on YARN support
> ---
>
> Key: YARN-9660
> URL: https://issues.apache.org/jira/browse/YARN-9660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Priority: Major
>
> Right now, using Docker on YARN has some hard requirements. If these 
> requirements are not met, then launching the containers will fail and an 
> error message will be printed. Depending on how familiar the user is with 
> Docker, it might or might not be easy for them to understand what went wrong 
> and how to fix the underlying problem.
> It would be important to explicitly document these requirements along with 
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If the docker daemon runs with the systemd cgroups handler, we receive the following 
> error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: 
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can 
> document a {{systemctl}} example.
>  
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. 
> It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
> there's only {{/bin/sh}}.
> If we try to use these kinds of images, we'll see the following error message:
> {noformat}
> Container id: container_1561638268473_0015_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
> runtime error: container_linux.go:235: starting container process caused 
> "exec: \"bash\": executable file not found in $PATH".
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
>  
> *#3: {{find}} command must be available on the {{$PATH}}*
> It seems obvious that we have the {{find}} command, but even very popular 
> images like {{fedora}} require that we install it separately.
> If we don't have {{find}} available, then {{launcher_container.sh}} fails 
> with:
> {noformat}
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> {noformat}
> *#4 Add cmd-line example of how to tag local images*
> This is actually documented under "Privileged Container Security 
> Consideration", but an one-liner would be helpful. I had trouble running a 
> local docker image 

[jira] [Updated] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-01 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9660:
---
Description: 
Right now, using Docker on YARN has some hard requirements. If these 
requirements are not met, then launching the containers will fail and an error 
message will be printed. Depending on how familiar the user is with Docker, it 
might or might not be easy for them to understand what went wrong and how to 
fix the underlying problem.

It would be important to explicitly document these requirements along with the 
error messages.

*#1: CGroups handler cannot be systemd*

If the docker daemon runs with the systemd cgroups handler, we receive the following 
error upon launching a container:
{noformat}
Container id: container_1561638268473_0006_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: 
cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
See '/usr/bin/docker-current run --help'.
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}
Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document 
a {{systemctl}} example.
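
For instance, one OS-specific example that could be documented (assuming a systemd-managed docker service and {{/etc/docker/daemon.json}}; paths and commands can differ per distro):
{noformat}
# /etc/docker/daemon.json -- add or merge this setting:
#   { "exec-opts": ["native.cgroupdriver=cgroupfs"] }
# then restart the daemon and verify:
sudo systemctl restart docker
docker info | grep -i 'cgroup driver'     # expected: Cgroup Driver: cgroupfs
{noformat}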

 

*#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*

Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. 
It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
there's only {{/bin/sh}}.

If we try to use these kinds of images, we'll see the following error message:
{noformat}
Container id: container_1561638268473_0015_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
runtime error: container_linux.go:235: starting container process caused "exec: 
\"bash\": executable file not found in $PATH".
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}
 

*#3: {{find}} command must be available on the {{$PATH}}*

It seems obvious that we have the {{find}} command, but even very popular 
images like {{fedora}} require that we install it separately.

If we don't have {{find}} available, then {{launcher_container.sh}} fails with:
{noformat}
[2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
 line 44: find: command not found
Last 4096 bytes of stderr.txt :
[2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
 line 44: find: command not found
Last 4096 bytes of stderr.txt :
{noformat}

*#4 Add cmd-line example of how to tag local images*

This is actually documented under "Privileged Container Security 
Consideration", but an one-liner would be helpful. I had trouble running a 
local docker image and tagging it appropriately. Just an example like {{docker 
tag local_ubuntu local/ubuntu:latest}} is already very informative.
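
For instance (a minimal sketch; the image names are placeholders and the trusted-registry configuration is assumed to allow the {{local/}} prefix):
{noformat}
# Tag a locally built image under a prefix that the NodeManager's trusted
# registry configuration accepts, then verify the tag exists.
docker build -t local_ubuntu .
docker tag local_ubuntu local/ubuntu:latest
docker images local/ubuntu
{noformat}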

  was:
Right now, using Docker on YARN has some hard requirements. If these 
requirements are not met, then launching the containers will fail and and error 
message will be printed. Depending on how familiar the user is with Docker, it 
might or might not be easy for them to understand what went wrong and how to 
fix the underlying problem.

It would be important to explicitly document these requirements along with the 
error messages.

*#1: CGroups handler cannot be systemd*

If docker deamon runs with systemd cgroups handler, we receive the following 
error upon launching a container:
{noformat}
Container id: container_1561638268473_0006_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: 
cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
See '/usr/bin/docker-current run --help'.
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}
Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document 
a {{systemcl}} example.

 

*#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*

Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. 
It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
there's only {{/bin/sh}}.

If we try to use these kind of images, we'll see the following error message:
{noformat}
Container id: 

[jira] [Updated] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-01 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9660:
---
Description: 
Right now, using Docker on YARN has some hard requirements. If these 
requirements are not met, then launching the containers will fail and and error 
message will be printed. Depending on how familiar the user is with Docker, it 
might or might not be easy for them to understand what went wrong and how to 
fix the underlying problem.

It would be important to explicitly document these requirements along with the 
error messages.

*#1: CGroups handler cannot be systemd*

If docker deamon runs with systemd cgroups handler, we receive the following 
error upon launching a container:
{noformat}
Container id: container_1561638268473_0006_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: 
cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
See '/usr/bin/docker-current run --help'.
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}
Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document 
a {{systemcl}} example.

 

*#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*

Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. 
It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
there's only {{/bin/sh}}.

If we try to use these kind of images, we'll see the following error message:
{noformat}
Container id: container_1561638268473_0015_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
runtime error: container_linux.go:235: starting container process caused "exec: 
\"bash\": executable file not found in $PATH".
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}
 

*#3: {{find}} command must be available on the {{$PATH}}*

It seems obvious that we have the {{find}} command, but even very popular 
images like {{fedora}} requires that we install it separately.

If we don't have {{find}} available, then {{launcher_container.sh}} fails with:
{noformat}
[2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
 line 44: find: command not found
Last 4096 bytes of stderr.txt :
[2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
 line 44: find: command not found
Last 4096 bytes of stderr.txt :
{noformat}

#4 Add cmd-line example of how to tag local images

This is actually documented under "Privileged Container Security 
Consideration", but an one-liner would be helpful. I had trouble running a 
local docker image and tagging it appropriately. Just an example like {{docker 
tag local_ubuntu local/ubuntu:latest}} is already very informative.

  was:
Right now, using Docker on YARN has some hard requirements. If these 
requirements are not met, then launching the containers will fail and and error 
message will be printed. Depending on how familiar the user is with Docker, it 
might or might not be easy for them to understand what went wrong and how to 
fix the underlying problem.

It would be important to explicitly document these requirements along with the 
error messages.

*#1: CGroups handler cannot be systemd*

If docker deamon runs with systemd cgroups handler, we receive the following 
error upon launching a container:
{noformat}
Container id: container_1561638268473_0006_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: 
cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
See '/usr/bin/docker-current run --help'.
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}
Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document 
a {{systemcl}} example.

 

*#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*

Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. 
It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
there's only {{/bin/sh}}.

If we try to use these kind of images, we'll see the following error message:
{noformat}
Container id: 

[jira] [Commented] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-01 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876144#comment-16876144
 ] 

Peter Bacsko commented on YARN-9660:


cc [~shaneku...@gmail.com] [~eyang] [~snemeth] - what do you guys think?

I believe some of these could be detected and even reported to the user. The 
hard-coded {{/bin/bash}} could be made overridable in {{UnixShellScriptBuilder}}. 
We have options here.
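
A command-line sketch of what such a detection could look like (hypothetical; a real check would live in the container runtime/executor code, and IMAGE is a placeholder):
{noformat}
# Verify, before launch, that the image provides the tools launch_container.sh
# relies on (requirements #2 and #3 from the description).
IMAGE=local/ubuntu:latest
docker run --rm --entrypoint /bin/sh "$IMAGE" \
  -c 'command -v bash && command -v find' \
  || echo "ERROR: $IMAGE is missing bash and/or find" >&2
{noformat}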

> Enhance documentation of Docker on YARN support
> ---
>
> Key: YARN-9660
> URL: https://issues.apache.org/jira/browse/YARN-9660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Priority: Major
>
> Right now, using Docker on YARN has some hard requirements. If these 
> requirements are not met, then launching the containers will fail and an 
> error message will be printed. Depending on how familiar the user is with 
> Docker, it might or might not be easy for them to understand what went wrong 
> and how to fix the underlying problem.
> It would be important to explicitly document these requirements along with 
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If the docker daemon runs with the systemd cgroups handler, we receive the following 
> error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: 
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can 
> document a {{systemctl}} example.
>  
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. 
> It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
> there's only {{/bin/sh}}.
> If we try to use these kinds of images, we'll see the following error message:
> {noformat}
> Container id: container_1561638268473_0015_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
> runtime error: container_linux.go:235: starting container process caused 
> "exec: \"bash\": executable file not found in $PATH".
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
>  
> *#3: {{find}} command must be available on the {{$PATH}}*
> It seems obvious that we have the {{find}} command, but even very popular 
> images like {{fedora}} require that we install it separately.
> If we don't have {{find}} available, then {{launcher_container.sh}} fails 
> with:
> {noformat}
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. 
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
>  line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-01 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9660:
---
Description: 
Right now, using Docker on YARN has some hard requirements. If these 
requirements are not met, then launching the containers will fail and and error 
message will be printed. Depending on how familiar the user is with Docker, it 
might or might not be easy for them to understand what went wrong and how to 
fix the underlying problem.

It would be important to explicitly document these requirements along with the 
error messages.

*#1: CGroups handler cannot be systemd*

If docker deamon runs with systemd cgroups handler, we receive the following 
error upon launching a container:
{noformat}
Container id: container_1561638268473_0006_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: 
cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
See '/usr/bin/docker-current run --help'.
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}
Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document 
a {{systemcl}} example.

 

*#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*

Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. 
It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
there's only {{/bin/sh}}.

If we try to use these kind of images, we'll see the following error message:
{noformat}
Container id: container_1561638268473_0015_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
runtime error: container_linux.go:235: starting container process caused "exec: 
\"bash\": executable file not found in $PATH".
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}
 

*#3: {{find}} command must be available on the {{$PATH}}*

It seems obvious that we have the {{find}} command, but even very popular 
images like {{fedora}} requires that we install it separately.

If we don't have {{find}} available, then {{launcher_container.sh}} fails with:
{noformat}
[2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
 line 44: find: command not found
Last 4096 bytes of stderr.txt :
[2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
 line 44: find: command not found
Last 4096 bytes of stderr.txt :
{noformat}

  was:
Right now, using Docker on YARN has some hard requirements. If these 
requirements are not met, then launching the containers will fail and and error 
message will be printed. Depending on how familiar the user is with Docker, it 
might or might not be easy for them to understand what went wrong and how to 
fix the underlying problem.

It would be important to explicitly document these requirements along with the 
error messages.

#1: CGroups handler cannot be systemd

If docker deamon runs with systemd cgroups handler, we receive the following 
error upon launching a container:

{noformat}
Container id: container_1561638268473_0006_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: 
cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
See '/usr/bin/docker-current run --help'.
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}

Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document 
a {{systemcl}} example.


#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container
Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. 
It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
there's only {{/bin/sh}}.

If we try to use these kind of images, we'll see the following error message:

{noformat}
Container id: container_1561638268473_0015_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
runtime error: container_linux.go:235: starting container process caused "exec: 
\"bash\": executable file not found in $PATH".
Shell output: main : command provided 4
main : run as user is johndoe
main : 

[jira] [Created] (YARN-9660) Enhance documentation of Docker on YARN support

2019-07-01 Thread Peter Bacsko (JIRA)
Peter Bacsko created YARN-9660:
--

 Summary: Enhance documentation of Docker on YARN support
 Key: YARN-9660
 URL: https://issues.apache.org/jira/browse/YARN-9660
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation, nodemanager
Reporter: Peter Bacsko


Right now, using Docker on YARN has some hard requirements. If these 
requirements are not met, then launching the containers will fail and and error 
message will be printed. Depending on how familiar the user is with Docker, it 
might or might not be easy for them to understand what went wrong and how to 
fix the underlying problem.

It would be important to explicitly document these requirements along with the 
error messages.

#1: CGroups handler cannot be systemd

If docker deamon runs with systemd cgroups handler, we receive the following 
error upon launching a container:

{noformat}
Container id: container_1561638268473_0006_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: 
cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
See '/usr/bin/docker-current run --help'.
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}

Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document 
a {{systemcl}} example.


#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container
Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. 
It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and 
there's only {{/bin/sh}}.

If we try to use these kind of images, we'll see the following error message:

{noformat}
Container id: container_1561638268473_0015_01_02
Exit code: 7
Exception message: Launch container failed
Shell error output: /usr/bin/docker-current: Error response from daemon: oci 
runtime error: container_linux.go:235: starting container process caused "exec: 
\"bash\": executable file not found in $PATH".
Shell output: main : command provided 4
main : run as user is johndoe
main : requested yarn user is johndoe
{noformat}

#3: {{find}} command must be available on the {{$PATH}}

It seems obvious that we have the {{find}} command, but even very popular 
images like {{fedora}} requires that we install it separately.

If we don't have {{find}} available, then {{launcher_container.sh}} fails with:

{noformat}
2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
 line 44: find: command not found
Last 4096 bytes of stderr.txt :
[2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh:
 line 44: find: command not found
Last 4096 bytes of stderr.txt :
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-07-01 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876128#comment-16876128
 ] 

Adam Antal commented on YARN-9629:
--

Ah indeed, thanks for the suggestion [~snemeth].
I must have missed it. Fixed it in patch v5.

> Support configurable MIN_LOG_ROLLING_INTERVAL
> -
>
> Key: YARN-9629
> URL: https://issues.apache.org/jira/browse/YARN-9629
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9629.001.patch, YARN-9629.002.patch, 
> YARN-9629.003.patch, YARN-9629.004.patch, YARN-9629.005.patch
>
>
> The minimum valid value for one of the log-aggregation parameters, 
> {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}, is 
> MIN_LOG_ROLLING_INTERVAL - it has been hardcoded since its addition in 
> YARN-2583. 
> It has been empirically set to 1 hour, as lower values would put the 
> NodeManagers under pressure too frequently. For bigger clusters that is indeed 
> a valid limitation, but for smaller clusters it is a valid customer use case 
> to use lower values, such as a not-so-low 30 minutes. At this point this can 
> only be achieved by setting 
> {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be 
> kept for debug purposes only.
> I'm suggesting to make this minimum configurable, although a warning should 
> be logged at NodeManager startup when the value is lower than 1 hour.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-07-01 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9629:
-
Attachment: YARN-9629.005.patch

> Support configurable MIN_LOG_ROLLING_INTERVAL
> -
>
> Key: YARN-9629
> URL: https://issues.apache.org/jira/browse/YARN-9629
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9629.001.patch, YARN-9629.002.patch, 
> YARN-9629.003.patch, YARN-9629.004.patch, YARN-9629.005.patch
>
>
> The minimum valid value for one of the log-aggregation parameters, 
> {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}, is 
> MIN_LOG_ROLLING_INTERVAL - it has been hardcoded since its addition in 
> YARN-2583. 
> It has been empirically set to 1 hour, as lower values would put the 
> NodeManagers under pressure too frequently. For bigger clusters that is indeed 
> a valid limitation, but for smaller clusters it is a valid customer use case 
> to use lower values, such as a not-so-low 30 minutes. At this point this can 
> only be achieved by setting 
> {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be 
> kept for debug purposes only.
> I'm suggesting to make this minimum configurable, although a warning should 
> be logged at NodeManager startup when the value is lower than 1 hour.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3221) Applications should be able to 're-register'

2019-07-01 Thread wangxiangchun (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876118#comment-16876118
 ] 

wangxiangchun commented on YARN-3221:
-

I encountered the same problem in YARN federation. When I enable AMRMProxy HA 
and the first app attempt fails, the second app attempt has to register the UAM 
again, and then this problem appears.

> Applications should be able to 're-register' 
> -
>
> Key: YARN-3221
> URL: https://issues.apache.org/jira/browse/YARN-3221
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sidharta Seethana
>Priority: Major
>
> Today, it is not possible for YARN applications to 're-register' in 
> failure/restart scenarios. This is especially problematic for Unmanaged 
> applications - when restarts (normal or otherwise) or other failures 
> necessitate the re-creation of the AMRMClient (along with a reset of the 
> internal RPC counter).  The YARN RM disallows an attempt to register again 
> (with the same saved token) with the following exception shown below.  This 
> should be fixed.
> {quote}
> rmClient.RegisterApplicationMaster 
> org.apache.hadoop.yarn.exceptions.InvalidApplicationMasterRequestException:Application
>  Master is already registered : application_1424304845861_0002
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:264)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9658) UT failures in TestLeafQueue

2019-07-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876089#comment-16876089
 ] 

Hadoop QA commented on YARN-9658:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
58s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 84m 
10s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9658 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973311/YARN-9658.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5f274ba9d9f3 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1e727cf |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24336/testReport/ |
| Max. process+thread count | 916 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24336/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> UT failures in TestLeafQueue
> 

[jira] [Commented] (YARN-9521) RM failed to start due to system services

2019-07-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876065#comment-16876065
 ] 

Hadoop QA commented on YARN-9521:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
47s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
45s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
55s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 78m 49s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9521 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973315/YARN-9521.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f05d9d77fb39 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1e727cf |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 

[jira] [Commented] (YARN-9521) RM failed to start due to system services

2019-07-01 Thread kyungwan nam (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876007#comment-16876007
 ] 

kyungwan nam commented on YARN-9521:


I attached a new patch in which ApiServiceClient.actionCleanUp is performed 
inside ugi.doAs().

> RM failed to start due to system services
> -
>
> Key: YARN-9521
> URL: https://issues.apache.org/jira/browse/YARN-9521
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: kyungwan nam
>Priority: Major
> Attachments: YARN-9521.001.patch, YARN-9521.002.patch
>
>
> when starting RM, listing system services directory has failed as follows.
> {code}
> 2019-04-30 17:18:25,441 INFO  client.SystemServiceManagerImpl 
> (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory 
> is configured to /services
> 2019-04-30 17:18:25,467 INFO  client.SystemServiceManagerImpl 
> (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation 
> initialized to yarn (auth:SIMPLE)
> 2019-04-30 17:18:25,467 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in 
> state STARTED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Filesystem closed
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501)
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1217)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1233)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1200)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> ... 13 more
> {code}
> It looks like this is due to the usage of the filesystem cache.
> This issue does not happen when I add "fs.hdfs.impl.disable.cache=true" to 
> yarn-site.
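
As a side note on the quoted workaround: it amounts to disabling the HDFS FileSystem cache. A minimal sketch of the equivalent programmatic setting, shown only as an illustration (the reporter sets the property in yarn-site rather than in code, and this is a workaround, not the proposed fix):

{code:java}
// Illustration of the quoted workaround only, not the proposed fix: disabling
// the HDFS FileSystem cache so every FileSystem.get() returns a fresh instance.
Configuration conf = new YarnConfiguration();
conf.setBoolean("fs.hdfs.impl.disable.cache", true);
FileSystem fs = FileSystem.get(conf);  // not served from FileSystem.CACHE for hdfs://
{code}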



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9521) RM failed to start due to system services

2019-07-01 Thread kyungwan nam (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876006#comment-16876006
 ] 

kyungwan nam commented on YARN-9521:


After some further digging, I think I have figured out the cause of this issue 
more precisely.

Normally, when the yarn-service API is requested, a new UGI is created and the 
request is performed inside ugi.doAs().
When FileSystem.get() is called inside ugi.doAs(), it always creates a new 
FileSystem, because the UGI is part of the key of FileSystem.CACHE (YARN-3336 
is helpful for understanding this).
So in this case, closing it does not close a FileSystem that is shared through 
FileSystem.CACHE:
{code}
  UserGroupInformation ugi = getProxyUser(request);
  LOG.info("POST: createService = {} user = {}", service, ugi);
  if (service.getState() == ServiceState.STOPPED) {
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws YarnException, IOException {
        ServiceClient sc = getServiceClient();
        try {
          sc.init(YARN_CONFIG);
          sc.start();
          sc.actionBuild(service);
        } finally {
          sc.close();
        }
        return null;
      }
    });
{code}
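
(Illustration only, not part of the patch: FileSystem.CACHE is keyed by scheme, authority and UGI, so the same configuration resolved under two different UGIs yields two distinct FileSystem instances. The variable names below are invented.)

{code:java}
// Sketch only: the same Configuration under two different UGIs maps to two
// different FileSystem.CACHE entries, because the cache key includes the UGI.
Configuration conf = new Configuration();
FileSystem fsAsLoginUser = FileSystem.get(conf);   // cached under the login user
FileSystem fsAsProxyUser = proxyUgi.doAs(
    (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
// fsAsLoginUser != fsAsProxyUser; closing one does not close the other.
{code}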

On the other hand, ApiServiceClient.actionCleanUp, which is called from 
RMAppImpl.appAdminClientCleanUp, is performed as the RM login user instead of 
inside doAs().
In this case, FileSystem.get() can return the cached instance that 
SystemServiceManagerImpl and FileSystemNodeLabelsStore also refer to:
{code}
  @Override
  public int actionCleanUp(String appName, String userName) throws
      IOException, YarnException {
    ServiceClient sc = new ServiceClient();
    sc.init(getConfig());
    sc.start();
    int result = sc.actionCleanUp(appName, userName);
    sc.close();
    return result;
  }
{code}
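
For reference, a minimal sketch of the direction described above, i.e. performing actionCleanUp inside a doAs(). This is not the attached patch; the use of createProxyUser() and the surrounding shape are assumptions for illustration only.

{code:java}
// Sketch only -- not the actual YARN-9521 patch.
@Override
public int actionCleanUp(String appName, String userName) throws
    IOException, YarnException {
  UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(
      userName, UserGroupInformation.getLoginUser());
  try {
    return proxyUser.doAs((PrivilegedExceptionAction<Integer>) () -> {
      ServiceClient sc = new ServiceClient();
      sc.init(getConfig());
      sc.start();
      try {
        return sc.actionCleanUp(appName, userName);
      } finally {
        sc.close();
      }
    });
  } catch (InterruptedException e) {
    throw new YarnException(e);
  }
}
{code}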




> RM failed to start due to system services
> -
>
> Key: YARN-9521
> URL: https://issues.apache.org/jira/browse/YARN-9521
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: kyungwan nam
>Priority: Major
> Attachments: YARN-9521.001.patch, YARN-9521.002.patch
>
>
> when starting RM, listing system services directory has failed as follows.
> {code}
> 2019-04-30 17:18:25,441 INFO  client.SystemServiceManagerImpl 
> (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory 
> is configured to /services
> 2019-04-30 17:18:25,467 INFO  client.SystemServiceManagerImpl 
> (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation 
> initialized to yarn (auth:SIMPLE)
> 2019-04-30 17:18:25,467 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in 
> state STARTED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Filesystem closed
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501)
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1217)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1233)
> at 
> 

[jira] [Updated] (YARN-9521) RM failed to start due to system services

2019-07-01 Thread kyungwan nam (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-9521:
---
Attachment: YARN-9521.002.patch

> RM failed to start due to system services
> -
>
> Key: YARN-9521
> URL: https://issues.apache.org/jira/browse/YARN-9521
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: kyungwan nam
>Priority: Major
> Attachments: YARN-9521.001.patch, YARN-9521.002.patch
>
>
> when starting RM, listing system services directory has failed as follows.
> {code}
> 2019-04-30 17:18:25,441 INFO  client.SystemServiceManagerImpl 
> (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory 
> is configured to /services
> 2019-04-30 17:18:25,467 INFO  client.SystemServiceManagerImpl 
> (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation 
> initialized to yarn (auth:SIMPLE)
> 2019-04-30 17:18:25,467 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in 
> state STARTED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Filesystem closed
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501)
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1217)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1233)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1200)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> ... 13 more
> {code}
> It looks like this is due to the usage of the filesystem cache.
> This issue does not happen when I add "fs.hdfs.impl.disable.cache=true" to 
> yarn-site.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-07-01 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875997#comment-16875997
 ] 

Tao Yang commented on YARN-9623:


[~cheersyang], I have created YARN-9658 to fix these UT failures.

> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9623.001.patch, YARN-9623.002.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated as the cluster grows. Moreover, it is 
> better for users to be able to ignore that conf, therefore it should be 
> auto-adjusted internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since nodes cannot always be processed 
> in order, we should leave some room for out-of-order heartbeats, for example 
> by guaranteeing that the max queue length is not less than 1.2 * numNodes.
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there is no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}
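
For readers who prefer compilable code, a concrete rendering of the pseudocode quoted above; all method and field names here are invented for illustration and do not come from the attached patches.

{code:java}
// Hypothetical rendering of the pseudocode above. configuredMaxQueueLength and
// maxQueueLength are assumed to be fields of the cleanup component, as in the
// pseudocode.
private void adjustAppActivitiesMaxQueueLength(boolean multiNodeEnabled,
    boolean asyncSchedulingEnabled, int numSchedulingThreads, int numNodes) {
  if (!multiNodeEnabled) {
    if (asyncSchedulingEnabled) {
      // Every async scheduling thread walks all nodes in order.
      maxQueueLength =
          Math.max(configuredMaxQueueLength, numSchedulingThreads * numNodes);
    } else {
      // Heartbeat-driven scheduling: leave ~20% headroom for out-of-order
      // node heartbeats.
      maxQueueLength =
          Math.max(configuredMaxQueueLength, (int) (1.2 * numNodes));
    }
  } else if (maxQueueLength != configuredMaxQueueLength) {
    // Multi-node placement covers all nodes in a single allocation, so fall
    // back to the configured value.
    maxQueueLength = configuredMaxQueueLength;
  }
}
{code}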



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9658) UT failures in TestLeafQueue

2019-07-01 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9658:
---
Affects Version/s: 3.3.0

> UT failures in TestLeafQueue
> 
>
> Key: YARN-9658
> URL: https://issues.apache.org/jira/browse/YARN-9658
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-9658.001.patch
>
>
> In ActivitiesManager, if there is no YARN configuration in the mock RMContext, 
> the cleanup interval cannot be initialized to its 5-second default, so the 
> cleanup thread keeps running repeatedly without any interval. This can cause 
> problems for the Mockito framework; in this case it caused an OOM, because 
> many throwable objects were generated internally by the incomplete mock.
> Add a configuration for the mock RMContext to fix the failures in TestLeafQueue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9658) UT failures in TestLeafQueue

2019-07-01 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9658:
---
Attachment: YARN-9658.001.patch

> UT failures in TestLeafQueue
> 
>
> Key: YARN-9658
> URL: https://issues.apache.org/jira/browse/YARN-9658
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-9658.001.patch
>
>
> In ActivitiesManager, if there is no YARN configuration in the mock RMContext, 
> the cleanup interval cannot be initialized to its 5-second default, so the 
> cleanup thread keeps running repeatedly without any interval. This can cause 
> problems for the Mockito framework; in this case it caused an OOM, because 
> many throwable objects were generated internally by the incomplete mock.
> Add a configuration for the mock RMContext to fix the failures in TestLeafQueue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9658) UT failures in TestLeafQueue

2019-07-01 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9658:
---
Description: 
In ActivitiesManager, if there is no YARN configuration in the mock RMContext, 
the cleanup interval cannot be initialized to its 5-second default, so the 
cleanup thread keeps running repeatedly without any interval. This can cause 
problems for the Mockito framework; in this case it caused an OOM, because 
many throwable objects were generated internally by the incomplete mock.
Add a configuration for the mock RMContext to fix the failures in TestLeafQueue.

  was:
In ActivitiesManager, if there is no YARN configuration in the mock RMContext, 
the cleanup interval cannot be initialized to its 5-second default, so the 
cleanup thread keeps running repeatedly without any interval. This can cause 
problems for the Mockito framework; in this case it caused an OOM, because 
many throwable objects were generated internally by the incomplete mock.
Add a default value for ActivitiesManager#activitiesCleanupIntervalMs to avoid 
the cleanup thread running repeatedly without any interval.


> UT failures in TestLeafQueue
> 
>
> Key: YARN-9658
> URL: https://issues.apache.org/jira/browse/YARN-9658
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-9658.001.patch
>
>
> In ActivitiesManager, if there is no YARN configuration in the mock RMContext, 
> the cleanup interval cannot be initialized to its 5-second default, so the 
> cleanup thread keeps running repeatedly without any interval. This can cause 
> problems for the Mockito framework; in this case it caused an OOM, because 
> many throwable objects were generated internally by the incomplete mock.
> Add a configuration for the mock RMContext to fix the failures in TestLeafQueue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9659) yarn application cannot be killed after updating info for attempt failed

2019-07-01 Thread zhangqw (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangqw updated YARN-9659:
--
Description: 
Affected by HDFS:
{code:java}
Not enough replicas was chosen. Reason:{NOT_ENOUGH_STORAGE_SPACE=2} 
{code}
updating info for attempt failed:
{code:java}
2019-06-28 10:36:57,917 INFO  recovery.FileSystemRMStateStore 
(FileSystemRMStateStore.java:updateApplicationAttemptStateInternal(464)) - 
Updating info for attempt: appattempt_1561517363839_0013_01 at: 
/tmp/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1561

517363839_0013/appattempt_1561517363839_0013_01

2019-06-28 10:36:57,931 INFO  hdfs.DataStreamer 
(DataStreamer.java:createBlockOutputStream(1789)) - Exception in 
createBlockOutputStream blk_1088382064_14942741

java.io.IOException: Got error, status=ERROR, status message , ack with 
firstBadLink as 10.0.96.36:50010

        at 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)

        at 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110)

        at 
org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1778)

        at 
org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)

        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
{code}
in RM log:
{code:java}
2019-06-28 10:36:57,953 INFO  recovery.FileSystemRMStateStore 
(FileSystemRMStateStore.java:runWithRetries(743)) - Maxed out FS retries. 
Giving up!
...
...
2019-06-28 10:49:28,746 INFO  util.AbstractLivelinessMonitor 
(AbstractLivelinessMonitor.java:run(148)) - 
Expired:appattempt_1561517363839_0013_01 Timed out after 600 secs
{code}
Now application cannot be killed:
{code:java}
19/07/01 15:22:55 INFO impl.YarnClientImpl: Waiting for application 
application_1561517363839_0013 to be killed.
{code}
And when accessing the container info page in the RM web UI, an error 500 is returned. 

RM log:
{code:java}
2019-06-28 10:24:00,176 ERROR webapp.Dispatcher (Dispatcher.java:service(171)) 
- error handling URI: /cluster/appattempt/appattempt_1561517363839_0011_04
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor253.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
...
...
Caused by: java.lang.IllegalArgumentException: No enum constant 
org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.FINAL_SAVING
at java.lang.Enum.valueOf(Enum.java:238)
at 
org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createAttemptHeadRoomTable(RMAppAttemptBlock.java:197)
at 
org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:151)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58){code}
 

 Notice related issue has been patched: YARN-8183

  was:
Affected by HDFS:
{code:java}
Not enough replicas was chosen. Reason:{NOT_ENOUGH_STORAGE_SPACE=2} 
{code}
updating info for attempt failed:
{code:java}
2019-06-28 10:36:57,917 INFO  recovery.FileSystemRMStateStore 
(FileSystemRMStateStore.java:updateApplicationAttemptStateInternal(464)) - 
Updating info for attempt: appattempt_1561517363839_0013_01 at: 
/tmp/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1561

517363839_0013/appattempt_1561517363839_0013_01

2019-06-28 10:36:57,931 INFO  hdfs.DataStreamer 
(DataStreamer.java:createBlockOutputStream(1789)) - Exception in 
createBlockOutputStream blk_1088382064_14942741

java.io.IOException: Got error, status=ERROR, status message , ack with 
firstBadLink as 10.0.96.36:50010

        at 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)

        at 

[jira] [Created] (YARN-9659) yarn application cannot be killed after updating info for attempt failed

2019-07-01 Thread zhangqw (JIRA)
zhangqw created YARN-9659:
-

 Summary: yarn application cannot be killed after updating info for 
attempt failed
 Key: YARN-9659
 URL: https://issues.apache.org/jira/browse/YARN-9659
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
 Environment: Hadoop 3.1.1 release

Centos 7.1
Reporter: zhangqw


Affected by HDFS:
{code:java}
Not enough replicas was chosen. Reason:{NOT_ENOUGH_STORAGE_SPACE=2} 
{code}
updating info for attempt failed:
{code:java}
2019-06-28 10:36:57,917 INFO  recovery.FileSystemRMStateStore 
(FileSystemRMStateStore.java:updateApplicationAttemptStateInternal(464)) - 
Updating info for attempt: appattempt_1561517363839_0013_01 at: 
/tmp/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1561

517363839_0013/appattempt_1561517363839_0013_01

2019-06-28 10:36:57,931 INFO  hdfs.DataStreamer 
(DataStreamer.java:createBlockOutputStream(1789)) - Exception in 
createBlockOutputStream blk_1088382064_14942741

java.io.IOException: Got error, status=ERROR, status message , ack with 
firstBadLink as 10.0.96.36:50010

        at 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)

        at 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110)

        at 
org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1778)

        at 
org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)

        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
{code}
in RM log:
{code:java}
2019-06-28 10:36:57,953 INFO  recovery.FileSystemRMStateStore 
(FileSystemRMStateStore.java:runWithRetries(743)) - Maxed out FS retries. 
Giving up!
...
...
2019-06-28 10:49:28,746 INFO  util.AbstractLivelinessMonitor 
(AbstractLivelinessMonitor.java:run(148)) - 
Expired:appattempt_1561517363839_0013_01 Timed out after 600 secs
{code}
Now application cannot be killed:
{code:java}
19/07/01 15:22:55 INFO impl.YarnClientImpl: Waiting for application 
application_1561517363839_0013 to be killed.
{code}
And when accessing the container info page in the RM web UI, an error 500 is returned. 

RM log:
{code:java}
2019-06-28 10:24:00,176 ERROR webapp.Dispatcher (Dispatcher.java:service(171)) 
- error handling URI: /cluster/appattempt/appattempt_1561517363839_0011_04
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor253.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
...
...
Caused by: java.lang.IllegalArgumentException: No enum constant 
org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.FINAL_SAVING
at java.lang.Enum.valueOf(Enum.java:238)
at 
org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createAttemptHeadRoomTable(RMAppAttemptBlock.java:197)
at 
org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:151)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58){code}
 

 Notice related issue has been patched: YARN-8183



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9658) UT failures in TestLeafQueue

2019-07-01 Thread Tao Yang (JIRA)
Tao Yang created YARN-9658:
--

 Summary: UT failures in TestLeafQueue
 Key: YARN-9658
 URL: https://issues.apache.org/jira/browse/YARN-9658
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tao Yang
Assignee: Tao Yang


In ActivitiesManager, if there is no YARN configuration in the mock RMContext, 
the cleanup interval cannot be initialized to its 5-second default, so the 
cleanup thread keeps running repeatedly without any interval. This can cause 
problems for the Mockito framework; in this case it caused an OOM, because 
many throwable objects were generated internally by the incomplete mock.
Add a default value for ActivitiesManager#activitiesCleanupIntervalMs to avoid 
the cleanup thread running repeatedly without any interval.
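
A minimal sketch of one way the test side could avoid this, assuming a Mockito-style mock as used in TestLeafQueue; the actual fix attached to this issue may differ.

{code:java}
// Sketch only: give the mocked RMContext a real YarnConfiguration so that
// ActivitiesManager can pick up its default cleanup interval.
RMContext rmContext = mock(RMContext.class);
when(rmContext.getYarnConfiguration()).thenReturn(new YarnConfiguration());
ActivitiesManager activitiesManager = new ActivitiesManager(rmContext);
{code}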



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-07-01 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875986#comment-16875986
 ] 

Weiwei Yang commented on YARN-9623:
---

Hi [~Tao Yang], pls create a new issue to fix this failure. Thanks

 

> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9623.001.patch, YARN-9623.002.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated as the cluster grows. Moreover, it is 
> better for users to be able to ignore that conf, therefore it should be 
> auto-adjusted internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since nodes cannot always be processed 
> in order, we should leave some room for out-of-order heartbeats, for example 
> by guaranteeing that the max queue length is not less than 1.2 * numNodes.
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there is no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-07-01 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875976#comment-16875976
 ] 

Szilard Nemeth commented on YARN-9629:
--

Hi [~adam.antal]!
Thanks for the update!

The code changes you made for patch004 regarding the documentation in 
yarn-default.xml look good to me.
One minor thing left: Could you please remove any "suggested" references 
(variable names, log messages) in method 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService#calculateRollingMonitorInterval?
 

Thanks!

> Support configurable MIN_LOG_ROLLING_INTERVAL
> -
>
> Key: YARN-9629
> URL: https://issues.apache.org/jira/browse/YARN-9629
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9629.001.patch, YARN-9629.002.patch, 
> YARN-9629.003.patch, YARN-9629.004.patch
>
>
> One of the log-aggregation parameters, the minimum valid value for 
> {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}, is 
> MIN_LOG_ROLLING_INTERVAL - it has been hardcoded since its addition in 
> YARN-2583. 
> It has been empirically set to 1 hour, as lower values would too frequently 
> put the NodeManagers under pressure. For bigger clusters that is indeed a 
> valid limitation, but for smaller clusters it is a valid customer use case to 
> use lower values, even moderately lower ones such as 30 minutes. At this 
> point this can only be achieved by setting 
> {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be 
> kept for debug purposes only.
> I suggest making this minimum configurable, although a warning should be 
> logged at NodeManager startup when the value is lower than 1 hour.
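
To illustrate the proposal, a hypothetical sketch: the new ".min" property key and the method body below are invented for this example and are not taken from any patch.

{code:java}
// Hypothetical sketch of the proposed behaviour: a configurable minimum,
// plus a startup warning when the interval is below one hour.
private long calculateRollingMonitorInterval(Configuration conf) {
  long requested = conf.getLong(
      "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds", -1);
  long minInterval = conf.getLong(
      "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds.min",
      3600);
  if (requested > 0 && requested < 3600) {
    LOG.warn("Log aggregation rolling interval " + requested
        + "s is below one hour; this may put NodeManagers under pressure.");
  }
  return requested <= 0 ? requested : Math.max(requested, minInterval);
}
{code}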



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9657) AbstractLivelinessMonitor add serviceName to PingChecker thread

2019-07-01 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-9657:
--

 Summary: AbstractLivelinessMonitor add serviceName to PingChecker 
thread
 Key: YARN-9657
 URL: https://issues.apache.org/jira/browse/YARN-9657
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bibin A Chundatt






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org