[jira] [Commented] (YARN-10229) [Federation] Client should be able to submit application to RM directly using normal client conf

2020-07-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168334#comment-17168334
 ] 

Hadoop QA commented on YARN-10229:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
18s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 
14s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/27/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10229 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13006679/YARN-10229.008.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 2466c29a4332 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 05b3337a460 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
|  Test Results | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/27/testReport/ |
| Max. process+thread count | 424 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 

[jira] [Commented] (YARN-10229) [Federation] Client should be able to submit application to RM directly using normal client conf

2020-07-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168317#comment-17168317
 ] 

Íñigo Goiri commented on YARN-10229:


+1 on  [^YARN-10229.008.patch].

> [Federation] Client should be able to submit application to RM directly using 
> normal client conf
> 
>
> Key: YARN-10229
> URL: https://issues.apache.org/jira/browse/YARN-10229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy, federation
>Affects Versions: 3.1.1
>Reporter: JohnsonGuo
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10229.001.patch, YARN-10229.002.patch, 
> YARN-10229.003.patch, YARN-10229.004.patch, YARN-10229.005.patch, 
> YARN-10229.006.patch, YARN-10229.007.patch, YARN-10229.008.patch
>
>
> Scenario: When the YARN federation feature is enabled with multiple YARN 
> clusters, one can submit jobs to the yarn-router by *modifying* the client 
> configuration with the YARN router address.
> But if one still wants to submit jobs to the RM directly via the original 
> client configuration (from before federation was enabled), the submission 
> fails with an AMRMToken exception. That means that once federation is 
> enabled, anyone who wants to submit a job has to modify the client conf.
>  
> One possible solution for this scenario (sketched below) is, in the 
> NodeManager, when the client ApplicationMaster request comes in:
>  * get the client job.xml from HDFS "".
>  * parse the "yarn.resourcemanager.scheduler.address" parameter in job.xml
>  * if the value of the parameter is "localhost:8049" (the AMRMProxy address), 
> then perform the AMRMToken validation
>  * if the value of the parameter is "rm:port" (the RM address), then skip the 
> AMRMToken validation
>  
>  
>  
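
A minimal sketch of the proposed check, assuming the client job.xml has already 
been loaded from HDFS into a Configuration object (shouldValidateAmrmToken and 
its call site are hypothetical; the real AMRMProxy/NodeManager wiring differs):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerAddressCheck {
  /**
   * Returns true if the client's job.xml points at the AMRMProxy address,
   * i.e. the AMRMToken validation should be performed; false means the client
   * was configured to talk to the RM directly and the check can be skipped.
   */
  static boolean shouldValidateAmrmToken(Configuration clientJobConf,
      String amrmProxyAddress) {
    // "yarn.resourcemanager.scheduler.address" as read from the client job.xml
    String schedulerAddr =
        clientJobConf.get(YarnConfiguration.RM_SCHEDULER_ADDRESS);
    return schedulerAddr != null && schedulerAddr.equals(amrmProxyAddress);
  }
}
{code}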






[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM

2020-07-30 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168213#comment-17168213
 ] 

Jim Brennan commented on YARN-1529:
---

Thanks [~epayne]!

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Jim Brennan
>Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: YARN-1529-branch-2.10.001.patch, YARN-1529.005.patch, 
> YARN-1529.006.patch, YARN-1529.v01.patch, YARN-1529.v02.patch, 
> YARN-1529.v03.patch, YARN-1529.v04.patch
>
>
> Users are often unaware of the localization cost that their jobs incur. To 
> measure the effectiveness of the localization caches, it is necessary to 
> expose the overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, which results in a number 
> of download requests for the files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition
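
A minimal sketch of the proposed cached-ratio arithmetic (a standalone class 
with hypothetical field names mirroring the metric names above; the committed 
NodeManagerMetrics change uses the Hadoop metrics2 machinery instead):
{code:java}
public class LocalizationCounters {
  private long localizedFilesMissed;  // cache misses: files downloaded from DFS
  private long localizedFilesCached;  // cache hits: requests served from local cache

  void recordMiss() { localizedFilesMissed++; }
  void recordHit()  { localizedFilesCached++; }

  /** ratio = 100 * caches / (caches + misses); 0 when nothing was localized. */
  int localizedFilesCachedRatio() {
    long total = localizedFilesCached + localizedFilesMissed;
    return total == 0 ? 0 : (int) (100 * localizedFilesCached / total);
  }
}
{code}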






[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM

2020-07-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168212#comment-17168212
 ] 

Hadoop QA commented on YARN-1529:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.10 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
45s{color} | {color:green} branch-2.10 passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  3m 
57s{color} | {color:red} hadoop-yarn in branch-2.10 failed with JDK Oracle 
Corporation-1.7.0_95-b00. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
7s{color} | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} branch-2.10 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-yarn-api in branch-2.10 failed with JDK Oracle 
Corporation-1.7.0_95-b00. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-yarn-server-nodemanager in branch-2.10 failed 
with JDK Oracle Corporation-1.7.0_95-b00. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
16s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
55s{color} | {color:green} branch-2.10 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
46s{color} | {color:green} the patch passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
20s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 14s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 535 unchanged - 0 fixed = 537 total (was 535) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
37s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdkOracleCorporation-1.7.0_95-b00
 with JDK Oracle Corporation-1.7.0_95-b00 generated 2 new + 0 unchanged - 0 
fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
25s{color} | {color:green} the patch passed {color} |
|| || 

[jira] [Comment Edited] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource

2020-07-30 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168205#comment-17168205
 ] 

Eric Payne edited comment on YARN-4575 at 7/30/20, 8:45 PM:


I'm not sure why 2 pre-commit builds are being triggered. Nevertheless, the 
unit tests are not failing for me and I think the TestFairSchedulerPreemption 
failure is YARN-9333. None of the others fail for me locally.


was (Author: eepayne):
I'm not sure why 2 pre-commit builds are being triggered. Nevertheless, the 
unit tests are not failing for me and I think the TestFairSchedulerPreemption 
failure is YARN-9333.

> ApplicationResourceUsageReport should return ALL  reserved resource
> ---
>
> Key: YARN-4575
> URL: https://issues.apache.org/jira/browse/YARN-4575
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-4575.patch, 0002-YARN-4575.patch, 
> YARN-4575.003.patch, YARN-4575.004.patch
>
>
> The reserved resource report in ApplicationResourceUsageReport covers only 
> the default partition; it should cover all partitions.






[jira] [Commented] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource

2020-07-30 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168205#comment-17168205
 ] 

Eric Payne commented on YARN-4575:
--

I'm not sure why 2 pre-commit builds are being triggered. Nevertheless, the 
unit tests are not failing for me and I think the TestFairSchedulerPreemption 
failure is YARN-9333.

> ApplicationResourceUsageReport should return ALL  reserved resource
> ---
>
> Key: YARN-4575
> URL: https://issues.apache.org/jira/browse/YARN-4575
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-4575.patch, 0002-YARN-4575.patch, 
> YARN-4575.003.patch, YARN-4575.004.patch
>
>
> The reserved resource report in ApplicationResourceUsageReport covers only 
> the default partition; it should cover all partitions.






[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM

2020-07-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168204#comment-17168204
 ] 

Hadoop QA commented on YARN-1529:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
58s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.10 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
50s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
15s{color} | {color:green} branch-2.10 passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  
4s{color} | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
36s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} branch-2.10 passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} branch-2.10 passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
13s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
47s{color} | {color:green} branch-2.10 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
34s{color} | {color:green} the patch passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
57s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  5s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 535 unchanged - 0 fixed = 537 total (was 535) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed with JDK Oracle 
Corporation-1.7.0_95-b00 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_252-8u252-b09-1~16.04-b09 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 
16s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| 

[jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM

2020-07-30 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-1529:
-
Fix Version/s: 3.1.5
   3.3.1
   3.4.0
   2.10.1
   3.2.2

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Jim Brennan
>Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: YARN-1529-branch-2.10.001.patch, YARN-1529.005.patch, 
> YARN-1529.006.patch, YARN-1529.v01.patch, YARN-1529.v02.patch, 
> YARN-1529.v03.patch, YARN-1529.v04.patch
>
>
> Users are often unaware of the localization cost that their jobs incur. To 
> measure the effectiveness of the localization caches, it is necessary to 
> expose the overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, which results in a number 
> of download requests for the files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition






[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM

2020-07-30 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168161#comment-17168161
 ] 

Jim Brennan commented on YARN-1529:
---

[~epayne] I have uploaded a patch for branch-2.10.  Incidentally, the 
compilation error was related to the fact that [YARN-7677] has not been pulled 
back to branch-2.10.  We might want to consider doing that.


> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-1529-branch-2.10.001.patch, YARN-1529.005.patch, 
> YARN-1529.006.patch, YARN-1529.v01.patch, YARN-1529.v02.patch, 
> YARN-1529.v03.patch, YARN-1529.v04.patch
>
>
> Users are often unaware of the localization cost that their jobs incur. To 
> measure the effectiveness of the localization caches, it is necessary to 
> expose the overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, which results in a number 
> of download requests for the files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition






[jira] [Updated] (YARN-1529) Add Localization overhead metrics to NM

2020-07-30 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-1529:
--
Attachment: YARN-1529-branch-2.10.001.patch

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-1529-branch-2.10.001.patch, YARN-1529.005.patch, 
> YARN-1529.006.patch, YARN-1529.v01.patch, YARN-1529.v02.patch, 
> YARN-1529.v03.patch, YARN-1529.v04.patch
>
>
> Users are often unaware of the localization cost that their jobs incur. To 
> measure the effectiveness of the localization caches, it is necessary to 
> expose the overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, which results in a number 
> of download requests for the files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition






[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM

2020-07-30 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168136#comment-17168136
 ] 

Jim Brennan commented on YARN-1529:
---

Thanks [~epayne]!  I will put up a patch for branch-2.10.

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-1529.005.patch, YARN-1529.006.patch, 
> YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch, 
> YARN-1529.v04.patch
>
>
> Users are often unaware of the localization cost that their jobs incur. To 
> measure the effectiveness of the localization caches, it is necessary to 
> expose the overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, which results in a number 
> of download requests for the files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition






[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM

2020-07-30 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168122#comment-17168122
 ] 

Eric Payne commented on YARN-1529:
--

I don't know why 2 pre-commit builds were kicked off. The first was fine but 
the second one had several unit test failures. Those unit tests all succeed for 
me locally.

I have committed this from trunk down through branch-3.1.

However, although there were no merge conflicts in backporting to 2.10, the 
following code does not compile:
{code:title=ContainerLaunch#sanitizeEnv}
addToEnvMap(environment, nmVars, Environment.LOCALIZATION_COUNTERS.name(),
 container.localizationCountersAsString());
{code}

> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-1529.005.patch, YARN-1529.006.patch, 
> YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch, 
> YARN-1529.v04.patch
>
>
> Users are often unaware of the localization cost that their jobs incur. To 
> measure the effectiveness of the localization caches, it is necessary to 
> expose the overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, which results in a number 
> of download requests for the files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition






[jira] [Commented] (YARN-10380) Import logic of multi-node allocation in CapacityScheduler

2020-07-30 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168116#comment-17168116
 ] 

Wangda Tan commented on YARN-10380:
---

cc: [~prabhujoseph], I think we identified more issues during a debug session. 
I saw that YARN-10360 is filed, but I think there are more issues; do you 
remember?

Also + [~sunil.gov...@gmail.com], [~tangzhankun].

I checked the logic of the other parts and didn't see too many other issues, 
but I didn't spend much time on this, so it is possible I missed something.

> Import logic of multi-node allocation in CapacityScheduler
> --
>
> Key: YARN-10380
> URL: https://issues.apache.org/jira/browse/YARN-10380
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wangda Tan
>Priority: Critical
>
> *1) Entry point:* 
> When we do multi-node allocation, we're using the same logic as async 
> scheduling:
> {code:java}
> // Allocate containers of node [start, end)
>  for (FiCaSchedulerNode node : nodes) {
>   if (current++ >= start) {
>      if (shouldSkipNodeSchedule(node, cs, printSkipedNodeLogging)) {
>         continue;
>      }
>      cs.allocateContainersToNode(node.getNodeID(), false);
>   }
>  } {code}
> Is this the most effective way to do multi-node scheduling? Should we 
> allocate based on partitions? In the above logic, if we have thousands of 
> nodes in one partition, we will repeatedly access all nodes of the partition 
> thousands of times.
> I would suggest looking at making the entry points for node-heartbeat, 
> async-scheduling (single node), and async-scheduling (multi-node) different.
> Node-heartbeat and async-scheduling (single node) can still be similar and 
> share most of the code. 
> async-scheduling (multi-node) should iterate over partitions first, using 
> pseudo code like: 
> {code:java}
> for (partition : all partitions) {
>   allocateContainersOnMultiNodes(getCandidate(partition))
> } {code}
>  






[jira] [Created] (YARN-10380) Import logic of multi-node allocation in CapacityScheduler

2020-07-30 Thread Wangda Tan (Jira)
Wangda Tan created YARN-10380:
-

 Summary: Import logic of multi-node allocation in CapacityScheduler
 Key: YARN-10380
 URL: https://issues.apache.org/jira/browse/YARN-10380
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wangda Tan


*1) Entry point:* 
When we do multi-node allocation, we're using the same logic as async 
scheduling:
{code:java}
// Allocate containers of node [start, end)
 for (FiCaSchedulerNode node : nodes) {
  if (current++ >= start) {
     if (shouldSkipNodeSchedule(node, cs, printSkipedNodeLogging)) {
        continue;
     }
     cs.allocateContainersToNode(node.getNodeID(), false);
  }
 } {code}
Is this the most effective way to do multi-node scheduling? Should we allocate 
based on partitions? In the above logic, if we have thousands of nodes in one 
partition, we will repeatedly access all nodes of the partition thousands of 
times.

I would suggest looking at making the entry points for node-heartbeat, 
async-scheduling (single node), and async-scheduling (multi-node) different.

Node-heartbeat and async-scheduling (single node) can still be similar and 
share most of the code.

async-scheduling (multi-node) should iterate over partitions first, using 
pseudo code like the following (a more concrete sketch is given after the 
block): 
{code:java}
for (partition : all partitions) {
  allocateContainersOnMultiNodes(getCandidate(partition))
} {code}
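
A slightly more concrete, still hypothetical Java rendering of that 
partition-first loop; getPartitionNames(), getCandidate() and 
allocateContainersOnMultiNodes() are placeholder names, not existing 
CapacityScheduler methods:
{code:java}
// Iterate once per partition instead of once per node: build the candidate
// node set for the partition a single time, then let the multi-node policy
// pick nodes from it.
for (String partition : getPartitionNames()) {
  CandidateNodeSet<FiCaSchedulerNode> candidates = getCandidate(partition);
  allocateContainersOnMultiNodes(candidates);
}
{code}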
 






[jira] [Created] (YARN-10379) Refactor ContainerExecutor exit code Exception handling

2020-07-30 Thread Benjamin Teke (Jira)
Benjamin Teke created YARN-10379:


 Summary: Refactor ContainerExecutor exit code Exception handling
 Key: YARN-10379
 URL: https://issues.apache.org/jira/browse/YARN-10379
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Benjamin Teke
Assignee: Benjamin Teke


Currently, every time a shell command is executed and returns with a non-zero 
exit code, an exception gets thrown. Along the call tree this exception gets 
caught, and after some info/warn logging and other processing steps it is 
rethrown, possibly wrapped in another exception. For example:
 * in PrivilegedOperationExecutor.executePrivilegedOperation - ExitCodeException 
is caught (as an IOException) and a PrivilegedOperationException is thrown
 * then in LinuxContainerExecutor.startLocalizer - the 
PrivilegedOperationException is caught, the exit code is collected and logged, 
and an IOException is rethrown
 * then in ResourceLocalizationService.run - there is a generic Exception 
catch, with a TODO for separate ExitCodeException handling; however, at that 
point the exit code is only present in an error message string

This flow could be simplified and unified across the different executors: for 
example, use one specific exception until the last possible step, catch it only 
where necessary, and keep the exit code, as it could be used later in the 
process. A sketch of such an exception type is below. This change could help 
with maintainability and readability.
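
A sketch of what such a dedicated exception could look like (illustrative only, 
not existing YARN code; the name and placement are up for discussion):
{code:java}
import java.io.IOException;

/**
 * Carries the shell exit code up the call tree instead of burying it in
 * message strings, so callers can catch it exactly where the code is needed.
 */
public class ContainerExecutionFailedException extends IOException {
  private final int exitCode;

  public ContainerExecutionFailedException(String message, int exitCode,
      Throwable cause) {
    super(message, cause);
    this.exitCode = exitCode;
  }

  public int getExitCode() {
    return exitCode;
  }
}
{code}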






[jira] [Commented] (YARN-1529) Add Localization overhead metrics to NM

2020-07-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168064#comment-17168064
 ] 

Hudson commented on YARN-1529:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18481 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18481/])
YARN-1529: Add Localization overhead metrics to NM. Contributed by (ericp: rev 
e0c9653166df48a47267dbc81d124ab78267e039)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerResourceLocalizedEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/MockContainer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java


> Add Localization overhead metrics to NM
> ---
>
> Key: YARN-1529
> URL: https://issues.apache.org/jira/browse/YARN-1529
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-1529.005.patch, YARN-1529.006.patch, 
> YARN-1529.v01.patch, YARN-1529.v02.patch, YARN-1529.v03.patch, 
> YARN-1529.v04.patch
>
>
> Users are often unaware of the localization cost that their jobs incur. To 
> measure the effectiveness of the localization caches, it is necessary to 
> expose the overhead in the form of metrics.
> We propose adding the following metrics to NodeManagerMetrics.
> When a container is about to launch, its set of LocalResources has to be 
> fetched from a central location, typically on HDFS, which results in a number 
> of download requests for the files missing from the caches.
> LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache 
> misses.
> LocalizedFilesCached: total localization requests that were served from local 
> caches. Cache hits.
> LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
> LocalizedBytesCached: total bytes satisfied from local caches.
> Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that 
> were served out of cache: ratio = 100 * caches / (caches + misses)
> LocalizationDownloadNanos: total elapsed time in nanoseconds for a container 
> to go from ResourceRequestTransition to LocalizedTransition






[jira] [Updated] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted

2020-07-30 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-4783:
---
Attachment: YARN-4783.001.patch

> Log aggregation failure for application when Nodemanager is restarted 
> --
>
> Key: YARN-4783
> URL: https://issues.apache.org/jira/browse/YARN-4783
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-4783.001.patch
>
>
> Scenario :
> =
> 1.Start NM with user dsperf:hadoop
> 2.Configure linux-execute user as dsperf
> 3.Submit application with yarn user 
> 4.Once a few containers are allocated to NM 1
> 5.Nodemanager 1 is stopped (wait for expiry)
> 6.Start node manager after application is completed
> 7.Check that log aggregation happens for the container logs in the NM local 
> directory
> Expected Output :
> ===
> Log aggregation should be successful
> Actual Output :
> ===
> Log aggregation was not successful






[jira] [Reopened] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted

2020-07-30 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori reopened YARN-4783:


I am reopening this issue in order to find a less invasive approach to handling 
this corner case, since it was reported a long time ago and has still not been 
resolved. I have uploaded a new patch, without a test case for now.

The main idea is to try to renew the token stored in the application 
credentials on the application state transition from NEW to INITING. If the 
renewal succeeds, the token is valid and nothing needs to be done from the 
application's point of view. However, if the renewal fails with an InvalidToken 
error, we request a new one on behalf of the user.

If a new token is requested, it is now the application's responsibility to 
clean it up when the corresponding operations are done; therefore it is 
cancelled when log aggregation is finished.
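
A minimal sketch of the renew-or-replace step described above (hypothetical 
helper names; the actual patch wires this into the NM application state 
machine on the NEW -> INITING transition):
{code:java}
try {
  // org.apache.hadoop.security.token.Token#renew succeeds only if the stored
  // HDFS delegation token is still valid, in which case there is nothing to do.
  hdfsDelegationToken.renew(conf);
} catch (org.apache.hadoop.security.token.SecretManager.InvalidToken e) {
  // Stored token is no longer valid: obtain a fresh one on behalf of the user.
  // The application is then responsible for cancelling it once log aggregation
  // has finished. requestNewDelegationTokenForUser() is a placeholder name.
  Token<? extends TokenIdentifier> freshToken =
      requestNewDelegationTokenForUser(appUser);
  credentials.addToken(freshToken.getService(), freshToken);
}
// Other renewal failures (IOException, InterruptedException) propagate to the
// caller in this sketch.
{code}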

> Log aggregation failure for application when Nodemanager is restarted 
> --
>
> Key: YARN-4783
> URL: https://issues.apache.org/jira/browse/YARN-4783
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Andras Gyori
>Priority: Major
>
> Scenario :
> =
> 1.Start NM with user dsperf:hadoop
> 2.Configure linux-execute user as dsperf
> 3.Submit application with yarn user 
> 4.Once a few containers are allocated to NM 1
> 5.Nodemanager 1 is stopped (wait for expiry)
> 6.Start node manager after application is completed
> 7.Check that log aggregation happens for the container logs in the NM local 
> directory
> Expected Output :
> ===
> Log aggregation should be successful
> Actual Output :
> ===
> Log aggregation was not successful






[jira] [Assigned] (YARN-4783) Log aggregation failure for application when Nodemanager is restarted

2020-07-30 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori reassigned YARN-4783:
--

Assignee: Andras Gyori

> Log aggregation failure for application when Nodemanager is restarted 
> --
>
> Key: YARN-4783
> URL: https://issues.apache.org/jira/browse/YARN-4783
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Andras Gyori
>Priority: Major
>
> Scenario :
> =
> 1.Start NM with user dsperf:hadoop
> 2.Configure linux-execute user as dsperf
> 3.Submit application with yarn user 
> 4.Once a few containers are allocated to NM 1
> 5.Nodemanager 1 is stopped (wait for expiry)
> 6.Start node manager after application is completed
> 7.Check that log aggregation happens for the container logs in the NM local 
> directory
> Expected Output :
> ===
> Log aggregation should be successful
> Actual Output :
> ===
> Log aggregation was not successful






[jira] [Commented] (YARN-9136) getNMResourceInfo NodeManager REST API method is not documented

2020-07-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167782#comment-17167782
 ] 

Hadoop QA commented on YARN-9136:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue}  0m  
0s{color} | {color:blue} markdownlint was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
37m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26327/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-9136 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13008739/YARN-9136.002.patch |
| Optional Tests | dupname asflicense mvnsite markdownlint |
| uname | Linux e20d3254f4b3 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / cf4eb756085 |
| Max. process+thread count | 308 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/26327/console |
| versions | git=2.17.1 maven=3.6.0 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> getNMResourceInfo NodeManager REST API method is not documented
> ---
>
> Key: YARN-9136
> URL: https://issues.apache.org/jira/browse/YARN-9136
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Hudáky Márton Gyula
>Priority: Major
> Attachments: YARN-9136.001.patch, YARN-9136.002.patch
>
>
> I cannot find documentation for the resources endpoint in NMWebServices: 
> /ws/v1/node/resources/\{resourcename\}
> I looked in the file NodeManagerRest.md for documentation but haven't found 
> any.
> This was presumably left undocumented unintentionally: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md






[jira] [Commented] (YARN-9136) getNMResourceInfo NodeManager REST API method is not documented

2020-07-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167780#comment-17167780
 ] 

Hadoop QA commented on YARN-9136:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} markdownlint {color} | {color:blue}  0m  
1s{color} | {color:blue} markdownlint was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
34m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/25/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-9136 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13008739/YARN-9136.002.patch |
| Optional Tests | dupname asflicense mvnsite markdownlint |
| uname | Linux 985367a2636f 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / cf4eb756085 |
| Max. process+thread count | 433 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/25/console |
| versions | git=2.17.1 maven=3.6.0 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> getNMResourceInfo NodeManager REST API method is not documented
> ---
>
> Key: YARN-9136
> URL: https://issues.apache.org/jira/browse/YARN-9136
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Hudáky Márton Gyula
>Priority: Major
> Attachments: YARN-9136.001.patch, YARN-9136.002.patch
>
>
> I cannot find documentation for the resources endpoint in NMWebServices: 
> /ws/v1/node/resources/\{resourcename\}
> I looked in the file NodeManagerRest.md for documentation but haven't found 
> any.
> This was presumably left undocumented unintentionally: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md






[jira] [Resolved] (YARN-10378) When NM goes down and comes back up, PC allocation tags are not removed for completed containers

2020-07-30 Thread Tarun Parimi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarun Parimi resolved YARN-10378.
-
Resolution: Duplicate

Looks like YARN-10034 also fixes this issue for the NM-going-down scenario. 
Closing as a duplicate.

> When NM goes down and comes back up, PC allocation tags are not removed for 
> completed containers
> 
>
> Key: YARN-10378
> URL: https://issues.apache.org/jira/browse/YARN-10378
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
>
> We are using placement constraints with anti-affinity in an application, along 
> with a node label. The application requests two containers with anti-affinity 
> on a node label that contains only two nodes.
> So two containers will be allocated on the two nodes, one on each node, 
> satisfying anti-affinity.
> When one NodeManager goes down for some time, the node is marked as lost by 
> the RM, which then kills all containers on that node.
> The AM will now have one pending container request, since the previous 
> container got killed.
> When the NodeManager comes back up after some time, the pending container is 
> not allocated on that node again, and the application waits forever for that 
> container.
> If the ResourceManager is restarted, the issue disappears and the container 
> gets allocated on the NodeManager that came back up recently.
> This seems to be an issue with the allocation tags not being removed.
> The allocation tag is added for container 
> container_e68_1595886973474_0005_01_03:
> {code:java}
> 2020-07-28 17:02:04,091 DEBUG constraint.AllocationTagsManager 
> (AllocationTagsManager.java:addContainer(355)) - Added 
> container=container_e68_1595886973474_0005_01_03 with tags=[hbase]\
> {code}
> However, the allocation tag is not removed when container 
> container_e68_1595886973474_0005_01_03 is released. There is no 
> equivalent DEBUG message for removing tags, which means the tags are not 
> getting removed. If the tag is not removed, the scheduler will not allocate 
> on the same node due to anti-affinity, resulting in the issue observed.
> {code:java}
> 2020-07-28 17:19:34,353 DEBUG scheduler.AbstractYarnScheduler 
> (AbstractYarnScheduler.java:updateCompletedContainers(1038)) - Container 
> FINISHED: container_e68_1595886973474_0005_01_03
> 2020-07-28 17:19:34,353 INFO  scheduler.AbstractYarnScheduler 
> (AbstractYarnScheduler.java:completedContainer(669)) - Container 
> container_e68_1595886973474_0005_01_03 completed with event FINISHED, but 
> corresponding RMContainer doesn't exist.
> {code}
> This seems to be due to the changes done in YARN-8511, where the tags are 
> removed only after the NM confirms the container has been released. However, 
> in our scenario that confirmation never happens, so the tag never gets 
> removed until an RM restart.
> Reverting YARN-8511 fixes this particular issue and the tags get removed, but 
> that is not a valid solution, since the problem YARN-8511 solves is also 
> valid. We need a solution that does not break YARN-8511 and also fixes this 
> issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9136) getNMResourceInfo NodeManager REST API method is not documented

2020-07-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hudáky Márton Gyula updated YARN-9136:
--
Attachment: YARN-9136.002.patch

> getNMResourceInfo NodeManager REST API method is not documented
> ---
>
> Key: YARN-9136
> URL: https://issues.apache.org/jira/browse/YARN-9136
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Hudáky Márton Gyula
>Priority: Major
> Attachments: YARN-9136.001.patch, YARN-9136.002.patch
>
>
> I cannot find documentation for the resources endpoint in NMWebServices: 
> /ws/v1/node/resources/\{resourcename\}
> I looked in NodeManagerRest.md for documentation but haven't found any.
> This endpoint is presumably undocumented by accident: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerRest.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10378) When NM goes down and comes back up, PC allocation tags are not removed for completed containers

2020-07-30 Thread Tarun Parimi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarun Parimi updated YARN-10378:

Description: 
We are using placement constraints with anti-affinity in an application, along 
with a node label. The application requests two containers with anti-affinity 
on a node label that contains only two nodes.

So two containers will be allocated on the two nodes, one on each node, 
satisfying anti-affinity.

When one NodeManager goes down for some time, the node is marked as lost by the 
RM, which then kills all containers on that node.

The AM will now have one pending container request, since the previous 
container got killed.

When the NodeManager comes back up after some time, the pending container is 
not allocated on that node again, and the application waits forever for that 
container.

If the ResourceManager is restarted, the issue disappears and the container 
gets allocated on the NodeManager that came back up recently.

This seems to be an issue with the allocation tags not being removed.

The allocation tag is added for container 
container_e68_1595886973474_0005_01_03:
{code:java}
2020-07-28 17:02:04,091 DEBUG constraint.AllocationTagsManager 
(AllocationTagsManager.java:addContainer(355)) - Added 
container=container_e68_1595886973474_0005_01_03 with tags=[hbase]\
{code}
However, the allocation tag is not removed when container 
container_e68_1595886973474_0005_01_03 is released. There is no equivalent 
DEBUG message for removing tags, which means the tags are not getting removed. 
If the tag is not removed, the scheduler will not allocate on the same node 
due to anti-affinity, resulting in the issue observed.
{code:java}
2020-07-28 17:19:34,353 DEBUG scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:updateCompletedContainers(1038)) - Container 
FINISHED: container_e68_1595886973474_0005_01_03
2020-07-28 17:19:34,353 INFO  scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:completedContainer(669)) - Container 
container_e68_1595886973474_0005_01_03 completed with event FINISHED, but 
corresponding RMContainer doesn't exist.
{code}
This seems to be due to the changes done in YARN-8511, where the tags are 
removed only after the NM confirms the container has been released. However, 
in our scenario that confirmation never happens, so the tag never gets removed 
until an RM restart.

Reverting YARN-8511 fixes this particular issue and the tags get removed, but 
that is not a valid solution, since the problem YARN-8511 solves is also valid. 
We need a solution that does not break YARN-8511 and also fixes this issue.

  was:
We are using placement constraints with anti-affinity in an application, along 
with a node label. The application requests two containers with anti-affinity 
on a node label that contains only two nodes.

So two containers will be allocated on the two nodes, one on each node, 
satisfying anti-affinity.

When one NodeManager goes down for some time, the node is marked as lost by the 
RM, which then kills all containers on that node.

The AM will now have one pending container request, since the previous 
container got killed.

When the NodeManager comes back up after some time, the pending container is 
not allocated on that node again, and the application waits forever for that 
container.

If the ResourceManager is restarted, the issue disappears and the container 
gets allocated on the NodeManager that came back up recently.

This seems to be an issue with the allocation tags not being removed.

The allocation tag is added for container 
container_e68_1595886973474_0005_01_03:
{code:java}
2020-07-28 17:02:04,091 DEBUG constraint.AllocationTagsManager 
(AllocationTagsManager.java:addContainer(355)) - Added 
container=container_e68_1595886973474_0005_01_03 with tags=[hbase]\
{code}
However, the allocation tag is not removed when container 
container_e68_1595886973474_0005_01_03 is released. There is no equivalent 
DEBUG message for removing tags, which means the tags are not getting removed. 
If the tag is not removed, the scheduler will not allocate on the same node, 
resulting in the issue observed.
{code:java}
2020-07-28 17:19:34,353 DEBUG scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:updateCompletedContainers(1038)) - Container 
FINISHED: container_e68_1595886973474_0005_01_03
2020-07-28 17:19:34,353 INFO  scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:completedContainer(669)) - Container 
container_e68_1595886973474_0005_01_03 completed with event FINISHED, but 
corresponding RMContainer doesn't exist.
{code}
This seems to be due to the changes done in YARN-8511, where the tags are 
removed only after the NM confirms the container has been released. However, 
in our scenario that confirmation never happens, so the tag never gets removed 
until an RM restart.

[jira] [Created] (YARN-10378) When NM goes down and comes back up, PC allocation tags are not removed for completed containers

2020-07-30 Thread Tarun Parimi (Jira)
Tarun Parimi created YARN-10378:
---

 Summary: When NM goes down and comes back up, PC allocation tags 
are not removed for completed containers
 Key: YARN-10378
 URL: https://issues.apache.org/jira/browse/YARN-10378
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Affects Versions: 3.1.1, 3.2.0
Reporter: Tarun Parimi
Assignee: Tarun Parimi


We are using placement constraints with anti-affinity in an application, along 
with a node label. The application requests two containers with anti-affinity 
on a node label that contains only two nodes.

So two containers will be allocated on the two nodes, one on each node, 
satisfying anti-affinity.

When one NodeManager goes down for some time, the node is marked as lost by the 
RM, which then kills all containers on that node.

The AM will now have one pending container request, since the previous 
container got killed.

When the NodeManager comes back up after some time, the pending container is 
not allocated on that node again, and the application waits forever for that 
container.

If the ResourceManager is restarted, the issue disappears and the container 
gets allocated on the NodeManager that came back up recently.

This seems to be an issue with the allocation tags not being removed.

The allocation tag is added for container 
container_e68_1595886973474_0005_01_03:
{code:java}
2020-07-28 17:02:04,091 DEBUG constraint.AllocationTagsManager 
(AllocationTagsManager.java:addContainer(355)) - Added 
container=container_e68_1595886973474_0005_01_03 with tags=[hbase]\
{code}
However, the allocation tag is not removed when container 
container_e68_1595886973474_0005_01_03 is released. There is no equivalent 
DEBUG message for removing tags, which means the tags are not getting removed. 
If the tag is not removed, the scheduler will not allocate on the same node, 
resulting in the issue observed.
{code:java}
2020-07-28 17:19:34,353 DEBUG scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:updateCompletedContainers(1038)) - Container 
FINISHED: container_e68_1595886973474_0005_01_03
2020-07-28 17:19:34,353 INFO  scheduler.AbstractYarnScheduler 
(AbstractYarnScheduler.java:completedContainer(669)) - Container 
container_e68_1595886973474_0005_01_03 completed with event FINISHED, but 
corresponding RMContainer doesn't exist.
{code}
This seems to be due to the changes done in YARN-8511, where the tags are 
removed only after the NM confirms the container has been released. However, 
in our scenario that confirmation never happens, so the tag never gets removed 
until an RM restart.

Reverting YARN-8511 fixes this particular issue and the tags get removed, but 
that is not a valid solution, since the problem YARN-8511 solves is also valid. 
We need a solution that does not break YARN-8511 and also fixes this issue.
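
As context for the scenario above, here is a minimal sketch of how an AM can 
express this kind of intra-application anti-affinity through the YARN 
placement-constraint API. The "hbase" tag matches the log above, but the sizes, 
priority and allocation-request id are illustrative, the node-label part of the 
setup is omitted, and how the request is submitted (e.g. via 
AMRMClient#addSchedulingRequests) is an assumption rather than something stated 
in this issue:

{code:java}
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceSizing;
import org.apache.hadoop.yarn.api.records.SchedulingRequest;
import org.apache.hadoop.yarn.api.resource.PlacementConstraint;
import org.apache.hadoop.yarn.api.resource.PlacementConstraints;

public class AntiAffinityRequestSketch {

  public static SchedulingRequest buildRequest() {
    // Node-scope anti-affinity: do not place this allocation on a node that
    // already holds a container carrying the "hbase" allocation tag.
    PlacementConstraint antiAffinity = PlacementConstraints.build(
        PlacementConstraints.targetNotIn(PlacementConstraints.NODE,
            PlacementConstraints.PlacementTargets.allocationTag("hbase")));

    // One GUARANTEED allocation, itself tagged "hbase" so the other
    // container's constraint is evaluated against it (sizes illustrative).
    return SchedulingRequest.newBuilder()
        .allocationRequestId(1L)
        .priority(Priority.newInstance(0))
        .executionType(ExecutionTypeRequest.newInstance(ExecutionType.GUARANTEED))
        .allocationTags(Collections.singleton("hbase"))
        .placementConstraintExpression(antiAffinity)
        .resourceSizing(ResourceSizing.newInstance(1, Resource.newInstance(4096, 1)))
        .build();
  }
}
{code}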



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10363) TestRMAdminCLI.testHelp is failing in branch-2.10

2020-07-30 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167658#comment-17167658
 ] 

Bilwa S T commented on YARN-10363:
--

Thanks [~Jim_Brennan] for the review. I think the checkstyle issue can be ignored.

> TestRMAdminCLI.testHelp is failing in branch-2.10
> -
>
> Key: YARN-10363
> URL: https://issues.apache.org/jira/browse/YARN-10363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.10.1
>Reporter: Jim Brennan
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10363-branch-2.10.patch
>
>
> TestRMAdminCLI.testHelp is failing in branch-2.10.
> Example failure:
> {noformat}
> ---
> Test set: org.apache.hadoop.yarn.client.cli.TestRMAdminCLI
> ---
> Tests run: 31, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 18.668 s <<< 
> FAILURE! - in org.apache.hadoop.yarn.client.cli.TestRMAdminCLI
> testHelp(org.apache.hadoop.yarn.client.cli.TestRMAdminCLI)  Time elapsed: 
> 0.043 s  <<< FAILURE!
> java.lang.AssertionError: 
> Expected error message: 
> Usage: yarn rmadmin [-failover [--forcefence] [--forceactive]  
> ] is not included in messages: 
> Usage: yarn rmadmin
>-refreshQueues 
>-refreshNodes [-g|graceful [timeout in seconds] -client|server]
>-refreshNodesResources 
>-refreshSuperUserGroupsConfiguration 
>-refreshUserToGroupsMappings 
>-refreshAdminAcls 
>-refreshServiceAcl 
>-getGroups [username]
>-addToClusterNodeLabels 
> <"label1(exclusive=true),label2(exclusive=false),label3">
>-removeFromClusterNodeLabels  (label splitted by ",")
>-replaceLabelsOnNode <"node1[:port]=label1,label2 
> node2[:port]=label1,label2"> [-failOnUnknownNodes] 
>-directlyAccessNodeLabelStore 
>-refreshClusterMaxPriority 
>-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout])
>-help [cmd]
> Generic options supported are:
> -conf specify an application configuration file
> -Ddefine a value for a given property
> -fs  specify default filesystem URL to use, 
> overrides 'fs.defaultFS' property from configurations.
> -jt   specify a ResourceManager
> -files specify a comma-separated list of files to 
> be copied to the map reduce cluster
> -libjarsspecify a comma-separated list of jar files 
> to be included in the classpath
> -archives   specify a comma-separated list of archives 
> to be unarchived on the compute machines
> The general command line syntax is:
> command [genericOptions] [commandOptions]
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.yarn.client.cli.TestRMAdminCLI.testError(TestRMAdminCLI.java:859)
>   at 
> org.apache.hadoop.yarn.client.cli.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:585)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> 

[jira] [Commented] (YARN-10282) CLONE - hadoop-yarn-server-nodemanager build failed: make failed with error code 2

2020-07-30 Thread wangxiangchun (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167654#comment-17167654
 ] 

wangxiangchun commented on YARN-10282:
--

Hi,

How did you solve the problem? I encountered the same problem with 3.3.0, but 
when I build 3.2.1 on the same Linux OS it is fine.

This is the error info:

 

[ERROR] Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.3.0:cmake-compile (cmake-compile) on 
project hadoop-yarn-server-nodemanager: make failed with error code 2 -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.3.0:cmake-compile (cmake-compile) on 
project hadoop-yarn-server-nodemanager: make failed with error code 2
 at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
 at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
 at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
 at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
 at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
 at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
 at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
 at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
 at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
 at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
 at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
 at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
 at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
 at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException: make failed with 
error code 2
 at 
org.apache.hadoop.maven.plugin.cmakebuilder.CompileMojo.runMake(CompileMojo.java:229)
 at 
org.apache.hadoop.maven.plugin.cmakebuilder.CompileMojo.execute(CompileMojo.java:98)
 at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
 at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)

> CLONE - hadoop-yarn-server-nodemanager build failed: make failed with error 
> code 2
> --
>
> Key: YARN-10282
> URL: https://issues.apache.org/jira/browse/YARN-10282
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: lynsey
>Priority: Blocker
>
> When I compile the hadoop-3.2.0 release, I encounter the following errors:
> [ERROR] Failed to execute goal 
> org.apache.hadoop:hadoop-maven-plugins:3.2.0:cmake-compile (cmake-compile) on 
> project hadoop-yarn-server-nodemanager: make failed with error code 2 -> 
> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.hadoop:hadoop-maven-plugins:3.2.0:cmake-compile 
> (cmake-compile) on project hadoop-yarn-server-nodemanager: make failed with 
> error code 2
>  at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
>  at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>  at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>  at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
>  at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
>  at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
>  at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
>  at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
>  at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
>  at