[jira] [Commented] (YARN-8380) Support shared mounts in docker runtime

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548795#comment-16548795
 ] 

genericqa commented on YARN-8380:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
23s{color} | {color:red} hadoop-yarn-site in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 51s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 23s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.TestNvidiaDockerV1CommandPlugin
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8380 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-18 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548789#comment-16548789
 ] 

Rohith Sharma K S commented on YARN-8501:
-

Thanks [~snemeth] for working on this patch. This is a nice refactoring. Could 
you also make the same change in AHSWebServices#getApps?
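
For illustration, one common way to reduce that kind of complexity is to move the 
per-query-parameter parsing out of getApps into a small builder that both 
RMWebServices#getApps and AHSWebServices#getApps could share. The sketch below is 
hypothetical (class and method names are illustrative, not taken from the 
attached patches):

{code}
// Hypothetical sketch: AppsFilter/AppsRequestBuilder are illustrative names,
// not classes from the attached patches.
import java.util.HashSet;
import java.util.Set;

final class AppsFilter {
  final Set<String> states;
  final Set<String> applicationTypes;
  final long limit;

  AppsFilter(Set<String> states, Set<String> applicationTypes, long limit) {
    this.states = states;
    this.applicationTypes = applicationTypes;
    this.limit = limit;
  }
}

final class AppsRequestBuilder {
  private final Set<String> states = new HashSet<>();
  private final Set<String> applicationTypes = new HashSet<>();
  private long limit = Long.MAX_VALUE;

  // Each web-service query parameter gets one small, testable method instead
  // of being parsed inline in a single long getApps() body.
  AppsRequestBuilder withStates(Set<String> s) {
    if (s != null) {
      states.addAll(s);
    }
    return this;
  }

  AppsRequestBuilder withApplicationTypes(Set<String> t) {
    if (t != null) {
      applicationTypes.addAll(t);
    }
    return this;
  }

  AppsRequestBuilder withLimit(String limitParam) {
    if (limitParam != null && !limitParam.isEmpty()) {
      limit = Long.parseLong(limitParam.trim());
    }
    return this;
  }

  AppsFilter build() {
    return new AppsFilter(states, applicationTypes, limit);
  }
}
{code}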

> Reduce complexity of RMWebServices' getApps method
> --
>
> Key: YARN-8501
> URL: https://issues.apache.org/jira/browse/YARN-8501
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8501.001.patch, YARN-8501.002.patch, 
> YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548779#comment-16548779
 ] 

genericqa commented on YARN-8548:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
14s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8548 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932179/YARN-8548-002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 266cf4d5a758 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ba1ab08 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21292/testReport/ |
| Max. process+thread count | 397 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21292/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548751#comment-16548751
 ] 

genericqa commented on YARN-8501:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  4m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
15s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m 
17s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8501 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931991/YARN-8501.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ee81fb21d519 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ba1ab08 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21291/testReport/ |
| Max. process+thread count | 933 (vs. ulimit of 1) |

[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548703#comment-16548703
 ] 

genericqa commented on YARN-8330:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 64m  
6s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}127m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8330 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932168/YARN-8330.2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a884f3f35524 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ba1ab08 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21290/testReport/ |
| Max. process+thread count | 830 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21290/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (YARN-8550) YARN root queue exceeds 100%

2018-07-18 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548685#comment-16548685
 ] 

Weiwei Yang commented on YARN-8550:
---

I have observed a similar issue in YARN-8546. That one was caused by two 
allocations both taking room from the same reserved container, which pushed queue 
usage above 100%. However, that happened only with async scheduling enabled 
("yarn.scheduler.capacity.schedule-asynchronously.enabled=true"), not on v2.7.3, 
so I am not sure whether they are the same issue.
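
For reference, a minimal snippet showing how that flag is typically toggled (the 
property name is the one quoted above; everything else here is just a 
self-contained illustration, not part of any patch):

{code}
import org.apache.hadoop.conf.Configuration;

public class EnableAsyncScheduling {
  public static void main(String[] args) {
    // Property quoted in the comment above; leaving it false keeps the
    // default heartbeat-driven allocation path.
    Configuration conf = new Configuration();
    conf.setBoolean(
        "yarn.scheduler.capacity.schedule-asynchronously.enabled", true);
    System.out.println(conf.getBoolean(
        "yarn.scheduler.capacity.schedule-asynchronously.enabled", false));
  }
}
{code}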

> YARN root queue exceeds 100%
> 
>
> Key: YARN-8550
> URL: https://issues.apache.org/jira/browse/YARN-8550
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Priority: Major
> Attachments: Screen Shot 2018-07-13 at 1.42.41 PM.png
>
>
> YARN root queue usage is reported as more than 100%, which is misleading (see 
> attached screenshot). This happens when a container is reserved, so used + 
> reserved exceeds the total. The cluster is configured with CPU scheduling.
> {code}
> 2018-07-17 13:27:59,569 INFO  capacity.ParentQueue 
> (ParentQueue.java:assignContainers(475)) - assignedContainer queue=root 
> usedCapacity=0.9713542 absoluteUsedCapacity=0.9713542 used= vCores:83> cluster=
> 2018-07-17 13:27:59,627 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(422)) - 
> container_e56_1531419441577_2045_01_03 Container Transitioned from NEW to 
> RESERVED
> 2018-07-17 13:27:59,627 INFO  allocator.AbstractContainerAllocator 
> (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(77)) - 
> Reserved container  application=application_1531419441577_2045 
> resource=
>  
> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@2a1563f4
>  cluster=
> 2018-07-17 13:27:59,627 INFO  capacity.ParentQueue 
> (ParentQueue.java:assignContainers(475)) - assignedContainer queue=root 
> usedCapacity=1.0390625 absoluteUsedCapacity=1.0390625 used= vCores:85> cluster=
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-18 Thread Bilwa S T (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548673#comment-16548673
 ] 

Bilwa S T commented on YARN-8548:
-

Hi [~bibinchundatt]. You are correct. It should be invoked at the start of the 
method. I have updated the patch.
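
For readers unfamiliar with the *PBImpl pattern this refers to, here is a 
simplified, self-contained sketch (the Builder type below is a stand-in, not the 
real generated protobuf class): every mutating method must call 
maybeInitBuilder() first, otherwise the builder can still be null and the setter 
throws the NullPointerException shown in the stack trace.

{code}
// Simplified illustration of the Hadoop *PBImpl builder-initialization
// pattern. Builder is a stand-in, not the generated protobuf class.
import java.util.ArrayList;
import java.util.List;

class AllocateResponseSketch {
  static class Builder {
    final List<String> nmTokens = new ArrayList<>();
    void clearNmToken() { nmTokens.clear(); }
    void addNmToken(String token) { nmTokens.add(token); }
  }

  private Builder builder;   // null until the first mutation
  private boolean viaProto;  // true while backed by an immutable proto

  private void maybeInitBuilder() {
    if (viaProto || builder == null) {
      builder = new Builder();  // real code: Proto.newBuilder(proto)
    }
    viaProto = false;
  }

  public void setNMTokens(List<String> nmTokens) {
    // The fix: initialize the builder before touching it. Without this call
    // the method dereferences a null builder, as in the reported NPE.
    maybeInitBuilder();
    builder.clearNmToken();
    if (nmTokens != null) {
      for (String token : nmTokens) {
        builder.addNmToken(token);
      }
    }
  }
}
{code}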

> AllocationRespose proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-8548-001.patch, YARN-8548-002.patch
>
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-18 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-8548:

Attachment: YARN-8548-002.patch

> AllocationRespose proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-8548-001.patch, YARN-8548-002.patch
>
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8380) Support shared mounts in docker runtime

2018-07-18 Thread Billie Rinaldi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8380:
-
Attachment: YARN-8380.1.patch

> Support shared mounts in docker runtime
> ---
>
> Key: YARN-8380
> URL: https://issues.apache.org/jira/browse/YARN-8380
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8380.1.patch
>
>
> The docker run command supports the mount type shared, but currently we are 
> only supporting ro and rw mount types in the docker runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-18 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548603#comment-16548603
 ] 

Suma Shivaprasad commented on YARN-8330:


Attached a patch which publishes container creation events on the 
ALLOCATED/ACQUIRED state transitions instead of in the RMContainerImpl 
constructor/setContainerId calls.
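
To illustrate the idea (names below are hypothetical stand-ins, not the attached 
patch): the creation event is emitted from the transition that marks the 
container as actually allocated or acquired, so a container that only ever 
reaches RESERVED never emits one.

{code}
// Hypothetical sketch only; not the attached patch. The point is to publish
// the "container created" event from the ALLOCATED/ACQUIRED transition rather
// than from the constructor/setContainerId, so RESERVED-only containers are
// never reported as created.
import java.util.concurrent.atomic.AtomicBoolean;

class ContainerEventSketch {
  interface MetricsPublisher {
    void containerCreated(String containerId, long createdTime);
  }

  static class RMContainerSketch {
    private final String containerId;
    private final MetricsPublisher publisher;
    private final AtomicBoolean creationPublished = new AtomicBoolean(false);

    RMContainerSketch(String containerId, MetricsPublisher publisher) {
      this.containerId = containerId;
      this.publisher = publisher;
      // Intentionally no publish here: a NEW -> RESERVED container would
      // otherwise be counted as created even if it is never allocated.
    }

    // Invoked from the ALLOCATED and ACQUIRED state transitions.
    void onAllocatedOrAcquired() {
      if (creationPublished.compareAndSet(false, true)) {
        publisher.containerCreated(containerId, System.currentTimeMillis());
      }
    }
  }
}
{code}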

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 ( 3 components container + 1 am). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized container 02, 03, 05 for component. There is no log 
> available in NM & AM related to container 04. Only one line in RM log is 
> printed
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-18 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8330:
---
Attachment: YARN-8330.2.patch

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 ( 3 components container + 1 am). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized container 02, 03, 05 for component. There is no log 
> available in NM & AM related to container 04. Only one line in RM log is 
> printed
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8551) Build Common module for MaWo application

2018-07-18 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora reassigned YARN-8551:


Assignee: Yesha Vora

> Build Common module for MaWo application
> 
>
> Key: YARN-8551
> URL: https://issues.apache.org/jira/browse/YARN-8551
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
>
> Build Common module for MaWo application.
>  This module should include the definition of a Task. A Task should contain
>  * TaskID
>  * Task Command
>  * Task Environment
>  * Task Timeout
>  * Task Type
>  ** Simple Task
>  *** It's a single task
>  ** Composite Task
>  *** It's a composition of multiple simple tasks
>  ** Teardown Task
>  *** It's the last task to be executed after a job is finished
>  ** Null Task
>  *** It's a null task
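
As a rough illustration of the Task structure described above (field and enum 
names below are hypothetical, not taken from any MaWo code):

{code}
// Hypothetical sketch of the Task structure listed above; names are
// illustrative only.
import java.util.Collections;
import java.util.Map;

class Task {
  enum TaskType { SIMPLE, COMPOSITE, TEARDOWN, NULL }

  private final String taskId;
  private final String command;
  private final Map<String, String> environment;
  private final long timeoutMillis;
  private final TaskType type;

  Task(String taskId, String command, Map<String, String> environment,
      long timeoutMillis, TaskType type) {
    this.taskId = taskId;
    this.command = command;
    this.environment = environment == null
        ? Collections.<String, String>emptyMap() : environment;
    this.timeoutMillis = timeoutMillis;
    this.type = type;
  }

  String getTaskId() { return taskId; }
  String getCommand() { return command; }
  Map<String, String> getEnvironment() { return environment; }
  long getTimeoutMillis() { return timeoutMillis; }
  TaskType getType() { return type; }
}
{code}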



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-18 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548546#comment-16548546
 ] 

Robert Kanter edited comment on YARN-6966 at 7/18/18 11:01 PM:
---

It looks like the job is broken even though it's up now.  All of the tests from 
today ran for < 1 min before failing :(

I think it's fine if making a test case for that would be too tricky.

+1 LGTM, pending Jenkins.
[~haibochen], any other comments?


was (Author: rkanter):
It looks like the job is broken even though it's up now.  All of the tests from 
today ran for < 1 min before failing :(

+1 LGM pending Jenkins
[~haibochen] any other comments?

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch
>
>
> Just as YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that metrics do not recover properly 
> when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores 
> and AvailableVCores in metrics also need to be recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced by the following steps:
> # Make sure 
> YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true
>  in NM
> # Submit an application and keep running
> # Restart NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
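
A minimal, self-contained sketch of the recovery idea described in the issue 
above (the metrics class here is a stand-in, not the real NodeManagerMetrics 
API): on recovery, re-apply each recovered container's resources so the later 
release does not drive the counters negative.

{code}
// Stand-in types only; this is not the NodeManagerMetrics API, just the shape
// of the fix described above.
class MetricsRecoverySketch {
  static class NodeMetrics {
    long allocatedContainers;
    long allocatedMB;
    long allocatedVCores;

    void allocateContainer(long memMB, long vCores) {
      allocatedContainers++;
      allocatedMB += memMB;
      allocatedVCores += vCores;
    }

    void releaseContainer(long memMB, long vCores) {
      allocatedContainers--;
      allocatedMB -= memMB;
      allocatedVCores -= vCores;
    }
  }

  // Conceptually what ContainerManagerImpl#recoverContainer should also do.
  static void recoverContainer(NodeMetrics metrics, long memMB, long vCores) {
    // Without this step, the eventual releaseContainer() call for a recovered
    // container subtracts from counters that were never incremented after the
    // restart, producing the negative values shown in the JMX output above.
    metrics.allocateContainer(memMB, vCores);
  }
}
{code}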



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-18 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548546#comment-16548546
 ] 

Robert Kanter commented on YARN-6966:
-

It looks like the job is broken even though it's up now.  All of the tests from 
today ran for < 1 min before failing :(

+1 LGTM, pending Jenkins.
[~haibochen], any other comments?

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch
>
>
> Just as YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that metrics do not recover properly 
> when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, AllocatedVCores 
> and AvailableVCores in metrics also need to be recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced by the following steps:
> # Make sure 
> YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true
>  in NM
> # Submit an application and keep running
> # Restart NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548486#comment-16548486
 ] 

genericqa commented on YARN-8330:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
8s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8330 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932145/YARN-8330.1.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21285/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 ( 3 components container + 1 am). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized container 02, 03, 05 for component. There is no log 
> available in NM & AM related to container 04. Only one line in RM log is 
> printed
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548484#comment-16548484
 ] 

genericqa commented on YARN-8548:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
8s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8548 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932079/YARN-8548-001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21286/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> AllocationRespose proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-8548-001.patch
>
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548490#comment-16548490
 ] 

genericqa commented on YARN-8529:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
9s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8529 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932013/YARN-8529.v2.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21288/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add timeout to RouterWebServiceUtil#invokeRMWebService
> --
>
> Key: YARN-8529
> URL: https://issues.apache.org/jira/browse/YARN-8529
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8529.v1.patch, YARN-8529.v2.patch
>
>
> {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. 
> This should be configurable.
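
A minimal sketch of what "make it configurable" usually looks like here (the 
property name and default below are hypothetical, not an actual YARN 
configuration key, and the attached patches may wire this differently):

{code}
// Hypothetical property name and default; shown only to illustrate reading a
// configurable timeout instead of a hard-coded constant.
import org.apache.hadoop.conf.Configuration;

public class RouterTimeoutSketch {
  static final String ROUTER_WEBAPP_CONNECT_TIMEOUT =
      "yarn.router.webapp.connect-timeout-ms";  // hypothetical key
  static final int DEFAULT_TIMEOUT_MS = 30000;

  static int readTimeout(Configuration conf) {
    return conf.getInt(ROUTER_WEBAPP_CONNECT_TIMEOUT, DEFAULT_TIMEOUT_MS);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt(ROUTER_WEBAPP_CONNECT_TIMEOUT, 5000);
    System.out.println("invokeRMWebService timeout: "
        + readTimeout(conf) + " ms");
  }
}
{code}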



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548487#comment-16548487
 ] 

genericqa commented on YARN-8301:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
8s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8301 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932153/YARN-8301.004.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21287/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-18 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha reassigned YARN-8429:
---

Assignee: Gour Saha

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
>
> Steps:
> 1) Create a launch JSON file and replace "artifact" with "artifacts".
> 2) Launch the YARN service app with the CLI.
> The application launch fails with the error below:
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> The artifact field is not mandatory. However, if that field is specified 
> incorrectly, the launch command should fail with a proper error. 
> Here, the error message about Dest_file is misleading.
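For reference, a minimal sketch of a service spec with the {{artifact}} field spelled correctly (the component values are illustrative assumptions, not taken from the failing test.json):

{code}
{
  "name": "test2-2",
  "version": "1.0.0",
  "components": [
    {
      "name": "comp1",
      "number_of_containers": 1,
      "artifact": {
        "id": "library/centos:latest",
        "type": "DOCKER"
      },
      "launch_command": "sleep 3600",
      "resource": {
        "cpus": 1,
        "memory": "256"
      }
    }
  ]
}
{code}
The ask here is that misspelling "artifact" (e.g. as "artifacts") should surface a validation error naming the unrecognized or missing field, rather than the unrelated Dest_file message shown above.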



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8551) Build Common module for MaWo application

2018-07-18 Thread Yesha Vora (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-8551:
-
Description: 
Build the Common module for the MaWo application.
 This module should include the definition of a Task. A Task should contain
 * TaskID
 * Task Command
 * Task Environment
 * Task Timeout
 * Task Type
 ** Simple Task
 *** It's a single task
 ** Composite Task
 *** It's a composition of multiple simple tasks
 ** Teardown Task
 *** It's the last task to be executed after a job finishes
 ** Null Task
 *** It's a null task

  was:
Build Common module for  MaWo application.
This module should include defination of Task. A Task should contain
* TaskID
* Task Command
* Task Environment
* Task Timeout 
* Task Type
** Simple Task
*** Its a single Task 
** Composite Task
*** Its a composition of multiple simple tasks
** Die Task
*** Its a last task to be executed after a job is finished
** Null Task
*** Its a null task






> Build Common module for MaWo application
> 
>
> Key: YARN-8551
> URL: https://issues.apache.org/jira/browse/YARN-8551
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Yesha Vora
>Priority: Major
>
> Build Common module for MaWo application.
>  This module should include defination of Task. A Task should contain
>  * TaskID
>  * Task Command
>  * Task Environment
>  * Task Timeout
>  * Task Type
>  ** Simple Task
>  *** Its a single Task
>  ** Composite Task
>  *** Its a composition of multiple simple tasks
>  ** Teardown Task
>  *** Its a last task to be executed after a job is finished
>  ** Null Task
>  *** Its a null task
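As a hypothetical sketch of the Task abstraction described above (class, enum, and field names are assumptions for illustration, not actual MaWo code), the common module could model it roughly like this:

{code:java}
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical task kinds mirroring the list above. */
enum TaskType { SIMPLE, COMPOSITE, TEARDOWN, NULL }

/** A task carries an id, a command, an environment, a timeout and a type. */
class Task {
  private final String taskId;            // TaskID
  private final String command;           // Task Command
  private final Map<String, String> env;  // Task Environment
  private final long timeoutMs;           // Task Timeout
  private final TaskType type;            // Task Type

  Task(String taskId, String command, Map<String, String> env,
      long timeoutMs, TaskType type) {
    this.taskId = taskId;
    this.command = command;
    this.env = env;
    this.timeoutMs = timeoutMs;
    this.type = type;
  }

  TaskType getType() { return type; }
  String getTaskId() { return taskId; }
  String getCommand() { return command; }
  Map<String, String> getEnvironment() { return env; }
  long getTimeoutMs() { return timeoutMs; }
}

/** A composite task is a composition of multiple simple tasks. */
class CompositeTask extends Task {
  private final List<Task> subTasks;

  CompositeTask(String taskId, List<Task> subTasks, long timeoutMs) {
    super(taskId, null, Collections.emptyMap(), timeoutMs, TaskType.COMPOSITE);
    this.subTasks = subTasks;
  }

  List<Task> getSubTasks() { return subTasks; }
}
{code}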



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8551) Build Common module for MaWo application

2018-07-18 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8551:


 Summary: Build Common module for MaWo application
 Key: YARN-8551
 URL: https://issues.apache.org/jira/browse/YARN-8551
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora


Build the Common module for the MaWo application.
This module should include the definition of a Task. A Task should contain
* TaskID
* Task Command
* Task Environment
* Task Timeout
* Task Type
** Simple Task
*** It's a single task
** Composite Task
*** It's a composition of multiple simple tasks
** Die Task
*** It's the last task to be executed after a job finishes
** Null Task
*** It's a null task







--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8542:

Description: 
GET app/v1/services/{{service-name}}/component-instances returns a list of 
containers with YARN-8299.
{code:java}
[
{
"id": "container_1531508836237_0001_01_03",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509014497,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-1"
},
{
"id": "container_1531508836237_0001_01_02",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509013492,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-0"
}
]{code}
{{component_name}} is not part of the container json, so it is hard to tell which 
component an instance belongs to. 
To fix this, the format of the returned containers will be changed to:
{code:java}
[
  {
"name": "ping",
"containers": [
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "ping-0",
"hostname": "ping-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_02",
"ip": "172.26.111.21",
"launch_time": 1531767377301,
"state": "READY"
  },
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "ping-1",
"hostname": "ping-1.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_07",
"ip": "172.26.111.21",
"launch_time": 1531767410395,
"state": "RUNNING_BUT_UNREADY"
  }
]
  },
  {
"name": "sleep",
"containers": [
  {
"bare_host": "eyang-5.openstacklocal",
"component_instance_name": "sleep-0",
"hostname": "sleep-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_04",
"ip": "172.26.111.20",
"launch_time": 1531767377710,
"state": "READY"
  },
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "sleep-1",
"hostname": "sleep-1.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_05",
"ip": "172.26.111.21",
"launch_time": 1531767378303,
"state": "READY"
  }
]
  }
]{code}
 

  was:
GET app/v1/services/{\{service-name}}/component-instances returns a list of 
containers with YARN-8299.
{code:java}
[
{
"id": "container_1531508836237_0001_01_03",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509014497,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-1"
},
{
"id": "container_1531508836237_0001_01_02",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509013492,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-0"
}
]{code}
{{component_name}} is not part of container json, so it is hard to tell which 
component an instance belongs to. 
Change the list of containers return 


> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> GET app/v1/services/{{service-name}}/component-instances returns a list of 
> containers with YARN-8299.
> {code:java}
> [
> {
> "id": "container_1531508836237_0001_01_03",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509014497,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-1"
> },
> {
> "id": "container_1531508836237_0001_01_02",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509013492,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-0"
> }
> ]{code}
> {{component_name}} is not part of container json, so it is hard to tell which 
> component an instance belongs to. 
> To fix this, will change the format of returned containers to:
> {code:java}
> [
>   {
> "name": "ping",
> "containers": [
>   {
> "bare_host": "eyang-4.openstacklocal",
> "component_instance_name": "ping-0",
> "hostname": "ping-0.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_02",
> "ip": "172.26.111.21",
> "launch_time": 1531767377301,
> "state": "READY"
>   },
>   {
> "bare_host": "eyang-4.openstacklocal",
> "component_instance_name": "ping-1",
> "hostname": "ping-1.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_07",
> "ip": "172.26.111.21",
> "launch_time": 1531767410395,
> "state": "RUNNING_BUT_UNREADY"
>   }
> ]
>   },
>   {
>  

[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-18 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548426#comment-16548426
 ] 

Bibin A Chundatt commented on YARN-8548:


[~BilwaST]

As per the current patch, {{maybeInitBuilder()}} will be invoked only if the 
nmTokens are empty or null.
 Please make sure it is called in all cases; it should be moved to the start of the method.
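To make the suggestion concrete, a toy illustration of the ordering being asked for (this is simplified stand-in code, not the actual AllocateResponsePBImpl or the patch):

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * The builder should be (re)initialized at the very start of the setter,
 * not only on the branch that handles a non-empty token list; otherwise a
 * null/empty list can leave the builder unset and later access can NPE.
 */
class ToyAllocateResponse {
  private StringBuilder builder;   // stand-in for the protobuf builder
  private List<String> nmTokens;

  private void maybeInitBuilder() {
    if (builder == null) {
      builder = new StringBuilder();
    }
  }

  public synchronized void setNMTokens(List<String> tokens) {
    maybeInitBuilder();            // invoked unconditionally, first thing
    if (tokens == null || tokens.isEmpty()) {
      this.nmTokens = null;
      return;
    }
    this.nmTokens = new ArrayList<>(tokens);
  }
}
{code}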

> AllocationRespose proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-8548-001.patch
>
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548417#comment-16548417
 ] 

Gour Saha commented on YARN-8301:
-

Great. Patch 4 looks good. Not sure why I see the trailing whitespaces when I 
apply the patch. The jenkins build should tell us. +1 for 004 pending jenkins.

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548405#comment-16548405
 ] 

Chandni Singh commented on YARN-8301:
-

Addressed [~gsaha] comments in patch 4. 

I didn't find many trailing whitespaces. Let me know if you still see them.

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8301:

Attachment: YARN-8301.004.patch

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-18 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8330:
---
Comment: was deleted

(was: Attached patch which calls SystemMetricsPublisher.containerCreated in 
ContainerStartedTransition instead of the constructor.)

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 ( 3 components container + 1 am). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized container 02, 03, 05 for component. There is no log 
> available in NM & AM related to container 04. Only one line in RM log is 
> printed
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-18 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8330:
---
Attachment: YARN-8330.1.patch

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 ( 3 components container + 1 am). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized container 02, 03, 05 for component. There is no log 
> available in NM & AM related to container 04. Only one line in RM log is 
> printed
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-18 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8330:
---
Attachment: (was: YARN-8330.1.patch)

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 ( 3 components container + 1 am). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized container 02, 03, 05 for component. There is no log 
> available in NM & AM related to container 04. Only one line in RM log is 
> printed
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-18 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8330:
---
Attachment: (was: YARN-8330.1.patch)

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 ( 3 components container + 1 am). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized container 02, 03, 05 for component. There is no log 
> available in NM & AM related to container 04. Only one line in RM log is 
> printed
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-18 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8330:
---
Attachment: YARN-8330.1.patch

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 ( 3 components container + 1 am). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized container 02, 03, 05 for component. There is no log 
> available in NM & AM related to container 04. Only one line in RM log is 
> printed
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8330) An extra container got launched by RM for yarn-service

2018-07-18 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8330:
---
Attachment: YARN-8330.1.patch

> An extra container got launched by RM for yarn-service
> --
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8330.1.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_01
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-IdStart Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e06_1525463491331_0006_01_02Fri May 04 22:34:26 + 2018  
>  N/A RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_02/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_03
> Fri May 04 22:34:26 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_03/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_01
> Fri May 04 22:34:15 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_01/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_05
> Fri May 04 22:34:56 + 2018   N/A 
> RUNNINGxxx:25454  http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_05/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_04
> Fri May 04 22:34:56 + 2018   N/A
> nullxxx:25454  http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_04/container_e06_1525463491331_0006_01_04/hrt_qa{code}
> Total expected containers = 4 ( 3 components container + 1 am). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_04 is in null state.
> Yarn service utilized container 02, 03, 05 for component. There is no log 
> available in NM & AM related to container 04. Only one line in RM log is 
> printed
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_04 Container Transitioned from NEW to 
> RESERVED{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548384#comment-16548384
 ] 

Chandni Singh commented on YARN-8301:
-

{quote}
 In line 148 do we need the line "name": "sleeper-service" in the JSON spec for 
version 1.0.1 of the service.
{quote}
No, will remove it. 

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8542:

Description: 
GET app/v1/services/{\{service-name}}/component-instances returns a list of 
containers with YARN-8299.
{code:java}
[
{
"id": "container_1531508836237_0001_01_03",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509014497,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-1"
},
{
"id": "container_1531508836237_0001_01_02",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509013492,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-0"
}
]{code}
{{component_name}} is not part of container json, so it is hard to tell which 
component an instance belongs to. 
Change the list of containers return 

  was:
GET app/v1/services/{\{service-name}}/component-instances returns a list of 
containers with YARN-8299.
{code:java}
[
{
"id": "container_1531508836237_0001_01_03",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509014497,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-1"
},
{
"id": "container_1531508836237_0001_01_02",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509013492,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-0"
}
]{code}
{{component_name}} is not part of container json, so it is hard to tell which 
component an instance belongs to. 


> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> GET app/v1/services/{\{service-name}}/component-instances returns a list of 
> containers with YARN-8299.
> {code:java}
> [
> {
> "id": "container_1531508836237_0001_01_03",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509014497,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-1"
> },
> {
> "id": "container_1531508836237_0001_01_02",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509013492,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-0"
> }
> ]{code}
> {{component_name}} is not part of container json, so it is hard to tell which 
> component an instance belongs to. 
> Change the list of containers return 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548379#comment-16548379
 ] 

Chandni Singh commented on YARN-8542:
-

[~gsaha] Ok. That sounds reasonable. Will change it to the format you have 
proposed.

> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> GET app/v1/services/{\{service-name}}/component-instances returns a list of 
> containers with YARN-8299.
> {code:java}
> [
> {
> "id": "container_1531508836237_0001_01_03",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509014497,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-1"
> },
> {
> "id": "container_1531508836237_0001_01_02",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509013492,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-0"
> }
> ]{code}
> {{component_name}} is not part of container json, so it is hard to tell which 
> component an instance belongs to. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548375#comment-16548375
 ] 

Gour Saha commented on YARN-8542:
-

[~csingh] agreed that the API is to request for containers. However, the 
structure I proposed adheres to the current status API structure and the 
swagger definition. Note, service owners are already parsing through the 
component instances across multiple components in the status response payload 
if they need a single collection of all component instances. If you add a new 
attribute "component_name" now, you would need to modify the swagger definition 
and it would actually mean a change for the end-users since they would have to 
handle the containers API output differently from the status API output.

Let me know what you think.

> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> GET app/v1/services/{\{service-name}}/component-instances returns a list of 
> containers with YARN-8299.
> {code:java}
> [
> {
> "id": "container_1531508836237_0001_01_03",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509014497,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-1"
> },
> {
> "id": "container_1531508836237_0001_01_02",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509013492,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-0"
> }
> ]{code}
> {{component_name}} is not part of container json, so it is hard to tell which 
> component an instance belongs to. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548360#comment-16548360
 ] 

Gour Saha commented on YARN-8301:
-

[~csingh], patch 2 looks good. Let's add "Experimental 
Feature - Tech Preview" to the top of this doc and create a reference to it from 
Overview.md (and also mention there that it is an Experimental Feature - Tech 
Preview). Thanks [~eyang] for pointing this out.

A few minor comments:
1. In line 148, do we need the line "name": "sleeper-service" in the JSON spec 
for version 1.0.1 of the service?
2. Remove the trailing whitespace from all the lines.

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8301:

Attachment: YARN-8301.003.patch

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548358#comment-16548358
 ] 

Chandni Singh commented on YARN-8301:
-

Addressed offline comments in patch 3

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8550) YARN root queue exceeds 100%

2018-07-18 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-8550:
---

 Summary: YARN root queue exceeds 100%
 Key: YARN-8550
 URL: https://issues.apache.org/jira/browse/YARN-8550
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
 Attachments: Screen Shot 2018-07-13 at 1.42.41 PM.png

YARN root queue usage shows more than 100%, which is misleading (see attached 
screenshot). This happens when a container is reserved, so used + 
reserved exceeds the total. The cluster is configured with CPU scheduling.

{code}
2018-07-17 13:27:59,569 INFO  capacity.ParentQueue 
(ParentQueue.java:assignContainers(475)) - assignedContainer queue=root 
usedCapacity=0.9713542 absoluteUsedCapacity=0.9713542 used= cluster=

2018-07-17 13:27:59,627 INFO  rmcontainer.RMContainerImpl 
(RMContainerImpl.java:handle(422)) - container_e56_1531419441577_2045_01_03 
Container Transitioned from NEW to RESERVED
2018-07-17 13:27:59,627 INFO  allocator.AbstractContainerAllocator 
(AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(77)) - 
Reserved container  application=application_1531419441577_2045 
resource=

 
queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@2a1563f4
 cluster=
2018-07-17 13:27:59,627 INFO  capacity.ParentQueue 
(ParentQueue.java:assignContainers(475)) - assignedContainer queue=root 
usedCapacity=1.0390625 absoluteUsedCapacity=1.0390625 used= cluster=

{code}
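Reading the two assignedContainer lines above: usedCapacity jumps from 0.9713542 to 1.0390625 once the container is reserved, i.e. roughly 0.068 of cluster capacity held by the reservation is counted on top of what is already used, which is how the root queue ends up reported above 100%.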



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8301:

Attachment: YARN-8301.002.patch

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8542:

Description: 
GET app/v1/services/{\{service-name}}/component-instances returns a list of 
containers with YARN-8299.
{code:java}
[
{
"id": "container_1531508836237_0001_01_03",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509014497,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-1"
},
{
"id": "container_1531508836237_0001_01_02",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509013492,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-0"
}
]{code}
{{component_name}} is not part of container json, so it is hard to tell which 
component an instance belongs to. 

  was:
In YARN-8299, CLI for query container status is implemented to display 
containers in a flat list.  It might be helpful to display component structure 
hierarchy like this:

{code}
[
  {
"name": "ping",
"containers": [
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "ping-0",
"hostname": "ping-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_02",
"ip": "172.26.111.21",
"launch_time": 1531767377301,
"state": "READY"
  },
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "ping-1",
"hostname": "ping-1.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_07",
"ip": "172.26.111.21",
"launch_time": 1531767410395,
"state": "RUNNING_BUT_UNREADY"
  }
]
  },
  {
"name": "sleep",
"containers": [
  {
"bare_host": "eyang-5.openstacklocal",
"component_instance_name": "sleep-0",
"hostname": "sleep-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_04",
"ip": "172.26.111.20",
"launch_time": 1531767377710,
"state": "READY"
  },
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "sleep-1",
"hostname": "sleep-1.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_05",
"ip": "172.26.111.21",
"launch_time": 1531767378303,
"state": "READY"
  }
]
  }
]
{code}


> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> GET app/v1/services/{\{service-name}}/component-instances returns a list of 
> containers with YARN-8299.
> {code:java}
> [
> {
> "id": "container_1531508836237_0001_01_03",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509014497,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-1"
> },
> {
> "id": "container_1531508836237_0001_01_02",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509013492,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-0"
> }
> ]{code}
> {{component_name}} is not part of container json, so it is hard to tell which 
> component an instance belongs to. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548248#comment-16548248
 ] 

Chandni Singh commented on YARN-8542:
-

[~gsaha] 

I am not in favor of the below format:
{code:java}
{
"name": "sleep",
"containers": [
  {
"bare_host": "eyang-5.openstacklocal",
"component_instance_name": "sleep-0",
"hostname": "sleep-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_04",
"ip": "172.26.111.20",
"launch_time": 1531767377710,
"state": "READY"
  }
}{code}
It doesn't follow the convention. The request is for containers, so it should 
return a list of containers. I prefer adding component_name to the container 
json.

Also, it is easier for users to further filter a flat list than a nested 
json.
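For concreteness, a sketch of the preferred flat format, reusing one entry from the example above with the proposed component_name attribute added (illustrative only, not a final schema):
{code:java}
[
  {
    "id": "container_1531765479645_0002_01_04",
    "ip": "172.26.111.20",
    "hostname": "sleep-0.qqq.hbase.ycluster",
    "bare_host": "eyang-5.openstacklocal",
    "state": "READY",
    "launch_time": 1531767377710,
    "component_instance_name": "sleep-0",
    "component_name": "sleep"
  }
]
{code}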

> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> In YARN-8299, CLI for query container status is implemented to display 
> containers in a flat list.  It might be helpful to display component 
> structure hierarchy like this:
> {code}
> [
>   {
> "name": "ping",
> "containers": [
>   {
> "bare_host": "eyang-4.openstacklocal",
> "component_instance_name": "ping-0",
> "hostname": "ping-0.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_02",
> "ip": "172.26.111.21",
> "launch_time": 1531767377301,
> "state": "READY"
>   },
>   {
> "bare_host": "eyang-4.openstacklocal",
> "component_instance_name": "ping-1",
> "hostname": "ping-1.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_07",
> "ip": "172.26.111.21",
> "launch_time": 1531767410395,
> "state": "RUNNING_BUT_UNREADY"
>   }
> ]
>   },
>   {
> "name": "sleep",
> "containers": [
>   {
> "bare_host": "eyang-5.openstacklocal",
> "component_instance_name": "sleep-0",
> "hostname": "sleep-0.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_04",
> "ip": "172.26.111.20",
> "launch_time": 1531767377710,
> "state": "READY"
>   },
>   {
> "bare_host": "eyang-4.openstacklocal",
> "component_instance_name": "sleep-1",
> "hostname": "sleep-1.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_05",
> "ip": "172.26.111.21",
> "launch_time": 1531767378303,
> "state": "READY"
>   }
> ]
>   }
> ]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-18 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548234#comment-16548234
 ] 

Zian Chen commented on YARN-8501:
-

[~snemeth], sorry for the late review. The builder is basically what I thought
should be used to clean up the logic here. The latest patch LGTM. +1
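
For reference, a minimal sketch of the kind of builder being discussed is
below. The class and field names ({{ApplicationsRequestBuilder}},
{{ApplicationsRequest}}, the withXxx methods) are illustrative stand-ins, not
necessarily what the patch actually uses:
{code:java}
import java.util.Collections;
import java.util.Set;

// Illustrative only: a builder that gathers the optional getApps() query
// parameters, validates them in one place, and hands back an immutable
// value object instead of having the REST method combine them inline.
public final class ApplicationsRequestBuilder {

  /** Immutable holder for the validated query parameters. */
  public static final class ApplicationsRequest {
    public final Set<String> users;
    public final Set<String> queues;
    public final long startedTimeBegin;
    public final long startedTimeEnd;
    public final long limit;

    private ApplicationsRequest(Set<String> users, Set<String> queues,
        long startedTimeBegin, long startedTimeEnd, long limit) {
      this.users = users;
      this.queues = queues;
      this.startedTimeBegin = startedTimeBegin;
      this.startedTimeEnd = startedTimeEnd;
      this.limit = limit;
    }
  }

  private Set<String> users = Collections.emptySet();
  private Set<String> queues = Collections.emptySet();
  private long startedTimeBegin = 0L;
  private long startedTimeEnd = Long.MAX_VALUE;
  private long limit = Long.MAX_VALUE;

  public ApplicationsRequestBuilder withUsers(Set<String> users) {
    this.users = users;
    return this;
  }

  public ApplicationsRequestBuilder withQueues(Set<String> queues) {
    this.queues = queues;
    return this;
  }

  public ApplicationsRequestBuilder withStartedTimeRange(long begin, long end) {
    // Validation lives in the builder instead of the REST method body.
    if (begin > end) {
      throw new IllegalArgumentException("startedTimeBegin > startedTimeEnd");
    }
    this.startedTimeBegin = begin;
    this.startedTimeEnd = end;
    return this;
  }

  public ApplicationsRequestBuilder withLimit(long limit) {
    this.limit = limit;
    return this;
  }

  public ApplicationsRequest build() {
    return new ApplicationsRequest(users, queues,
        startedTimeBegin, startedTimeEnd, limit);
  }
}
{code}
getApps() would then only translate its query parameters into the corresponding
withXxx() calls and validate/build once, which is what keeps its cyclomatic
complexity down.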

> Reduce complexity of RMWebServices' getApps method
> --
>
> Key: YARN-8501
> URL: https://issues.apache.org/jira/browse/YARN-8501
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8501.001.patch, YARN-8501.002.patch, 
> YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-18 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548196#comment-16548196
 ] 

Zian Chen commented on YARN-8522:
-

Build infrastructure is broken by HADOOP-15610. Tests will be triggered when
that issue is addressed.

> Application fails with InvalidResourceRequestException
> --
>
> Key: YARN-8522
> URL: https://issues.apache.org/jira/browse/YARN-8522
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8522.001.patch
>
>
> Launch multiple streaming apps simultaneously. Sometimes one of the
> applications fails with the stack trace below.
> {code}
> 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: 
> java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to 
> xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying 
> after sleeping for 3ms.
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception: 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>  on [rm2], so propagating back to caller.
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1530515284077_0007
> 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration

2018-07-18 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548190#comment-16548190
 ] 

Jonathan Hung commented on YARN-7974:
-

Seems related to HADOOP-15610:{noformat}Collecting typed_ast; python_version < 
"3.7" and implementation_name == "cpython" (from astroid>=2.0.0->pylint)
  Downloading 
https://files.pythonhosted.org/packages/52/cf/2ebc7d282f026e21eed4987e42e10964a077c13cfc168b42f3573a7f178c/typed-ast-1.1.0.tar.gz
 (200kB)
Complete output from command python setup.py egg_info:
Error: typed_ast only runs on Python 3.3 and above.


Command "python setup.py egg_info" failed with error code 1 in 
/tmp/pip-build-QpIUX5/typed-ast/
You are using pip version 8.1.1, however version 10.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/sh -c pip2 install pylint' returned a non-zero code: 1

Total Elapsed time:  13m  3s

ERROR: Docker failed to build image.{noformat}

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch, 
> YARN-7974.003.patch, YARN-7974.004.patch, YARN-7974.005.patch, 
> YARN-7974.006.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> The approach is for the AM to update the tracking URL on heartbeat to the RM,
> and to add a related API in AMRMClient.
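
A rough sketch of how an AM might use such an API is below. The
updateTrackingUrl(...) call is assumed here as the proposed addition (its final
name and signature may differ); everything else uses the existing AMRMClient
API:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class TrackingUrlUpdateSketch {
  // Sketch only: updateTrackingUrl(...) is the *proposed* addition and is
  // assumed here; it is not part of the released AMRMClient API yet.
  public static void run(Configuration conf, String uiHost, int uiPort)
      throws Exception {
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();

    // Register with an empty tracking URL; the UI container is not up yet.
    rmClient.registerApplicationMaster("am-host", -1, "");

    // Later, once the container hosting the UI is known, push the real URL.
    // The new value would ride along on the next allocate() heartbeat to the RM.
    rmClient.updateTrackingUrl("http://" + uiHost + ":" + uiPort);
  }
}
{code}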



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8547) rm may crash if nm register with too many applications

2018-07-18 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548181#comment-16548181
 ] 

Giovanni Matteo Fumarola commented on YARN-8547:


Thanks [~sandflee]  for working on this.
Can you provide more details on the cause and the consequences?

> rm may crash if nm register with too many applications
> --
>
> Key: YARN-8547
> URL: https://issues.apache.org/jira/browse/YARN-8547
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
>Priority: Major
> Attachments: YARN-8547.01.patch
>
>
> 1. Our cluster has several thousand nodes with log aggregation disabled, so a
> single NM may keep 10,000+ apps.
> 2. When the RM fails over, a single NM registers with 10,000+ apps, causing the
> active RM to GC constantly and lose its connection to ZK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2

2018-07-18 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548174#comment-16548174
 ] 

Giovanni Matteo Fumarola commented on YARN-8549:


Thanks [~prabham]. Can you name the patches YARN-\{Jira number}.v\{incremental
number}.patch, e.g. YARN-8549.v1.patch?

> No operation timeline writer and reader plugin classes for ATSv2
> 
>
> Key: YARN-8549
> URL: https://issues.apache.org/jira/browse/YARN-8549
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineclient, timelineserver
>Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2
>Reporter: Prabha Manepalli
>Priority: Minor
> Fix For: YARN-2928, YARN-5355, YARN-5355_branch2
>
> Attachments: TimeLineReaderAndWriterStubs.patch
>
>
> Stub implementation for TimeLineReader and TimeLineWriter classes. 
> These are useful for functional testing of writer and reader path for ATSv2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8538) Fix valgrind leak check on container executor

2018-07-18 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548169#comment-16548169
 ] 

Billie Rinaldi commented on YARN-8538:
--

Thanks [~eyang] and [~bibinchundatt]!

> Fix valgrind leak check on container executor
> -
>
> Key: YARN-8538
> URL: https://issues.apache.org/jira/browse/YARN-8538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8538.1.patch, YARN-8538.2.patch
>
>
> Running valgrind --leak-check=yes ./cetest gives us this:
> {noformat}
> ==14094== LEAK SUMMARY:
> ==14094==    definitely lost: 964,351 bytes in 1,154 blocks
> ==14094==    indirectly lost: 75,506 bytes in 3,777 blocks
> ==14094==  possibly lost: 0 bytes in 0 blocks
> ==14094==    still reachable: 554 bytes in 22 blocks
> ==14094== suppressed: 0 bytes in 0 blocks
> ==14094== Reachable blocks (those to which a pointer was found) are not shown.
> ==14094== To see them, rerun with: --leak-check=full --show-leak-kinds=all
> ==14094==
> ==14094== For counts of detected and suppressed errors, rerun with: -v
> ==14094== ERROR SUMMARY: 373 errors from 373 contexts (suppressed: 0 from 0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8538) Fix valgrind leak check on container executor

2018-07-18 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548167#comment-16548167
 ] 

Eric Yang commented on YARN-8538:
-

[~billie.rinaldi] [~bibinchundatt], I cherry-picked this to branch 3.1. Thanks
for the feedback.

> Fix valgrind leak check on container executor
> -
>
> Key: YARN-8538
> URL: https://issues.apache.org/jira/browse/YARN-8538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8538.1.patch, YARN-8538.2.patch
>
>
> Running valgrind --leak-check=yes ./cetest gives us this:
> {noformat}
> ==14094== LEAK SUMMARY:
> ==14094==    definitely lost: 964,351 bytes in 1,154 blocks
> ==14094==    indirectly lost: 75,506 bytes in 3,777 blocks
> ==14094==  possibly lost: 0 bytes in 0 blocks
> ==14094==    still reachable: 554 bytes in 22 blocks
> ==14094== suppressed: 0 bytes in 0 blocks
> ==14094== Reachable blocks (those to which a pointer was found) are not shown.
> ==14094== To see them, rerun with: --leak-check=full --show-leak-kinds=all
> ==14094==
> ==14094== For counts of detected and suppressed errors, rerun with: -v
> ==14094== ERROR SUMMARY: 373 errors from 373 contexts (suppressed: 0 from 0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8538) Fix valgrind leak check on container executor

2018-07-18 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8538:

Fix Version/s: 3.1.1

> Fix valgrind leak check on container executor
> -
>
> Key: YARN-8538
> URL: https://issues.apache.org/jira/browse/YARN-8538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8538.1.patch, YARN-8538.2.patch
>
>
> Running valgrind --leak-check=yes ./cetest gives us this:
> {noformat}
> ==14094== LEAK SUMMARY:
> ==14094==    definitely lost: 964,351 bytes in 1,154 blocks
> ==14094==    indirectly lost: 75,506 bytes in 3,777 blocks
> ==14094==  possibly lost: 0 bytes in 0 blocks
> ==14094==    still reachable: 554 bytes in 22 blocks
> ==14094== suppressed: 0 bytes in 0 blocks
> ==14094== Reachable blocks (those to which a pointer was found) are not shown.
> ==14094== To see them, rerun with: --leak-check=full --show-leak-kinds=all
> ==14094==
> ==14094== For counts of detected and suppressed errors, rerun with: -v
> ==14094== ERROR SUMMARY: 373 errors from 373 contexts (suppressed: 0 from 0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548152#comment-16548152
 ] 

genericqa commented on YARN-8549:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
7s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8549 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932091/TimeLineReaderAndWriterStubs.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21284/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> No operation timeline writer and reader plugin classes for ATSv2
> 
>
> Key: YARN-8549
> URL: https://issues.apache.org/jira/browse/YARN-8549
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineclient, timelineserver
>Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2
>Reporter: Prabha Manepalli
>Priority: Minor
> Fix For: YARN-2928, YARN-5355, YARN-5355_branch2
>
> Attachments: TimeLineReaderAndWriterStubs.patch
>
>
> Stub implementation for TimeLineReader and TimeLineWriter classes. 
> These are useful for functional testing of writer and reader path for ATSv2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8436) FSParentQueue: Comparison method violates its general contract

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548144#comment-16548144
 ] 

genericqa commented on YARN-8436:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
7s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8436 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932098/YARN-8436.003.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21283/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> FSParentQueue: Comparison method violates its general contract
> --
>
> Key: YARN-8436
> URL: https://issues.apache.org/jira/browse/YARN-8436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Attachments: YARN-8436.001.patch, YARN-8436.002.patch, 
> YARN-8436.003.patch
>
>
> The ResourceManager can fail while sorting queues if an update comes in:
> {code:java}
> FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>   at java.util.TimSort.mergeLo(TimSort.java:777)
>   at java.util.TimSort.mergeAt(TimSort.java:514)
> ...
>   at java.util.Collections.sort(Collections.java:175)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:223){code}
> The reason it breaks is a change in the sorted object itself. 
> This is why it fails:
>  * an update from a node comes in as a heartbeat.
>  * the update triggers a check to see if we can assign a container on the 
> node.
>  * walk over the queue hierarchy to find a queue to assign a container to: 
> top down.
>  * for each parent queue we sort the child queues in {{assignContainer}} to
> decide which queue to descend into.
>  * we lock the parent queue while sorting to prevent changes, but we do not
> lock the child queues that we are sorting.
> If during this sorting a different node update changes a child queue then we 
> allow that. This means that the objects that we are trying to sort now might 
> be out of order. That causes the issue with the comparator. The comparator 
> itself is not broken.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548143#comment-16548143
 ] 

genericqa commented on YARN-8501:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
9s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8501 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931991/YARN-8501.005.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21282/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Reduce complexity of RMWebServices' getApps method
> --
>
> Key: YARN-8501
> URL: https://issues.apache.org/jira/browse/YARN-8501
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8501.001.patch, YARN-8501.002.patch, 
> YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8547) rm may crash if nm register with too many applications

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548134#comment-16548134
 ] 

genericqa commented on YARN-8547:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
7s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8547 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932057/YARN-8547.01.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21281/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> rm may crash if nm register with too many applications
> --
>
> Key: YARN-8547
> URL: https://issues.apache.org/jira/browse/YARN-8547
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
>Priority: Major
> Attachments: YARN-8547.01.patch
>
>
> 1. Our cluster has several thousand nodes with log aggregation disabled, so a
> single NM may keep 10,000+ apps.
> 2. When the RM fails over, a single NM registers with 10,000+ apps, causing the
> active RM to GC constantly and lose its connection to ZK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-18 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548127#comment-16548127
 ] 

genericqa commented on YARN-8517:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m  
6s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8517 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12932088/YARN-8517.004.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21280/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-8517.001.patch, YARN-8517.002.patch, 
> YARN-8517.003.patch, YARN-8517.004.patch
>
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-07-18 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548093#comment-16548093
 ] 

Manikandan R commented on YARN-4606:


Thanks [~eepayne] for your comments.

 
{quote} I have a general concern that these tests are not testing the fix to 
the starvation problem outlined in the description of this JIRA. I'm trying to 
determine if there is a clean way to unit test that use case.
{quote}
Ok.
 Since active-app starvation happens because resources are under-allocated based
on an incorrect active-user count, in addition to checking the active-user
count, can we also check the resources allocated to each user? Would that be
good enough? Without this patch, the resource allocation (memory, vcores) should
be lower (half of the allocation with the patch, based on the example given in
the JIRA description), whereas with this patch it should be higher.
 (or)
 With this patch, an app should complete faster than before because resources
are allocated as expected. Can we simulate this in the test cases and check the
app completion time?

Will take care of #2, #3 & #4.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.005.patch, 
> YARN-4606.006.patch, YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, 
> YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8436) FSParentQueue: Comparison method violates its general contract

2018-07-18 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-8436:

Attachment: YARN-8436.003.patch

> FSParentQueue: Comparison method violates its general contract
> --
>
> Key: YARN-8436
> URL: https://issues.apache.org/jira/browse/YARN-8436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Attachments: YARN-8436.001.patch, YARN-8436.002.patch, 
> YARN-8436.003.patch
>
>
> The ResourceManager can fail while sorting queues if an update comes in:
> {code:java}
> FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>   at java.util.TimSort.mergeLo(TimSort.java:777)
>   at java.util.TimSort.mergeAt(TimSort.java:514)
> ...
>   at java.util.Collections.sort(Collections.java:175)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:223){code}
> The reason it breaks is a change in the sorted object itself. 
> This is why it fails:
>  * an update from a node comes in as a heartbeat.
>  * the update triggers a check to see if we can assign a container on the 
> node.
>  * walk over the queue hierarchy to find a queue to assign a container to: 
> top down.
>  * for each parent queue we sort the child queues in {{assignContainer}} to
> decide which queue to descend into.
>  * we lock the parent queue while sorting to prevent changes, but we do not
> lock the child queues that we are sorting.
> If during this sorting a different node update changes a child queue then we 
> allow that. This means that the objects that we are trying to sort now might 
> be out of order. That causes the issue with the comparator. The comparator 
> itself is not broken.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8436) FSParentQueue: Comparison method violates its general contract

2018-07-18 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548057#comment-16548057
 ] 

Wilfred Spiegelenburg commented on YARN-8436:
-

1) Fixed.
2) The delay is not to make sure that the comparator is called at least once,
but to make sure that the sorting has started. The failure only occurs once the
sort has progressed far enough that elements are being merged into, or inserted
in, a sorted run. The sleep is thus not for synchronisation but a delay that
gives the modifications time to happen. A countdown latch would synchronise the
start, but that is not what I needed.

Uploading a new patch with the fixed comment.
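
For readers less familiar with this failure mode, below is a minimal standalone
reproduction of the race described in this JIRA (it is not the scheduler code):
the comparator itself is consistent, but the sort keys are mutated by another
thread while Collections.sort runs, so TimSort can observe contradictory
orderings and throw the contract-violation exception.
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Stand-in for FSParentQueue sorting child queues while a concurrent node
// update changes their usage; not the actual scheduler code.
public class ConcurrentSortSketch {

  static final class Queue {
    volatile long usage;            // stands in for the queue's resource usage
    Queue(long usage) { this.usage = usage; }
  }

  public static void main(String[] args) throws Exception {
    List<Queue> queues = new ArrayList<>();
    for (int i = 0; i < 10_000; i++) {
      queues.add(new Queue(ThreadLocalRandom.current().nextLong(1_000_000)));
    }

    // Background "node update" thread keeps changing the sort key.
    Thread updater = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        Queue q = queues.get(ThreadLocalRandom.current().nextInt(queues.size()));
        q.usage = ThreadLocalRandom.current().nextLong(1_000_000);
      }
    });
    updater.setDaemon(true);
    updater.start();

    Comparator<Queue> byUsage = Comparator.comparingLong(q -> q.usage);
    for (int attempt = 0; attempt < 1_000; attempt++) {
      try {
        // May nondeterministically throw
        // IllegalArgumentException: Comparison method violates its general contract!
        Collections.sort(queues, byUsage);
      } catch (IllegalArgumentException e) {
        System.out.println("Reproduced after " + attempt + " sorts: " + e);
        break;
      }
    }
    updater.interrupt();
  }
}
{code}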

> FSParentQueue: Comparison method violates its general contract
> --
>
> Key: YARN-8436
> URL: https://issues.apache.org/jira/browse/YARN-8436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Attachments: YARN-8436.001.patch, YARN-8436.002.patch
>
>
> The ResourceManager can fail while sorting queues if an update comes in:
> {code:java}
> FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>   at java.util.TimSort.mergeLo(TimSort.java:777)
>   at java.util.TimSort.mergeAt(TimSort.java:514)
> ...
>   at java.util.Collections.sort(Collections.java:175)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:223){code}
> The reason it breaks is a change in the sorted object itself. 
> This is why it fails:
>  * an update from a node comes in as a heartbeat.
>  * the update triggers a check to see if we can assign a container on the 
> node.
>  * walk over the queue hierarchy to find a queue to assign a container to: 
> top down.
>  * for each parent queue we sort the child queues in {{assignContainer}} to
> decide which queue to descend into.
>  * we lock the parent queue while sorting to prevent changes, but we do not
> lock the child queues that we are sorting.
> If during this sorting a different node update changes a child queue then we 
> allow that. This means that the objects that we are trying to sort now might 
> be out of order. That causes the issue with the comparator. The comparator 
> itself is not broken.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-18 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548043#comment-16548043
 ] 

Eric Yang commented on YARN-8501:
-

[~snemeth] Build infrastructure is broken by HADOOP-15610. Tests will be
triggered when that issue is addressed. Thank you for your patience.

> Reduce complexity of RMWebServices' getApps method
> --
>
> Key: YARN-8501
> URL: https://issues.apache.org/jira/browse/YARN-8501
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8501.001.patch, YARN-8501.002.patch, 
> YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2

2018-07-18 Thread Prabha Manepalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabha Manepalli updated YARN-8549:
---
Issue Type: Sub-task  (was: Task)
Parent: YARN-5355

> No operation timeline writer and reader plugin classes for ATSv2
> 
>
> Key: YARN-8549
> URL: https://issues.apache.org/jira/browse/YARN-8549
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineclient, timelineserver
>Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2
>Reporter: Prabha Manepalli
>Priority: Minor
> Fix For: YARN-2928, YARN-5355, YARN-5355_branch2
>
> Attachments: TimeLineReaderAndWriterStubs.patch
>
>
> Stub implementation for TimeLineReader and TimeLineWriter classes. 
> These are useful for functional testing of writer and reader path for ATSv2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2

2018-07-18 Thread Prabha Manepalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabha Manepalli updated YARN-8549:
---
Attachment: (was: TimeLineReaderAndWriterStubs.patch)

> No operation timeline writer and reader plugin classes for ATSv2
> 
>
> Key: YARN-8549
> URL: https://issues.apache.org/jira/browse/YARN-8549
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: ATSv2, timelineclient, timelineserver
>Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2
>Reporter: Prabha Manepalli
>Priority: Minor
> Fix For: YARN-2928, YARN-5355, YARN-5355_branch2
>
> Attachments: TimeLineReaderAndWriterStubs.patch
>
>
> Stub implementation for TimeLineReader and TimeLineWriter classes. 
> These are useful for functional testing of writer and reader path for ATSv2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2

2018-07-18 Thread Prabha Manepalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabha Manepalli updated YARN-8549:
---
Description: 
Stub implementations for the TimeLineReader and TimeLineWriter classes.

These are useful for functional testing of the writer and reader paths in ATSv2.
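
As a rough illustration of the no-op (null object) idea behind such stubs, here
is a toy sketch; the interface below is deliberately simplified and is not the
actual ATSv2 TimelineWriter/TimelineReader API:
{code:java}
import java.util.Collections;
import java.util.List;

// Toy, simplified interface standing in for a timeline storage plugin.
interface SimpleTimelineStore {
  void write(String entityId, String payload);
  List<String> read(String entityId);
}

// No-op implementation: accepts writes and discards them, and always returns
// empty results on reads. Useful for exercising the write/read code paths in
// functional tests without standing up a real backend (e.g. HBase).
final class NoOpTimelineStore implements SimpleTimelineStore {
  @Override
  public void write(String entityId, String payload) {
    // Intentionally do nothing.
  }

  @Override
  public List<String> read(String entityId) {
    return Collections.emptyList();
  }
}
{code}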

> No operation timeline writer and reader plugin classes for ATSv2
> 
>
> Key: YARN-8549
> URL: https://issues.apache.org/jira/browse/YARN-8549
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: ATSv2, timelineclient, timelineserver
>Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2
>Reporter: Prabha Manepalli
>Priority: Minor
> Fix For: YARN-2928, YARN-5355, YARN-5355_branch2
>
> Attachments: TimeLineReaderAndWriterStubs.patch, 
> TimeLineReaderAndWriterStubs.patch
>
>
> Stub implementation for TimeLineReader and TimeLineWriter classes. 
> These are useful for functional testing of writer and reader path for ATSv2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2

2018-07-18 Thread Prabha Manepalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabha Manepalli updated YARN-8549:
---
Attachment: TimeLineReaderAndWriterStubs.patch

> No operation timeline writer and reader plugin classes for ATSv2
> 
>
> Key: YARN-8549
> URL: https://issues.apache.org/jira/browse/YARN-8549
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: ATSv2, timelineclient, timelineserver
>Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2
>Reporter: Prabha Manepalli
>Priority: Minor
> Fix For: YARN-2928, YARN-5355, YARN-5355_branch2
>
> Attachments: TimeLineReaderAndWriterStubs.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2

2018-07-18 Thread Prabha Manepalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabha Manepalli updated YARN-8549:
---
Attachment: 0001-Adding-stub-implementation-classes-for-TimeLineReade.patch

> No operation timeline writer and reader plugin classes for ATSv2
> 
>
> Key: YARN-8549
> URL: https://issues.apache.org/jira/browse/YARN-8549
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: ATSv2, timelineclient, timelineserver
>Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2
>Reporter: Prabha Manepalli
>Priority: Minor
> Fix For: YARN-2928, YARN-5355, YARN-5355_branch2
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2

2018-07-18 Thread Prabha Manepalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabha Manepalli updated YARN-8549:
---
Attachment: (was: 
0001-Adding-stub-implementation-classes-for-TimeLineReade.patch)

> No operation timeline writer and reader plugin classes for ATSv2
> 
>
> Key: YARN-8549
> URL: https://issues.apache.org/jira/browse/YARN-8549
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: ATSv2, timelineclient, timelineserver
>Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2
>Reporter: Prabha Manepalli
>Priority: Minor
> Fix For: YARN-2928, YARN-5355, YARN-5355_branch2
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8549) No operation timeline writer and reader plugin classes for ATSv2

2018-07-18 Thread Prabha Manepalli (JIRA)
Prabha Manepalli created YARN-8549:
--

 Summary: No operation timeline writer and reader plugin classes 
for ATSv2
 Key: YARN-8549
 URL: https://issues.apache.org/jira/browse/YARN-8549
 Project: Hadoop YARN
  Issue Type: Task
  Components: ATSv2, timelineclient, timelineserver
Affects Versions: YARN-2928, YARN-5355, YARN-5335_branch2
Reporter: Prabha Manepalli
 Fix For: YARN-2928, YARN-5355, YARN-5355_branch2






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547963#comment-16547963
 ] 

Antal Bálint Steinbach commented on YARN-8517:
--

Thanks [~snemeth]. Fixed.

> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-8517.001.patch, YARN-8517.002.patch, 
> YARN-8517.003.patch, YARN-8517.004.patch
>
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-8517:
-
Attachment: YARN-8517.004.patch

> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-8517.001.patch, YARN-8517.002.patch, 
> YARN-8517.003.patch, YARN-8517.004.patch
>
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-18 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547925#comment-16547925
 ] 

Szilard Nemeth commented on YARN-8517:
--

Hi [~bsteinbach]!
Thanks for the updated patch.
I think one bullet point is still missing; I don't see the changes for #5.
Apart from that, the patch looks good.

> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-8517.001.patch, YARN-8517.002.patch, 
> YARN-8517.003.patch
>
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-18 Thread Bilwa S T (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547784#comment-16547784
 ] 

Bilwa S T commented on YARN-8548:
-

Thanks [~bibinchundatt] for reporting the issue. I have attached a patch.
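
For context, below is a simplified, self-contained sketch of the usual PBImpl
builder-guard pattern the title refers to; the Proto/Builder classes are toy
stand-ins, not the real generated AllocateResponseProto code. Every mutator is
expected to call a maybeInitBuilder() guard first; if that initialization is
skipped, the builder is still null and the setter throws exactly this kind of
NullPointerException.
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified illustration of the PBImpl "maybeInitBuilder" guard.
final class ExamplePBImpl {

  /** Toy stand-in for an immutable generated proto message. */
  static final class Proto {
    final List<String> nmTokens;
    Proto(List<String> nmTokens) {
      this.nmTokens = Collections.unmodifiableList(new ArrayList<>(nmTokens));
    }
    static Builder newBuilder(Proto base) { return new Builder(base); }
    static Proto getDefaultInstance() { return new Proto(new ArrayList<>()); }
  }

  /** Toy stand-in for the generated builder. */
  static final class Builder {
    private final List<String> nmTokens;
    Builder(Proto base) { this.nmTokens = new ArrayList<>(base.nmTokens); }
    void clearNmTokens() { nmTokens.clear(); }
    void addAllNmTokens(List<String> tokens) { nmTokens.addAll(tokens); }
    Proto build() { return new Proto(nmTokens); }
  }

  private Proto proto = Proto.getDefaultInstance();
  private Builder builder = null;
  private boolean viaProto = false;

  // The guard every mutator must call first: if the object is still backed by
  // the immutable proto (or the builder was never created), seed a builder
  // from the current proto. Skipping this guard leaves 'builder' null and the
  // next builder call throws a NullPointerException, as in the trace below.
  private void maybeInitBuilder() {
    if (viaProto || builder == null) {
      builder = Proto.newBuilder(proto);
    }
    viaProto = false;
  }

  public void setNMTokens(List<String> tokens) {
    maybeInitBuilder();
    builder.clearNmTokens();
    builder.addAllNmTokens(tokens);
  }
}
{code}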

> AllocationRespose proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-8548-001.patch
>
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-18 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-8548:

Attachment: YARN-8548-001.patch

> AllocationRespose proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-8548-001.patch
>
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-18 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned YARN-8548:
---

Assignee: Bilwa S T

> AllocationRespose proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
>
> Distributed Scheduling allocate failing
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8548) AllocationRespose proto setNMToken initBuilder not done

2018-07-18 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-8548:
--

 Summary: AllocationRespose proto setNMToken initBuilder not done
 Key: YARN-8548
 URL: https://issues.apache.org/jira/browse/YARN-8548
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt


Distributed Scheduling allocate failing

{code}
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
at 
org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
at 
org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
at 
org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
at 
org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
at org.apache.hadoop.ipc.Client.call(Client.java:1445)
at org.apache.hadoop.ipc.Client.call(Client.java:1355)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy85.allocate(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
{code}
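
The trace points at AllocateResponsePBImpl.setNMTokens touching the protobuf builder before it has been initialized. As a rough illustration only (the field and method names below follow the usual YARN PBImpl conventions and are assumptions, not the committed patch), the standard guard looks like this:

{code}
// Hypothetical sketch of the usual PBImpl setter guard; not the actual YARN-8548 fix.
@Override
public synchronized void setNMTokens(List<NMToken> nmTokens) {
  maybeInitBuilder();               // without this, 'builder' can still be null below
  if (this.nmTokens == null) {
    this.nmTokens = new ArrayList<>();
  }
  this.nmTokens.clear();
  builder.clearNmTokens();          // the reported NPE matches a null 'builder' here
  if (nmTokens != null) {
    this.nmTokens.addAll(nmTokens);
  }
}

private synchronized void maybeInitBuilder() {
  if (viaProto || builder == null) {
    builder = AllocateResponseProto.newBuilder(proto);
  }
  viaProto = false;
}
{code}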



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8548) AllocationResponse proto setNMToken initBuilder not done

2018-07-18 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8548:
---
Target Version/s: 3.1.1

> AllocationResponse proto setNMToken initBuilder not done
> ---
>
> Key: YARN-8548
> URL: https://issues.apache.org/jira/browse/YARN-8548
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Major
>
> The Distributed Scheduling allocate call is failing with the following NullPointerException:
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.setNMTokens(AllocateResponsePBImpl.java:354)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.updateAllocateResponse(DistributedScheduler.java:181)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocateForDistributedScheduling(DistributedScheduler.java:257)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.allocate(DistributedScheduler.java:154)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.allocate(AMRMProxyService.java:321)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1445)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1355)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy85.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-18 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: hadoop-2.9.0.gpu-port.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, but it treats GPUs only as a 
> countable resource. 
> However, GPU placement also matters a great deal for the efficiency of deep 
> learning jobs.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than one running 
> on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables 
> fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which encodes both GPU usage and 
> locality information for a node (up to 64 GPUs per node): a '1' in a bit position 
> means the corresponding GPU is available, and a '0' means it is not.
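
To make the bitmap encoding concrete, here is a small, self-contained Java sketch (illustrative only, not taken from the attached patches) of how a 64-bit long can represent per-GPU availability and be queried for locality-aware placement:

{code}
// Illustrative only: bit i of a 64-bit long means "GPU i is available",
// mirroring the bitmap the description adds to the YARN Resource.
public class GpuBitmapDemo {
  static boolean isAvailable(long bitmap, int gpu) {
    return (bitmap & (1L << gpu)) != 0;
  }

  static boolean allAvailable(long bitmap, int... gpus) {
    for (int gpu : gpus) {
      if (!isAvailable(bitmap, gpu)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // GPUs 0, 1 and 7 are free; every other GPU is in use.
    long bitmap = (1L << 0) | (1L << 1) | (1L << 7);

    System.out.println(Long.bitCount(bitmap));       // 3 GPUs available in total
    System.out.println(allAvailable(bitmap, 0, 1));  // true: preferred, same PCI-E switch
    System.out.println(allAvailable(bitmap, 0, 7));  // true, but with worse locality
    System.out.println(allAvailable(bitmap, 2, 3));  // false: GPUs 2 and 3 are busy
  }
}
{code}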



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-18 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: (was: hadoop-2.9.0.gpu-port.patch)

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, but it treats GPUs only as a 
> countable resource. 
> However, GPU placement also matters a great deal for the efficiency of deep 
> learning jobs.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than one running 
> on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables 
> fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which encodes both GPU usage and 
> locality information for a node (up to 64 GPUs per node): a '1' in a bit position 
> means the corresponding GPU is available, and a '0' means it is not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8546) A reserved container might be released multiple times under async scheduling

2018-07-18 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8546:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-5139

> A reserved container might be released multiple times under async scheduling
> 
>
> Key: YARN-8546
> URL: https://issues.apache.org/jira/browse/YARN-8546
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: global-scheduling
>
> I was able to reproduce this issue by starting a job that keeps requesting 
> containers until it uses up all available cluster resources. My cluster has 
> 70200 vcores and each task requests 100 vcores, so I was expecting a total of 
> 702 containers to be allocated, but eventually there were only 701. The last 
> container could not be allocated because the queue's used resource had been 
> updated to more than 100%.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8547) rm may crash if nm register with too many applications

2018-07-18 Thread sandflee (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-8547:
---
Attachment: YARN-8547.01.patch

> rm may crash if nm register with too many applications
> --
>
> Key: YARN-8547
> URL: https://issues.apache.org/jira/browse/YARN-8547
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
>Priority: Major
> Attachments: YARN-8547.01.patch
>
>
> 1. Our cluster has several thousand nodes and log aggregation is disabled, so 
> one single NM may keep 10,000+ apps.
> 2. When the RM fails over, a single NM registers with 10,000+ apps, causing the 
> active RM to GC constantly and lose its connection to ZK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8547) rm may crash if nm register with too many applications

2018-07-18 Thread sandflee (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-8547:
---
Description: 
1. Our cluster has several thousand nodes and log aggregation is disabled, so one 
single NM may keep 10,000+ apps.

2. When the RM fails over, a single NM registers with 10,000+ apps, causing the 
active RM to GC constantly and lose its connection to ZK.

  was:
1. Our cluster has several thousand nodes and we disable log aggregation, so a 
single NM may keep 10,000+ apps.

2. When the RM fails over, an NM registers with 10,000+ apps, causing the active 
RM to GC constantly and lose its connection to ZK.


> rm may crash if nm register with too many applications
> --
>
> Key: YARN-8547
> URL: https://issues.apache.org/jira/browse/YARN-8547
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
>Priority: Major
>
> 1. Our cluster has several thousand nodes and log aggregation is disabled, so 
> one single NM may keep 10,000+ apps.
> 2. When the RM fails over, a single NM registers with 10,000+ apps, causing the 
> active RM to GC constantly and lose its connection to ZK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547591#comment-16547591
 ] 

Antal Bálint Steinbach commented on YARN-8517:
--

Hi [~snemeth],

Thanks for the review. I fixed the mentioned issues. I called the APIs from a 
browser and used the results for the examples. There was no field named 
"diagnosticsInfo".

> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-8517.001.patch, YARN-8517.002.patch, 
> YARN-8517.003.patch
>
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.
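
While writing the documentation, the two endpoints can be exercised directly against a running RM; below is a minimal sketch using the JDK HTTP client (the host, port, application ID and attempt ID are placeholders for a real cluster):

{code}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GetContainersExample {
  public static void main(String[] args) throws Exception {
    // Placeholder RM address and IDs; adjust to a real cluster.
    String url = "http://rm-host:8088/ws/v1/cluster"
        + "/apps/application_1531814753416_0001"
        + "/appattempts/appattempt_1531814753416_0001_000001/containers";

    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(url))
        .header("Accept", "application/json")
        .build();

    // The body is the JSON containers list that the new documentation section describes.
    HttpResponse<String> response =
        client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
{code}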



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-8517:
-
Attachment: YARN-8517.003.patch

> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-8517.001.patch, YARN-8517.002.patch, 
> YARN-8517.003.patch
>
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-8517:
-
Attachment: YARN-8517.002.patch

> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
> Attachments: YARN-8517.001.patch, YARN-8517.002.patch
>
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8547) rm may crash if nm register with too many applications

2018-07-18 Thread sandflee (JIRA)
sandflee created YARN-8547:
--

 Summary: rm may crash if nm register with too many applications
 Key: YARN-8547
 URL: https://issues.apache.org/jira/browse/YARN-8547
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: sandflee
Assignee: sandflee


1. Our cluster has several thousand nodes and we disable log aggregation, so a 
single NM may keep 10,000+ apps.

2. When the RM fails over, an NM registers with 10,000+ apps, causing the active 
RM to GC constantly and lose its connection to ZK.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7590) Improve container-executor validation check

2018-07-18 Thread Aljoscha Krettek (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547568#comment-16547568
 ] 

Aljoscha Krettek commented on YARN-7590:


Thanks a lot [~ebadger]! This was indeed the problem. I thought it might have 
been a problem with the setuid/permissions setup, which is why I didn't check. 
FYI, this is not a production cluster but a small testing project for setting 
up a distributed Kerberized cluster on Docker: 
https://github.com/aljoscha/docker-hadoop-secure-cluster.

> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Fix For: 2.6.6, 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6
>
> Attachments: YARN-7590.001.patch, YARN-7590.002.patch, 
> YARN-7590.003.patch, YARN-7590.004.patch, YARN-7590.005.patch, 
> YARN-7590.006.patch, YARN-7590.007.patch, YARN-7590.008.patch, 
> YARN-7590.009.patch, YARN-7590.010.patch, YARN-7590.branch-2.000.patch, 
> YARN-7590.branch-2.6.000.patch, YARN-7590.branch-2.7.000.patch, 
> YARN-7590.branch-2.8.000.patch, YARN-7590.branch-2.9.000.patch
>
>
> There is only minimal checking of the prefix path in container-executor.  If YARN 
> is compromised, an attacker can use container-executor to change the ownership of 
> system files:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This changes /etc to be owned by the spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> The spark user can then rewrite files under /etc to gain more access.  We can 
> improve this with additional checks in container-executor (a sketch of the 
> ownership check follows below):
> # Make sure the prefix path is owned by the same user as the caller of 
> container-executor.
> # Make sure the log directory prefix is owned by the same user as the caller.
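
Container-executor itself is native setuid code, but purely to illustrate the proposed ownership check (a language-agnostic sketch expressed here in Java, not the patch), the validation amounts to refusing any prefix directory that the calling user does not own:

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PrefixOwnershipCheck {
  // Reject a prefix/log directory unless it is owned by the calling user.
  static void checkOwnedByCaller(Path prefix) throws Exception {
    String caller = System.getProperty("user.name");
    String owner = Files.getOwner(prefix).getName();
    if (!owner.equals(caller)) {
      throw new SecurityException("Refusing to use " + prefix + ": owned by "
          + owner + ", not by caller " + caller);
    }
  }

  public static void main(String[] args) throws Exception {
    // With this check in place, passing a system path such as /etc as the
    // prefix (as in the example above) is rejected instead of being re-chowned.
    checkOwnedByCaller(Paths.get(args.length > 0 ? args[0] : "/etc"));
  }
}
{code}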



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-18 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: (was: hadoop-2.9.0.gpu-port.patch)

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, but it treats GPUs only as a 
> countable resource. 
> However, GPU placement also matters a great deal for the efficiency of deep 
> learning jobs.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than one running 
> on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables 
> fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which encodes both GPU usage and 
> locality information for a node (up to 64 GPUs per node): a '1' in a bit position 
> means the corresponding GPU is available, and a '0' means it is not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-18 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: hadoop-2.9.0.gpu-port.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, but it treats GPUs only as a 
> countable resource. 
> However, GPU placement also matters a great deal for the efficiency of deep 
> learning jobs.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than one running 
> on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables 
> fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which encodes both GPU usage and 
> locality information for a node (up to 64 GPUs per node): a '1' in a bit position 
> means the corresponding GPU is available, and a '0' means it is not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8482) [Router] Add cache service for fast answers to getApps

2018-07-18 Thread Dillon Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547484#comment-16547484
 ] 

Dillon Zhang commented on YARN-8482:


[~giovanni.fumarola] ok ~

> [Router] Add cache service for fast answers to getApps
> --
>
> Key: YARN-8482
> URL: https://issues.apache.org/jira/browse/YARN-8482
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7300) DiskValidator is not used in LocalDirAllocator

2018-07-18 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547482#comment-16547482
 ] 

Szilard Nemeth commented on YARN-7300:
--

Hi [~haibochen]!
Looks like we had some general build infrastructure issues.
Could you please retrigger the build?
Thanks!

> DiskValidator is not used in LocalDirAllocator
> --
>
> Key: YARN-7300
> URL: https://issues.apache.org/jira/browse/YARN-7300
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Haibo Chen
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-7300.001.patch, YARN-7300.002.patch
>
>
> HADOOP-13254 introduced a pluggable disk validator to replace 
> DiskChecker.checkDir(). However, LocalDirAllocator still references the old 
> DiskChecker.checkDir(). It'd be nice to
> use the plugin uniformly so that user configurations take effect in all 
> places.
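
A rough sketch of the change being asked for, assuming the plugin API introduced by HADOOP-13254 (DiskValidatorFactory/BasicDiskValidator; the configuration wiring and LocalDirAllocator plumbing are omitted):

{code}
import java.io.File;

import org.apache.hadoop.util.BasicDiskValidator;
import org.apache.hadoop.util.DiskChecker;
import org.apache.hadoop.util.DiskValidator;
import org.apache.hadoop.util.DiskValidatorFactory;

public class DiskValidationSketch {
  public static void main(String[] args) throws Exception {
    File dir = new File(args.length > 0 ? args[0] : "/tmp/yarn-local");

    // What LocalDirAllocator effectively does today: the legacy static check.
    DiskChecker.checkDir(dir);

    // What the issue asks for: go through the pluggable validator so the
    // configured implementation (basic, read-write, ...) is honored.
    DiskValidator validator = DiskValidatorFactory.getInstance(BasicDiskValidator.NAME);
    validator.checkStatus(dir);
  }
}
{code}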



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-18 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547480#comment-16547480
 ] 

Szilard Nemeth commented on YARN-6966:
--

Hi [~rkanter]!
builds.apache.org is up now.
Could you please retrigger the build?
Thanks!

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch
>
>
> Just as in YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that the metrics do not recover 
> properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, 
> AllocatedVCores and AvailableVCores also need to be recovered when the NM 
> restarts.
> This should be done in ContainerManagerImpl#recoverContainer (see the sketch 
> after this description).
> The scenario can be reproduced with the following steps:
> # Make sure YarnConfiguration.NM_RECOVERY_ENABLED=true and 
> YarnConfiguration.NM_RECOVERY_SUPERVISED=true in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
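
The recovery hook being proposed could look roughly like the fragment below (a sketch only; the NodeManagerMetrics method names are assumptions and this is not the attached patch). The idea is that ContainerManagerImpl#recoverContainer re-applies the same metric updates a freshly launched container would have caused, so later completion/kill events do not drive the counters negative:

{code}
// Hypothetical fragment inside ContainerManagerImpl#recoverContainer; not the
// actual YARN-6966 patch.
private void recoverActiveContainerMetrics(Resource capability) {
  if (metrics != null) {
    metrics.launchedContainer();            // restore ContainersLaunched
    metrics.allocateContainer(capability);  // restore Allocated*/Available* gauges
  }
}
{code}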



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-18 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547477#comment-16547477
 ] 

Szilard Nemeth edited comment on YARN-8501 at 7/18/18 7:23 AM:
---

Hi [~eyang]!
Looks like we had/have some build infrastructure issues.
Could you please retrigger the build? 
I'm not sure what kind of build issues we had, or whether we still have them.


was (Author: snemeth):
Hi [~eyang]!
Could you please retrigger the build? 
I'm not sure what kind of build issues we had, or whether we still have them.

> Reduce complexity of RMWebServices' getApps method
> --
>
> Key: YARN-8501
> URL: https://issues.apache.org/jira/browse/YARN-8501
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8501.001.patch, YARN-8501.002.patch, 
> YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-18 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547477#comment-16547477
 ] 

Szilard Nemeth commented on YARN-8501:
--

Hi [~eyang]!
Could you please retrigger the build? 
I'm not sure what kind of build issues we had, or whether we still have them.

> Reduce complexity of RMWebServices' getApps method
> --
>
> Key: YARN-8501
> URL: https://issues.apache.org/jira/browse/YARN-8501
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8501.001.patch, YARN-8501.002.patch, 
> YARN-8501.003.patch, YARN-8501.004.patch, YARN-8501.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-18 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: hadoop-2.9.0.gpu-port.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, but it treats GPUs only as a 
> countable resource. 
> However, GPU placement also matters a great deal for the efficiency of deep 
> learning jobs.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than one running 
> on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables 
> fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which encodes both GPU usage and 
> locality information for a node (up to 64 GPUs per node): a '1' in a bit position 
> means the corresponding GPU is available, and a '0' means it is not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-18 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: (was: hadoop-2.9.0.gpu-port.patch)

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, but it treats GPUs only as a 
> countable resource. 
> However, GPU placement also matters a great deal for the efficiency of deep 
> learning jobs.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than one running 
> on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables 
> fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which encodes both GPU usage and 
> locality information for a node (up to 64 GPUs per node): a '1' in a bit position 
> means the corresponding GPU is available, and a '0' means it is not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8546) A reserved container might be released multiple times under async scheduling

2018-07-18 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned YARN-8546:
-

Assignee: Tao Yang

> A reserved container might be released multiple times under async scheduling
> 
>
> Key: YARN-8546
> URL: https://issues.apache.org/jira/browse/YARN-8546
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: global-scheduling
>
> I was able to reproduce this issue by starting a job that keeps requesting 
> containers until it uses up all available cluster resources. My cluster has 
> 70200 vcores and each task requests 100 vcores, so I was expecting a total of 
> 702 containers to be allocated, but eventually there were only 701. The last 
> container could not be allocated because the queue's used resource had been 
> updated to more than 100%.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8546) A reserved container might be released multiple times under async scheduling

2018-07-18 Thread Weiwei Yang (JIRA)
Weiwei Yang created YARN-8546:
-

 Summary: A reserved container might be released multiple times 
under async scheduling
 Key: YARN-8546
 URL: https://issues.apache.org/jira/browse/YARN-8546
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Affects Versions: 3.1.0
Reporter: Weiwei Yang


I was able to reproduce this issue by starting a job that keeps requesting 
containers until it uses up all available cluster resources. My cluster has 
70200 vcores and each task requests 100 vcores, so I was expecting a total of 
702 containers to be allocated, but eventually there were only 701. The last 
container could not be allocated because the queue's used resource had been 
updated to more than 100%.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8544) [DS] AM registration fails when hadoop authorization is enabled

2018-07-18 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546774#comment-16546774
 ] 

Bibin A Chundatt edited comment on YARN-8544 at 7/18/18 6:03 AM:
-

[~subru] / [~cheersyang]

Could you please help review this?



was (Author: bibinchundatt):
[~subru]

Could you please help review this?


> [DS] AM registration fails when hadoop authorization is enabled
> ---
>
> Key: YARN-8544
> URL: https://issues.apache.org/jira/browse/YARN-8544
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8544.001.patch
>
>
> The application master fails to register when Hadoop authorization is enabled.
> DistributedSchedulingAMProtocol connection authorization fails at the RM side.  
> Issue credits: [~BilwaST]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org