[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498882#comment-16498882
 ] 

Sunil Govindan commented on YARN-8220:
--

Hi [~eyang]

One quick doubt

bq. ENTRYPOINT, and CMD in Dockerfile

This means that the ENTRYPOINT and other CMDs are to be specified in the 
Dockerfile, so we would need a different Dockerfile for each different TF 
workload, which may be inconvenient, correct? We could have changed jobs in 
the Yarnfile itself instead. Please correct me if I am wrong.
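
For illustration, a minimal sketch of the two approaches (image name, script 
path, and Yarnfile values below are hypothetical, not from the patch):
{noformat}
# Dockerfile: the command is baked into the image, so changing the
# workload means rebuilding/retagging the image
FROM tensorflow/tensorflow:1.8.0-gpu
ENTRYPOINT ["python", "/opt/tf/train.py"]
{noformat}
versus keeping the command in the service spec:
{noformat}
{
  "name": "tf-job",
  "components": [
    {
      "name": "worker",
      "artifact": { "id": "tf-on-yarn:latest", "type": "DOCKER" },
      "launch_command": "python /opt/tf/train.py --job_name=worker"
    }
  ]
}
{noformat}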

> Running Tensorflow on YARN with GPU and Docker - Examples
> -
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Critical
> Attachments: YARN-8220.001.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed 
> features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.






[jira] [Commented] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps

2018-06-01 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498857#comment-16498857
 ] 

Rohith Sharma K S commented on YARN-8319:
-

[~sunilg] if it is applicable for branch-2, please feel free to backport to 
branch-2 as well. 

> More YARN pages need to honor yarn.resourcemanager.display.per-user-apps
> 
>
> Key: YARN-8319
> URL: https://issues.apache.org/jira/browse/YARN-8319
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil Govindan
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8319.001.patch, YARN-8319.002.patch, 
> YARN-8319.003.patch, YARN-8319.addendum.001.patch
>
>
> When this config is on
>  - Per queue page on UI2 should filter app list by user
>  -- TODO: Verify the same with UI1 Per-queue page
>  - ATSv2 with UI2 should filter list of all users' flows and flow activities
>  - Per Node pages
>  -- Listing of apps and containers on a per-node basis should filter apps and 
> containers by user.
> To this end, because this is no longer just for resourcemanager, we should 
> also deprecate {{yarn.resourcemanager.display.per-user-apps}} in favor of 
> {{yarn.webapp.filter-app-list-by-user}}
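> For illustration, enabling the proposed replacement key would be a one-line 
> yarn-site.xml change (the value shown is an assumption):
> {noformat}
> <property>
>   <name>yarn.webapp.filter-app-list-by-user</name>
>   <value>true</value>
> </property>
> {noformat}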






[jira] [Commented] (YARN-8175) Add support for Node Labels in SLS.

2018-06-01 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498814#comment-16498814
 ] 

Abhishek Modi commented on YARN-8175:
-

Thanks [~botong] for the quick review. I have attached YARN-8175.006.patch 
addressing the review comments. [~elgoiri], could you please review and 
commit it if it looks good?

> Add support for Node Labels in SLS.
> ---
>
> Key: YARN-8175
> URL: https://issues.apache.org/jira/browse/YARN-8175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8175.001.patch, YARN-8175.002.patch, 
> YARN-8175.003.patch, YARN-8175.004.patch, YARN-8175.005.patch, 
> YARN-8175.006.patch
>
>
> Currently, SLS doesn't support node labels. With this jira, we are planning 
> to add support for node labels in SLS.
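> For context, on a real cluster node labels are defined and mapped with the 
> rmadmin CLI, e.g. (label and node names here are hypothetical):
> {noformat}
> yarn rmadmin -addToClusterNodeLabels "gpu(exclusive=true)"
> yarn rmadmin -replaceLabelsOnNode "node1:8032=gpu"
> {noformat}
> SLS would need to model the same node-to-label mapping in its simulated 
> cluster.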






[jira] [Commented] (YARN-8240) Add queue-level control to allow all applications in a queue to opt-out

2018-06-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498755#comment-16498755
 ] 

genericqa commented on YARN-8240:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-1011 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  4m  
0s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
20s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
3s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
22s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
22s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} YARN-1011 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} YARN-1011 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
55s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  6m 55s{color} 
| {color:red} hadoop-yarn-project_hadoop-yarn generated 4 new + 124 unchanged - 
0 fixed = 128 total (was 124) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 2 new + 241 unchanged - 0 fixed = 243 total (was 241) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 39s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
26s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}150m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce 

[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498751#comment-16498751
 ] 

genericqa commented on YARN-8342:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 1s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
16s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 49s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}116m 50s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 34m 23s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 27s{color} 
| {color:red} hadoop-yarn-services-core in the patch failed. {color} |
| {color:green}+1{color} | 

[jira] [Commented] (YARN-8375) TestCGroupElasticMemoryController fails surefire build

2018-06-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498726#comment-16498726
 ] 

Hudson commented on YARN-8375:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14340 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14340/])
YARN-8375. TestCGroupElasticMemoryController fails surefire build. (haibochen: 
rev 4880d890ee4716dafc5ff464c92a3c6d83b1db9b)
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/PlatformAssumptions.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestCGroupElasticMemoryController.java


> TestCGroupElasticMemoryController fails surefire build
> --
>
> Key: YARN-8375
> URL: https://issues.apache.org/jira/browse/YARN-8375
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8375.000.patch, YARN-8375.001.patch, 
> YARN-8375.002.patch
>
>
> hadoop-yarn-server-nodemanager precommit builds have been failing unit tests 
> recently because TestCGroupElasticMemoryController is either exiting or 
> timing out.






[jira] [Commented] (YARN-8349) Remove YARN registry entries when a service is killed by the RM

2018-06-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498718#comment-16498718
 ] 

genericqa commented on YARN-8349:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 31m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
7s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 31m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 31m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
29s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
27s{color} | {color:green} hadoop-yarn-registry in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 23s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 27m 
37s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m  
0s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} 

[jira] [Created] (YARN-8388) TestCGroupElasticMemoryController.testNormalExit() hangs on Linux

2018-06-01 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8388:


 Summary: TestCGroupElasticMemoryController.testNormalExit() hangs 
on Linux
 Key: YARN-8388
 URL: https://issues.apache.org/jira/browse/YARN-8388
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: Haibo Chen
Assignee: Miklos Szegedi


YARN-8375 disables the unit test on Linux, but given that we will be running 
the CGroupElasticMemoryController on Linux in production, we need to figure 
out why the test hangs and ideally fix it.

 

 






[jira] [Commented] (YARN-8375) TestCGroupElasticMemoryController fails surefire build

2018-06-01 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498696#comment-16498696
 ] 

Haibo Chen commented on YARN-8375:
--

[~miklos.szeg...@cloudera.com] and I debugged this together. We were able to 
reproduce this on an Ubuntu machine, but as soon as we attach strace, the test 
succeeds.

Will commit the patch for now to disable this on Linux and continue the 
investigation.
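
The fact that the test passes as soon as strace is attached points at a 
timing/race issue rather than a logic bug. A hedged sketch of how one might 
reproduce (the Maven and strace invocations are illustrative, not the exact 
commands we used):
{noformat}
cd hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
mvn test -Dtest=TestCGroupElasticMemoryController
# attaching strace to the forked surefire JVM perturbs timing enough
# that the hang no longer reproduces
strace -f -tt -p <surefire-jvm-pid> -o /tmp/strace.out
{noformat}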

> TestCGroupElasticMemoryController fails surefire build
> --
>
> Key: YARN-8375
> URL: https://issues.apache.org/jira/browse/YARN-8375
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
>Priority: Major
> Attachments: YARN-8375.000.patch, YARN-8375.001.patch, 
> YARN-8375.002.patch
>
>
> hadoop-yarn-server-nodemanager precommit builds have been failing unit tests 
> recently because TestCGroupElasticMemoryController is either exiting or 
> timing out.






[jira] [Updated] (YARN-8349) Remove YARN registry entries when a service is killed by the RM

2018-06-01 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8349:
-
Priority: Critical  (was: Major)

> Remove YARN registry entries when a service is killed by the RM
> ---
>
> Key: YARN-8349
> URL: https://issues.apache.org/jira/browse/YARN-8349
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8349.1.patch, YARN-8349.2.patch, YARN-8349.3.patch, 
> YARN-8349.4.patch
>
>
> As the title states, when a service is killed by the RM (for example, for 
> exceeding its lifetime), its YARN registry entries should be cleaned up.
> Without cleanup, DNS can contain multiple hostnames for a single IP address 
> in the case where IPs are reused. This impacts reverse lookups, which breaks 
> services, such as kerberos, that depend on those lookups.
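> For illustration, the failure mode on a reused IP looks like this (hostnames 
> and address are hypothetical): the reverse lookup returns PTR records for 
> both the killed service and the new one, so kerberos host matching breaks.
> {noformat}
> $ dig +short -x 10.0.0.15
> worker-0.old-service.user.example.com.
> worker-0.new-service.user.example.com.
> {noformat}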






[jira] [Updated] (YARN-8372) Distributed shell app master should not release containers when shutdown if keep-container is true

2018-06-01 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8372:
-
Priority: Critical  (was: Major)

> Distributed shell app master should not release containers when shutdown if 
> keep-container is true
> --
>
> Key: YARN-8372
> URL: https://issues.apache.org/jira/browse/YARN-8372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Reporter: Charan Hebri
>Assignee: Suma Shivaprasad
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8372.1.patch, YARN-8372.2.patch, YARN-8372.3.patch
>
>
> {noformat}
> try {
>   response = client.allocate(progress);
> } catch (ApplicationAttemptNotFoundException e) {
> handler.onShutdownRequest();
> LOG.info("Shutdown requested. Stopping callback.");
> return;{noformat}
> is a code snippet from AMRMClientAsyncImpl. The corresponding 
> onShutdownRequest implementation in the Distributed Shell App Master is:
> {noformat}
> @Override
> public void onShutdownRequest() {
>   done = true;
> }{noformat}
> Given the above code, the current behavior is that whenever an application 
> attempt fails due to an NM restart (the NM where the DS AM is running), an 
> ApplicationAttemptNotFoundException is thrown and all containers for that 
> attempt, including the ones running on other NMs, are killed by the AM and 
> marked as COMPLETE. The subsequent attempt then spawns new containers just 
> like a fresh attempt. This behavior differs from a MapReduce application, 
> where the containers are not killed.
> cc [~rohithsharma]
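> A hedged sketch of the direction a fix could take (the 
> {{keepContainersAcrossAttempts}} flag and its wiring are assumptions, not 
> the actual patch):
> {code:java}
> @Override
> public void onShutdownRequest() {
>   // Sketch only: when keep-containers-across-application-attempts is
>   // enabled, do not mark the AM done on shutdown, so running containers
>   // on other NMs are not released and can be picked up by the next attempt.
>   if (!keepContainersAcrossAttempts) {
>     done = true;
>   }
> }
> {code}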






[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498643#comment-16498643
 ] 

genericqa commented on YARN-8342:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  6m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
2s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  8m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
17s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}116m 38s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 34m 17s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
56s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
27s{color} | {color:green} 

[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer causes RM crash during failover

2018-06-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498637#comment-16498637
 ] 

Hudson commented on YARN-7962:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14338 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14338/])
YARN-7962. Race Condition When Stopping DelegationTokenRenewer causes RM 
(wangda: rev 931f78718f3a09775bfa1f9a952c069c416d0914)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java


> Race Condition When Stopping DelegationTokenRenewer causes RM crash during 
> failover
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Critical
> Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, 
> YARN-7962.4.patch, YARN-7962.6.patch, YARN-7962.7.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.
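> A sketch of the suggested ordering, using only the fields quoted above:
> {code:java}
> @Override
> protected void serviceStop() {
>   // Flip the flag under the write lock first, so concurrent
>   // processDelegationTokenRenewerEvent() calls queue their events
>   // instead of submitting to a terminated executor.
>   serviceStateLock.writeLock().lock();
>   try {
>     isServiceStarted = false;
>   } finally {
>     serviceStateLock.writeLock().unlock();
>   }
>   if (renewalTimer != null) {
>     renewalTimer.cancel();
>   }
>   appTokens.clear();
>   allTokens.clear();
>   this.renewerService.shutdown();
> }
> {code}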






[jira] [Commented] (YARN-8384) stdout.txt, stderr.txt logs of a launched docker container is coming with primary group of submit user instead of hadoop

2018-06-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498636#comment-16498636
 ] 

Hudson commented on YARN-8384:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14338 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14338/])
YARN-8384. stdout.txt, stderr.txt logs of a launched docker container is 
(wangda: rev 3a6bd775500343632bad5f9c1a2bfacd408f7760)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


> stdout.txt, stderr.txt logs of a launched docker container is coming with 
> primary group of submit user instead of hadoop
> 
>
> Key: YARN-8384
> URL: https://issues.apache.org/jira/browse/YARN-8384
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: docker
> Attachments: YARN-8384.001.patch
>
>
> When {{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} 
> is set to true and 
> {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is 
> set to nobody, the docker container runs as nobody:nobody in yarn mode.
> The log files are then initialized as follows (note that stderr.txt and 
> stdout.txt get group nobody instead of hadoop):
> {noformat}
> -rw-r--r-- 1 nobody hadoop   354 May 31 17:33 container-localizer-syslog
> -rw-r--r-- 1 nobody hadoop  1042 May 31 17:35 directory.info
> -rw-r----- 1 nobody hadoop  4944 May 31 17:35 launch_container.sh
> -rw-r--r-- 1 nobody hadoop   440 May 31 17:35 prelaunch.err
> -rw-r--r-- 1 nobody hadoop   100 May 31 17:35 prelaunch.out
> -rw-r----- 1 nobody nobody 18733 May 31 17:37 stderr.txt
> -rw-r----- 1 nobody nobody   400 May 31 17:35 stdout.txt
> {noformat}
> As a result, the YARN NM cannot read stderr.txt and stdout.txt.
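> For comparison, the expected ownership after a fix, assuming the 
> container-executor creates the log files with the NM log group (hadoop) 
> like the other files above:
> {noformat}
> -rw-r----- 1 nobody hadoop 18733 May 31 17:37 stderr.txt
> -rw-r----- 1 nobody hadoop   400 May 31 17:35 stdout.txt
> {noformat}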






[jira] [Commented] (YARN-8372) Distributed shell app master should not release containers when shutdown if keep-container is true

2018-06-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498638#comment-16498638
 ] 

Hudson commented on YARN-8372:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14338 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14338/])
YARN-8372. Distributed shell app master should not release containers (wangda: 
rev 8956e5b8db3059e0872e49f59adc6affc76e2274)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java


> Distributed shell app master should not release containers when shutdown if 
> keep-container is true
> --
>
> Key: YARN-8372
> URL: https://issues.apache.org/jira/browse/YARN-8372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Reporter: Charan Hebri
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8372.1.patch, YARN-8372.2.patch, YARN-8372.3.patch
>
>
> {noformat}
> try {
>   response = client.allocate(progress);
> } catch (ApplicationAttemptNotFoundException e) {
> handler.onShutdownRequest();
> LOG.info("Shutdown requested. Stopping callback.");
> return;{noformat}
> is a code snippet from AMRMClientAsyncImpl. The corresponding 
> onShutdownRequest implementation in the Distributed Shell App Master is:
> {noformat}
> @Override
> public void onShutdownRequest() {
>   done = true;
> }{noformat}
> Given the above code, the current behavior is that whenever an application 
> attempt fails due to an NM restart (the NM where the DS AM is running), an 
> ApplicationAttemptNotFoundException is thrown and all containers for that 
> attempt, including the ones running on other NMs, are killed by the AM and 
> marked as COMPLETE. The subsequent attempt then spawns new containers just 
> like a fresh attempt. This behavior differs from a MapReduce application, 
> where the containers are not killed.
> cc [~rohithsharma]






[jira] [Commented] (YARN-8349) Remove YARN registry entries when a service is killed by the RM

2018-06-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498639#comment-16498639
 ] 

Hudson commented on YARN-8349:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14338 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14338/])
YARN-8349. Remove YARN registry entries when a service is killed by the 
(wangda: rev ff583d3fa3325029bc691ec22d817aee37e5e85d)
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/src/test/resources/yarn-site.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/ServiceTestUtils.java
* (delete) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AppAdminClient.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/client/binding/RegistryUtils.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/src/test/java/org/apache/hadoop/yarn/service/TestCleanupAfterKill.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/client/ServiceClient.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestYarnNativeServices.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/AppAdminClient.java
* (edit) hadoop-project/pom.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/src/main/java/org/apache/hadoop/yarn/service/client/ApiServiceClient.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


> Remove YARN registry entries when a service is killed by the RM
> ---
>
> Key: YARN-8349
> URL: https://issues.apache.org/jira/browse/YARN-8349
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8349.1.patch, YARN-8349.2.patch, YARN-8349.3.patch, 
> YARN-8349.4.patch
>
>
> As the title states, when a service is killed by the RM (for example, for 
> exceeding its lifetime), its YARN registry entries should be cleaned up.
> Without cleanup, DNS can contain multiple hostnames for a single IP address 
> in the case where IPs are reused. This impacts reverse lookups, which breaks 
> services, such as kerberos, that depend on those lookups.






[jira] [Updated] (YARN-8240) Add queue-level control to allow all applications in a queue to opt-out

2018-06-01 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8240:
-
Attachment: YARN-8240-YARN-1011.00.patch

> Add queue-level control to allow all applications in a queue to opt-out
> ---
>
> Key: YARN-8240
> URL: https://issues.apache.org/jira/browse/YARN-8240
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8240-YARN-1011.00.patch
>
>







[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498589#comment-16498589
 ] 

genericqa commented on YARN-8342:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
7s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
18s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}116m 15s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 34m 40s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
15s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
18s{color} | {color:green} 

[jira] [Commented] (YARN-8349) Remove YARN registry entries when a service is killed by the RM

2018-06-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498587#comment-16498587
 ] 

genericqa commented on YARN-8349:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 30m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
24s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
4s{color} | {color:green} hadoop-yarn-registry in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 59s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 
37s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
52s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} 

[jira] [Updated] (YARN-8372) Distributed shell app master should not release containers when shutdown if keep-container is true

2018-06-01 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8372:
-
Summary: Distributed shell app master should not release containers when 
shutdown if keep-container is true  (was: ApplicationAttemptNotFoundException 
should be handled correctly by Distributed Shell App Master)

> Distributed shell app master should not release containers when shutdown if 
> keep-container is true
> --
>
> Key: YARN-8372
> URL: https://issues.apache.org/jira/browse/YARN-8372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Reporter: Charan Hebri
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8372.1.patch, YARN-8372.2.patch, YARN-8372.3.patch
>
>
> {noformat}
> try {
>   response = client.allocate(progress);
> } catch (ApplicationAttemptNotFoundException e) {
> handler.onShutdownRequest();
> LOG.info("Shutdown requested. Stopping callback.");
> return;{noformat}
> is a code snippet from AMRMClientAsyncImpl. The corresponding 
> onShutdownRequest implementation in the Distributed Shell App Master is:
> {noformat}
> @Override
> public void onShutdownRequest() {
>   done = true;
> }{noformat}
> Given the above code, the current behavior is that whenever an application 
> attempt fails due to an NM restart (the NM where the DS AM is running), an 
> ApplicationAttemptNotFoundException is thrown and all containers for that 
> attempt, including the ones running on other NMs, are killed by the AM and 
> marked as COMPLETE. The subsequent attempt then spawns new containers just 
> like a fresh attempt. This behavior differs from a MapReduce application, 
> where the containers are not killed.
> cc [~rohithsharma]
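
A hedged sketch of the direction the patch could take (the {{keepContainers}} 
flag and {{releaseAllContainers}} helper below are illustrative names, not the 
actual patch; {{done}} and {{LOG}} come from the snippet above): the AM's 
shutdown handler consults the keep-containers setting before tearing anything 
down.

{code:java}
// Illustrative only: "keepContainers" and "releaseAllContainers" are
// hypothetical names; "done" and "LOG" come from the snippet above.
@Override
public void onShutdownRequest() {
  if (!keepContainers) {
    // No keep-container semantics requested: release as before.
    releaseAllContainers();
  } else {
    LOG.info("Shutdown requested; keeping running containers for the"
        + " next application attempt.");
  }
  done = true;
}
{code}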



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer causes RM crash during failover

2018-06-01 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7962:
-
Summary: Race Condition When Stopping DelegationTokenRenewer causes RM 
crash during failover  (was: Race Condition When Stopping 
DelegationTokenRenewer)

> Race Condition When Stopping DelegationTokenRenewer causes RM crash during 
> failover
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Critical
> Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, 
> YARN-7962.4.patch, YARN-7962.6.patch, YARN-7962.7.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.
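
A sketch of the requested fix, reusing the field names from the snippet above 
(this is not the committed patch):

{code:java}
@Override
protected void serviceStop() {
  // Take the write lock so no event can observe isServiceStarted == true
  // once the executor begins shutting down.
  serviceStateLock.writeLock().lock();
  try {
    isServiceStarted = false;
  } finally {
    serviceStateLock.writeLock().unlock();
  }
  if (renewalTimer != null) {
    renewalTimer.cancel();
  }
  appTokens.clear();
  allTokens.clear();
  this.renewerService.shutdown();
}
{code}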



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-06-01 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-7962:


Assignee: BELUGA BEHR

> Race Condition When Stopping DelegationTokenRenewer
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Critical
> Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, 
> YARN-7962.4.patch, YARN-7962.6.patch, YARN-7962.7.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8384) stdout.txt, stderr.txt logs of a launched docker container is coming with primary group of submit user instead of hadoop

2018-06-01 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8384:
-
Description: 
When {{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} 
is set to true and 
{{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is set 
to nobody, the Docker container runs as nobody:nobody in YARN mode, and the log 
files are initialized as nobody:nobody:

{noformat}
-rw-r--r-- 1 nobody hadoop   354 May 31 17:33 container-localizer-syslog
-rw-r--r-- 1 nobody hadoop  1042 May 31 17:35 directory.info
-rw-r----- 1 nobody hadoop  4944 May 31 17:35 launch_container.sh
-rw-r--r-- 1 nobody hadoop   440 May 31 17:35 prelaunch.err
-rw-r--r-- 1 nobody hadoop   100 May 31 17:35 prelaunch.out
-rw-r----- 1 nobody nobody 18733 May 31 17:37 stderr.txt
-rw-r----- 1 nobody nobody   400 May 31 17:35 stdout.txt
{noformat}

This prevents the YARN NM from reading stderr.txt and stdout.txt.


  was:
When {{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} 
is set to true and 
{{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is set 
to nobody, the Docker container runs as nobody:nobody in YARN mode, and the log 
files are initialized as nobody:nobody:

{noformat}
-rw-r--r-- 1 nobody hadoop   354 May 31 17:33 container-localizer-syslog
-rw-r--r-- 1 nobody hadoop  1042 May 31 17:35 directory.info
-rw-r----- 1 nobody hadoop  4944 May 31 17:35 launch_container.sh
-rw-r--r-- 1 nobody hadoop   440 May 31 17:35 prelaunch.err
-rw-r--r-- 1 nobody hadoop   100 May 31 17:35 prelaunch.out
-rw-r----- 1 nobody nobody 18733 May 31 17:37 stderr.txt
-rw-r----- 1 nobody nobody   400 May 31 17:35 stdout.txt
{noformat}




> stdout.txt, stderr.txt logs of a launched docker container is coming with 
> primary group of submit user instead of hadoop
> 
>
> Key: YARN-8384
> URL: https://issues.apache.org/jira/browse/YARN-8384
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: docker
> Attachments: YARN-8384.001.patch
>
>
> When {{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} 
> is set to true and 
> {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is 
> set to nobody, the Docker container runs as nobody:nobody in YARN mode, and 
> the log files are initialized as nobody:nobody:
> {noformat}
> -rw-r--r-- 1 nobody hadoop   354 May 31 17:33 container-localizer-syslog
> -rw-r--r-- 1 nobody hadoop  1042 May 31 17:35 directory.info
> -rw-r----- 1 nobody hadoop  4944 May 31 17:35 launch_container.sh
> -rw-r--r-- 1 nobody hadoop   440 May 31 17:35 prelaunch.err
> -rw-r--r-- 1 nobody hadoop   100 May 31 17:35 prelaunch.out
> -rw-r----- 1 nobody nobody 18733 May 31 17:37 stderr.txt
> -rw-r----- 1 nobody nobody   400 May 31 17:35 stdout.txt
> {noformat}
> This prevents the YARN NM from reading stderr.txt and stdout.txt.
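
A minimal, illustrative Java sketch of the kind of fix needed (the helper 
class and the "hadoop" group name are assumptions; the real change lives in 
the container lifecycle code, not in a standalone class):

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.GroupPrincipal;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFilePermissions;

public class LogGroupSketch {
  // Hypothetical helper: reassign a container log file to the NM's log
  // group (assumed "hadoop" here) so the NM can read stdout.txt/stderr.txt.
  static void setLogGroup(Path logFile, String group) throws Exception {
    GroupPrincipal g = logFile.getFileSystem()
        .getUserPrincipalLookupService().lookupPrincipalByGroupName(group);
    Files.getFileAttributeView(logFile, PosixFileAttributeView.class)
        .setGroup(g);
    // Keep owner read/write, grant group read: -rw-r-----
    Files.setPosixFilePermissions(logFile,
        PosixFilePermissions.fromString("rw-r-----"));
  }
}
{code}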



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8384) stdout.txt, stderr.txt logs of a launched docker container is coming with primary group of submit user instead of hadoop

2018-06-01 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8384:
-
Summary: stdout.txt, stderr.txt logs of a launched docker container is 
coming with primary group of submit user instead of hadoop  (was: stdout, 
stderr logs of a Native Service container is coming with group as nobody)

> stdout.txt, stderr.txt logs of a launched docker container is coming with 
> primary group of submit user instead of hadoop
> 
>
> Key: YARN-8384
> URL: https://issues.apache.org/jira/browse/YARN-8384
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: docker
> Attachments: YARN-8384.001.patch
>
>
> When {{yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users}} 
> is set to true and 
> {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is 
> set to nobody, the Docker container runs as nobody:nobody in YARN mode, and 
> the log files are initialized as nobody:nobody:
> {noformat}
> -rw-r--r-- 1 nobody hadoop   354 May 31 17:33 container-localizer-syslog
> -rw-r--r-- 1 nobody hadoop  1042 May 31 17:35 directory.info
> -rw-r----- 1 nobody hadoop  4944 May 31 17:35 launch_container.sh
> -rw-r--r-- 1 nobody hadoop   440 May 31 17:35 prelaunch.err
> -rw-r--r-- 1 nobody hadoop   100 May 31 17:35 prelaunch.out
> -rw-r----- 1 nobody nobody 18733 May 31 17:37 stderr.txt
> -rw-r----- 1 nobody nobody   400 May 31 17:35 stdout.txt
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8372) ApplicationAttemptNotFoundException should be handled correctly by Distributed Shell App Master

2018-06-01 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498548#comment-16498548
 ] 

Wangda Tan commented on YARN-8372:
--

Patch LGTM, +1. Will commit today if no objections.

> ApplicationAttemptNotFoundException should be handled correctly by 
> Distributed Shell App Master
> ---
>
> Key: YARN-8372
> URL: https://issues.apache.org/jira/browse/YARN-8372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Reporter: Charan Hebri
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8372.1.patch, YARN-8372.2.patch, YARN-8372.3.patch
>
>
> {noformat}
> try {
>   response = client.allocate(progress);
> } catch (ApplicationAttemptNotFoundException e) {
>   handler.onShutdownRequest();
>   LOG.info("Shutdown requested. Stopping callback.");
>   return;
> }{noformat}
> is a code snippet from AMRMClientAsyncImpl. The corresponding 
> onShutdownRequest implementation in the Distributed Shell App Master is:
> {noformat}
> @Override
> public void onShutdownRequest() {
>   done = true;
> }{noformat}
> Due to the above handling, the current behavior is that whenever an application 
> attempt fails due to a NM restart (the NM where the DS AM is running), an 
> ApplicationAttemptNotFoundException is thrown and all containers for that 
> attempt, including the ones running on other NMs, are killed by the AM 
> and marked as COMPLETE. The subsequent attempt then spawns new containers just 
> like a new attempt. This behavior is different from a MapReduce application, 
> where the containers are not killed.
> cc [~rohithsharma]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8349) Remove YARN registry entries when a service is killed by the RM

2018-06-01 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498543#comment-16498543
 ] 

Wangda Tan commented on YARN-8349:
--

Thanks [~billie.rinaldi] for the patch, +1. Will commit by today if no 
objections.

> Remove YARN registry entries when a service is killed by the RM
> ---
>
> Key: YARN-8349
> URL: https://issues.apache.org/jira/browse/YARN-8349
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8349.1.patch, YARN-8349.2.patch, YARN-8349.3.patch, 
> YARN-8349.4.patch
>
>
> As the title states, when a service is killed by the RM (for example, for 
> exceeding its lifetime), the YARN registry entries should be cleaned up.
> Without cleanup, DNS can contain multiple hostnames for a single IP address 
> in the case where IPs are reused. This impacts reverse lookups, which breaks 
> services, such as Kerberos, that depend on those lookups.
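
A hedged sketch of the cleanup (the "yarn-service" service-class string and the 
method placement are assumptions for this sketch; only RegistryOperations.delete 
and RegistryUtils.servicePath are taken from the Hadoop registry API):

{code:java}
import org.apache.hadoop.registry.client.api.RegistryOperations;
import org.apache.hadoop.registry.client.binding.RegistryUtils;

public class RegistryCleanupSketch {
  // Remove a killed service's registry subtree so the registry DNS server
  // stops serving stale hostname records for reused IPs.
  public static void cleanUp(RegistryOperations registry, String user,
      String serviceName) throws Exception {
    // "yarn-service" as the service class is an assumption for this sketch.
    String path = RegistryUtils.servicePath(user, "yarn-service", serviceName);
    registry.delete(path, true); // recursive: component records go too
  }
}
{code}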



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-06-01 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498517#comment-16498517
 ] 

Eric Payne commented on YARN-4781:
--

bq. Thank you. I'll commit this to branch-2 shortly.
Thanks [~sunilg]. I would like to keep branch-2, branch-2.9 and branch-2.8 in 
sync with the 3.x branches as much as possible.

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.0.3
>
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.branch-2.patch, 
> YARN-4781.005.patch
>
>
> We introduced the fairness queue ordering policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue's resources, and the containers of the large 
> app have long lifespans, small applications could still wait a long time for 
> resources and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need 
> to preempt resources in queues with the fairness policy enabled.
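
As a rough illustration of the idea (hypothetical names, not the scheduler 
code): under a fairness ordering policy, each app's ideal share is the queue's 
used capacity divided by the number of apps, and preemption marks the overage 
of apps above that share.

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class IntraQueueFairnessSketch {
  // Returns, per app, how much usage exceeds its equal fair share.
  static Map<String, Long> overages(Map<String, Long> usedByApp) {
    if (usedByApp.isEmpty()) {
      return Collections.emptyMap();
    }
    long total = usedByApp.values().stream().mapToLong(Long::longValue).sum();
    long ideal = total / usedByApp.size();
    Map<String, Long> marks = new HashMap<>();
    for (Map.Entry<String, Long> e : usedByApp.entrySet()) {
      long over = e.getValue() - ideal;
      if (over > 0) {
        marks.put(e.getKey(), over); // candidate amount to preempt
      }
    }
    return marks;
  }
}
{code}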



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498497#comment-16498497
 ] 

Billie Rinaldi commented on YARN-8342:
--

I completed my testing and did not find any issues. My only additional 
suggestion is to add a sentence in the documentation about the launch command 
being comma-separated instead of space-separated when entrypoint is enabled; 
see the sketch below.
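
For illustration, the convention reduces to something like this (a sketch of 
the documented behavior, not the actual container-executor parsing code):

{code:java}
public class LaunchCommandSketch {
  // With ENTRYPOINT, the launch command supplies docker CMD arguments, so
  // entries are comma-separated; otherwise it is a shell line split on
  // whitespace. Illustrative of the documented convention only.
  static String[] parseLaunchCommand(String launchCommand,
      boolean useEntryPoint) {
    return useEntryPoint
        ? launchCommand.split(",")
        : launchCommand.trim().split("\\s+");
  }
}
{code}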

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8342.001.patch, YARN-8342.002.patch, 
> YARN-8342.003.patch, YARN-8342.004.patch, YARN-8342.005.patch
>
>
> During testing of the Docker feature, I found that if a container comes from a 
> non-privileged docker registry, the specified launch command will be ignored. 
> The container will succeed without any logs, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498503#comment-16498503
 ] 

Billie Rinaldi commented on YARN-8342:
--

+1 for patch 5. I will plan to commit this after the precommit build passes.

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8342.001.patch, YARN-8342.002.patch, 
> YARN-8342.003.patch, YARN-8342.004.patch, YARN-8342.005.patch
>
>
> During testing of the Docker feature, I found that if a container comes from a 
> non-privileged docker registry, the specified launch command will be ignored. 
> The container will succeed without any logs, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8342:

Attachment: YARN-8342.005.patch

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8342.001.patch, YARN-8342.002.patch, 
> YARN-8342.003.patch, YARN-8342.004.patch, YARN-8342.005.patch
>
>
> During testing of the Docker feature, I found that if a container comes from a 
> non-privileged docker registry, the specified launch command will be ignored. 
> The container will succeed without any logs, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498498#comment-16498498
 ] 

Eric Yang edited comment on YARN-8342 at 6/1/18 8:07 PM:
-

Patch 005 includes a new example to show how to specify launch_command with 
ENTRYPOINT support.


was (Author: eyang):
Patch 005 includes a new example to show how to specify launch_command with 
ENTRYPOINT support.

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8342.001.patch, YARN-8342.002.patch, 
> YARN-8342.003.patch, YARN-8342.004.patch, YARN-8342.005.patch
>
>
> During testing of the Docker feature, I found that if a container comes from a 
> non-privileged docker registry, the specified launch command will be ignored. 
> The container will succeed without any logs, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498498#comment-16498498
 ] 

Eric Yang commented on YARN-8342:
-

Patch 005 includes a new example to show how to specify launch_command with 
ENTRYPOINT support.

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8342.001.patch, YARN-8342.002.patch, 
> YARN-8342.003.patch, YARN-8342.004.patch, YARN-8342.005.patch
>
>
> During testing of the Docker feature, I found that if a container comes from a 
> non-privileged docker registry, the specified launch command will be ignored. 
> The container will succeed without any logs, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8175) Add support for Node Labels in SLS.

2018-06-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498460#comment-16498460
 ] 

genericqa commented on YARN-8175:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 49s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
15s{color} | {color:green} hadoop-sls in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8175 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12926139/YARN-8175.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b27d92f0bded 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6b21a59 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20926/testReport/ |
| Max. process+thread count | 468 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20926/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add support for Node Labels in SLS.
> ---
>
> Key: YARN-8175
> URL: https://issues.apache.org/jira/browse/YARN-8175
> Project: 

[jira] [Updated] (YARN-8349) Remove YARN registry entries when a service is killed by the RM

2018-06-01 Thread Billie Rinaldi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8349:
-
Attachment: YARN-8349.4.patch

> Remove YARN registry entries when a service is killed by the RM
> ---
>
> Key: YARN-8349
> URL: https://issues.apache.org/jira/browse/YARN-8349
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8349.1.patch, YARN-8349.2.patch, YARN-8349.3.patch, 
> YARN-8349.4.patch
>
>
> As the title states, when a service is killed by the RM (for example, for 
> exceeding its lifetime), the YARN registry entries should be cleaned up.
> Without cleanup, DNS can contain multiple hostnames for a single IP address 
> in the case where IPs are reused. This impacts reverse lookups, which breaks 
> services, such as Kerberos, that depend on those lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8175) Add support for Node Labels in SLS.

2018-06-01 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8175:

Attachment: YARN-8175.006.patch

> Add support for Node Labels in SLS.
> ---
>
> Key: YARN-8175
> URL: https://issues.apache.org/jira/browse/YARN-8175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8175.001.patch, YARN-8175.002.patch, 
> YARN-8175.003.patch, YARN-8175.004.patch, YARN-8175.005.patch, 
> YARN-8175.006.patch
>
>
> Currently, SLS doesn't support node labels. With this jira, we are planning 
> to add support for node labels in SLS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8361) Change App Name Placement Rule to use App Name instead of App Id for configuration

2018-06-01 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498383#comment-16498383
 ] 

Suma Shivaprasad commented on YARN-8361:


[~Zian Chen] The patch looks good overall. I still see a reference to application 
id - "application id, *%application* can be used" - in the documentation. Can you 
please fix this?

> Change App Name Placement Rule to use App Name instead of App Id for 
> configuration
> --
>
> Key: YARN-8361
> URL: https://issues.apache.org/jira/browse/YARN-8361
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8361.001.patch
>
>
> 1. AppNamePlacementRule used the app id when specifying queue mapping placement 
> rules; this should be changed to the app name.
> 2. Change the documentation as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498365#comment-16498365
 ] 

Eric Yang commented on YARN-8220:
-

[~leftnoteasy] {quote}
Entry point is a nice feature for static commands (for example, the default TF 
docker image, which starts a notebook by default: 
https://github.com/tensorflow/tensorflow/tree/r1.8/tensorflow/tools/docker). 
For training programs, since users need to do a lot of hyperparameter tuning, 
they will update such parameters to make things work.{quote}

Additional parameters can be passed to the ENTRYPOINT via CMD, which is the same 
as specifying them in launch_command when ENTRYPOINT is in use. The Yarnfile is 
shortened to:

{code}
{
  ..
"launch_command":"--train-steps=1,--trans-batch-size=16"
  ..
}
{code}

Or any parameters that have not been specified in the ENTRYPOINT+CMD combination.

> Running Tensorflow on YARN with GPU and Docker - Examples
> -
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Critical
> Attachments: YARN-8220.001.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed 
> features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498349#comment-16498349
 ] 

Eric Yang commented on YARN-8342:
-

[~billie.rinaldi] Thanks for the review. I corrected the keyword in patch 004.

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8342.001.patch, YARN-8342.002.patch, 
> YARN-8342.003.patch, YARN-8342.004.patch
>
>
> During testing of the Docker feature, I found that if a container comes from a 
> non-privileged docker registry, the specified launch command will be ignored. 
> The container will succeed without any logs, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8342:

Attachment: YARN-8342.004.patch

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8342.001.patch, YARN-8342.002.patch, 
> YARN-8342.003.patch, YARN-8342.004.patch
>
>
> During testing of the Docker feature, I found that if a container comes from a 
> non-privileged docker registry, the specified launch command will be ignored. 
> The container will succeed without any logs, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged docker 
> registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-01 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8379:
-
Description: 
Existing capacity scheduler only supports preemption for an underutilized queue 
to reach its guaranteed resource. In addition to that, there’s an requirement 
to get better balance between queues when all of them reach guaranteed resource 
but with different fairness resource.

An example is, 3 queues with capacity, queue_a = 30%, queue_b = 30%, queue_c = 
40%. At time T. queue_a is using 30%, queue_b is using 70%. Existing scheduler 
preemption won't happen. But this is unfair to queue_b since queue_a has the 
same guaranteed resources.

Before YARN-5864, capacity scheduler do additional preemption to balance 
queues. We changed the logic since it could preempt too many containers between 
queues when all queues are satisfied.

  was:
Existing capacity scheduler only supports preemption for an underutilized queue 
to reach its guaranteed resource. In addition to that, there's a requirement to 
get better balance between queues when all of them have reached their guaranteed 
resource but have different fair-share usage.

An example: 3 queues with capacities queue_a = 30%, queue_b = 30%, queue_c = 
40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing scheduler 
preemption won't happen. But this is unfair to queue_b since queue_b has the 
same guaranteed resources.

Before YARN-5864, the capacity scheduler did additional preemption to balance 
queues. We changed the logic since it could preempt too many containers between 
queues when all queues are satisfied.


> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> Existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there's a 
> requirement to get better balance between queues when all of them have 
> reached their guaranteed resource but have different fair-share usage.
> An example: 3 queues with capacities queue_a = 30%, queue_b = 30%, queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_b since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-01 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8379:
-
Description: 
Existing capacity scheduler only supports preemption for an underutilized queue 
to reach its guaranteed resource. In addition to that, there’s an requirement 
to get better balance between queues when all of them reach guaranteed resource 
but with different fairness resource.

An example is, 3 queues with capacity, queue_a = 30%, queue_b = 30%, queue_c = 
40%. At time T. queue_a is using 30%, queue_b is using 70%. Existing scheduler 
preemption won't happen. But this is unfair to queue_a since queue_a has the 
same guaranteed resources.

Before YARN-5864, capacity scheduler do additional preemption to balance 
queues. We changed the logic since it could preempt too many containers between 
queues when all queues are satisfied.

  was:
Existing capacity scheduler only supports preemption for an underutilized queue 
to reach its guaranteed resource. In addition to that, there's a requirement to 
get better balance between queues when all of them have reached their guaranteed 
resource but have different fair-share usage.

An example: 3 queues with capacities queue_a = 30%, queue_b = 30%, queue_c = 
40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing scheduler 
preemption won't happen. But this is unfair to queue_b since queue_a has the 
same guaranteed resources.

Before YARN-5864, the capacity scheduler did additional preemption to balance 
queues. We changed the logic since it could preempt too many containers between 
queues when all queues are satisfied.


> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> Existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there's a 
> requirement to get better balance between queues when all of them have 
> reached their guaranteed resource but have different fair-share usage.
> An example: 3 queues with capacities queue_a = 30%, queue_b = 30%, queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_a since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.
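
To make the example concrete, a small worked computation (illustrative only, 
not scheduler code): queue_c requests nothing, so its 40% is split between 
queue_a and queue_b in proportion to their equal guarantees, giving balanced 
targets of 50/50 and roughly 20% to preempt from queue_b.

{code:java}
public class BalanceExample {
  public static void main(String[] args) {
    double[] guaranteed = {30, 30, 40}; // queue_a, queue_b, queue_c
    double[] used       = {30, 70,  0}; // queue_c is idle
    double idle = guaranteed[2];
    // Equal guarantees for queue_a and queue_b -> equal idle shares.
    double targetA = guaranteed[0] + idle / 2; // 50
    double targetB = guaranteed[1] + idle / 2; // 50
    System.out.printf("queue_a below target by %.0f%%%n", targetA - used[0]);
    System.out.printf("preempt from queue_b: %.0f%%%n", used[1] - targetB);
  }
}
{code}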



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8372) ApplicationAttemptNotFoundException should be handled correctly by Distributed Shell App Master

2018-06-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498279#comment-16498279
 ] 

genericqa commented on YARN-8372:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell:
 The patch generated 0 new + 135 unchanged - 5 fixed = 135 total (was 140) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m  
7s{color} | {color:green} hadoop-yarn-applications-distributedshell in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8372 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12926113/YARN-8372.3.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1787ae0b483d 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6b21a59 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20922/testReport/ |
| Max. process+thread count | 626 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 U: 

[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-06-01 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498276#comment-16498276
 ] 

Billie Rinaldi commented on YARN-7962:
--

Sounds good to me. Thanks, [~leftnoteasy]!

> Race Condition When Stopping DelegationTokenRenewer
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Critical
> Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, 
> YARN-7962.4.patch, YARN-7962.6.patch, YARN-7962.7.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498270#comment-16498270
 ] 

Wangda Tan commented on YARN-8220:
--

Thanks [~eyang] for your comments,

For your comments:
bq. 1. Avoid using bash style launch command 
Entry point is a nice feature for static commands (for example, the default TF 
docker image, which starts a notebook by default: 
https://github.com/tensorflow/tensorflow/tree/r1.8/tensorflow/tools/docker). 
For training programs, since users need to do a lot of hyperparameter tuning, 
they will update such parameters to make things work.

bq. 2. It might be good to show case some yarnfile features:
We intentionally want to avoid having users specify this. It is a burden for 
users to specify such mounts. Inside submit_tf.py, we use the feature you 
mentioned.

bq. 3. Downloading source code from individual github contributors might be 
risky and prone to break
This is a good suggestion; will check if it is possible to commit the example 
code to a subfolder of this example.

> Running Tensorflow on YARN with GPU and Docker - Examples
> -
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Critical
> Attachments: YARN-8220.001.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed 
> features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-06-01 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498246#comment-16498246
 ] 

Wangda Tan commented on YARN-7962:
--

Thanks [~billie.rinaldi]. To me, the differences between the ver.6 and ver.7 
patches are minimal. Since isServiceStarted is volatile, both patches are safe. 
+1 to the ver.7 patch; will commit today if no objections.

> Race Condition When Stopping DelegationTokenRenewer
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Critical
> Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, 
> YARN-7962.4.patch, YARN-7962.6.patch, YARN-7962.7.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498247#comment-16498247
 ] 

Billie Rinaldi commented on YARN-8342:
--

Oh, I did notice one additional reference to 
docker.privileged-containers.registries in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Examples.md.

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8342.001.patch, YARN-8342.002.patch, 
> YARN-8342.003.patch
>
>
> During testing of the Docker feature, I found that if a container comes from 
> a non-privileged docker registry, the specified launch command is ignored. 
> The container succeeds without any log, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged 
> docker registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored

2018-06-01 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498239#comment-16498239
 ] 

Billie Rinaldi commented on YARN-8342:
--

I looked over patch 3 and didn't see any issues. I'm going to test it out, and 
I restarted the precommit build to see if we can get a cleaner run.

> Using docker image from a non-privileged registry, the launch_command is not 
> honored
> 
>
> Key: YARN-8342
> URL: https://issues.apache.org/jira/browse/YARN-8342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Eric Yang
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8342.001.patch, YARN-8342.002.patch, 
> YARN-8342.003.patch
>
>
> During testing of the Docker feature, I found that if a container comes from 
> a non-privileged docker registry, the specified launch command is ignored. 
> The container succeeds without any log, which is very confusing to end 
> users, and this behavior is inconsistent with containers from privileged 
> docker registries.
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8349) Remove YARN registry entries when a service is killed by the RM

2018-06-01 Thread Billie Rinaldi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8349:
-
Attachment: YARN-8349.3.patch

> Remove YARN registry entries when a service is killed by the RM
> ---
>
> Key: YARN-8349
> URL: https://issues.apache.org/jira/browse/YARN-8349
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Shane Kumpf
>Assignee: Billie Rinaldi
>Priority: Major
> Attachments: YARN-8349.1.patch, YARN-8349.2.patch, YARN-8349.3.patch
>
>
> As the title states, when a service is killed by the RM (for exceeding its 
> lifetime, for example), the YARN registry entries should be cleaned up.
> Without cleanup, DNS can contain multiple hostnames for a single IP address 
> in the case where IPs are reused. This impacts reverse lookups, which breaks 
> services, such as Kerberos, that depend on those lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498199#comment-16498199
 ] 

Eric Yang commented on YARN-8220:
-

[~sunilg] Thank you for the patch. A couple of suggestions:

1. Avoid using a bash-style launch command.  Although this kind of works, 
using ENTRYPOINT and CMD in the Dockerfile greatly improves security and 
readability.  For example:

{code}
WORKDIR /test/models/tutorials/image/cifar10_estimator
ENTRYPOINT ["/usr/bin/python", "cifar10_main.py"]
# Only the last CMD in a Dockerfile takes effect, so all default arguments
# must go into a single CMD instruction.
CMD ["--data-dir=hdfs:///tmp/cifar-10-data", \
     "--job-dir=hdfs:///tmp/cifar-10-jobdir", \
     "--train-steps=1", \
     "--eval-batch-size=16", \
     "--train-batch-size=16", \
     "--sync", \
     "--num-gpus=2"]
{code}

This simplifies the yarnfile and prevents running the script in the wrong 
directory when the working directory doesn't exist.

2. It might be good to showcase some yarnfile features:

{code}
{
..
  "configuration": {
"env": {
  
"YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS":"/etc/hadoop/conf:/etc/hadoop/conf:ro",
  "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
}
  }
..
}
{code}

This helps to showcase how to mount configuration files from the host disk and 
how to use the ENTRYPOINT support.

3. Downloading source code from individual github contributors might be risky 
and prone to breakage.  If the source is small enough and donated to Apache, 
it would be better to host it locally.

> Running Tensorflow on YARN with GPU and Docker - Examples
> -
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Critical
> Attachments: YARN-8220.001.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed 
> features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8372) ApplicationAttemptNotFoundException should be handled correctly by Distributed Shell App Master

2018-06-01 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498160#comment-16498160
 ] 

Suma Shivaprasad commented on YARN-8372:


Fixed the checkstyle errors. I was unable to reproduce the test failures 
locally (not sure if this is a flaky test). Retriggering tests with the new 
patch.

> ApplicationAttemptNotFoundException should be handled correctly by 
> Distributed Shell App Master
> ---
>
> Key: YARN-8372
> URL: https://issues.apache.org/jira/browse/YARN-8372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Reporter: Charan Hebri
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8372.1.patch, YARN-8372.2.patch, YARN-8372.3.patch
>
>
> {noformat}
> try {
>   response = client.allocate(progress);
> } catch (ApplicationAttemptNotFoundException e) {
> handler.onShutdownRequest();
> LOG.info("Shutdown requested. Stopping callback.");
> return;{noformat}
> is a code snippet from AMRMClientAsyncImpl. The corresponding 
> onShutdownRequest call for the Distributed Shell App master,
> {noformat}
> @Override
> public void onShutdownRequest() {
>   done = true;
> }{noformat}
> Due to the above change, the current behavior is that whenever an application 
> attempt fails due to an NM restart (the NM where the DS AM is running), an 
> ApplicationAttemptNotFoundException is thrown and all containers for that 
> attempt, including the ones running on other NMs, are killed by the AM 
> and marked as COMPLETE. The subsequent attempt spawns new containers just 
> like a new attempt. This behavior is different from a MapReduce application, 
> where the containers are not killed.
> cc [~rohithsharma]
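
For illustration only, one possible shape of such handling in the DS AM; the 
{{attemptFailed}} flag and the recovery behavior described in the comments are 
assumptions, not the actual patch:

{code:java}
private volatile boolean done;
private volatile boolean attemptFailed;  // hypothetical flag

@Override
public void onShutdownRequest() {
  // The RM no longer recognizes this attempt (e.g. after the AM's NM
  // restarted), so record a failure instead of finishing normally. The AM
  // can then exit without unregistering, allowing a subsequent attempt to
  // reuse still-running containers when
  // keep_containers_across_application_attempts is enabled.
  attemptFailed = true;
  done = true;
}
{code}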



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8372) ApplicationAttemptNotFoundException should be handled correctly by Distributed Shell App Master

2018-06-01 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8372:
---
Attachment: YARN-8372.3.patch

> ApplicationAttemptNotFoundException should be handled correctly by 
> Distributed Shell App Master
> ---
>
> Key: YARN-8372
> URL: https://issues.apache.org/jira/browse/YARN-8372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Reporter: Charan Hebri
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8372.1.patch, YARN-8372.2.patch, YARN-8372.3.patch
>
>
> {noformat}
> try {
>   response = client.allocate(progress);
> } catch (ApplicationAttemptNotFoundException e) {
> handler.onShutdownRequest();
> LOG.info("Shutdown requested. Stopping callback.");
> return;{noformat}
> is a code snippet from AMRMClientAsyncImpl. The corresponding 
> onShutdownRequest call for the Distributed Shell App master,
> {noformat}
> @Override
> public void onShutdownRequest() {
>   done = true;
> }{noformat}
> Due to the above change, the current behavior is that whenever an application 
> attempt fails due to an NM restart (the NM where the DS AM is running), an 
> ApplicationAttemptNotFoundException is thrown and all containers for that 
> attempt, including the ones running on other NMs, are killed by the AM 
> and marked as COMPLETE. The subsequent attempt spawns new containers just 
> like a new attempt. This behavior is different from a MapReduce application, 
> where the containers are not killed.
> cc [~rohithsharma]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-06-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498149#comment-16498149
 ] 

genericqa commented on YARN-7962:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 68m 
41s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}121m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7962 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12926095/YARN-7962.7.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 856761d5c2b1 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6b21a59 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20921/testReport/ |
| Max. process+thread count | 869 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20921/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Race Condition When Stopping DelegationTokenRenewer

[jira] [Comment Edited] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader

2018-06-01 Thread Sushil Ks (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480124#comment-16480124
 ] 

Sushil Ks edited comment on YARN-8270 at 6/1/18 3:31 PM:
-

Hi Haibo, thanks for reviewing. Please find my comments below:

1) Yes, the collector metric is an aggregate for all collectors on the same 
node, i.e. on all NMs and RMs.
2) Do you mean separate metrics by *TimelineEntityType*, like 
YARN_APPLICATION, YARN_APPLICATION_ATTEMPT, etc.?
3) My bad, it's actually private. Sure, that makes sense; I will rename it to 
*update*Metrics().

4) I will add double-checked locking inside getInstance() for thread safety 
(a minimal sketch follows).
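
A minimal sketch of the double-checked locking, with an assumed class name 
(illustrative, not the actual patch); note the {{volatile}} field, without 
which the pattern is unsafe:

{code:java}
public final class TimelineCollectorMetrics {
  // volatile is required so that a fully-constructed instance is published
  // safely to other threads.
  private static volatile TimelineCollectorMetrics instance;

  private TimelineCollectorMetrics() {
  }

  public static TimelineCollectorMetrics getInstance() {
    if (instance == null) {                       // first check, lock-free
      synchronized (TimelineCollectorMetrics.class) {
        if (instance == null) {                   // second check, under lock
          instance = new TimelineCollectorMetrics();
        }
      }
    }
    return instance;
  }
}
{code}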


was (Author: sushil-k-s):
Hi Haibo, thanks for reviewing. Please find my comments below:

1) Yes, the collector metric is an aggregate for all collectors on the same 
node, i.e. on all NMs and RMs.
2) Do you mean separate metrics by *TimelineEntityType*, like 
YARN_APPLICATION, YARN_APPLICATION_ATTEMPT, etc.?
3) My bad, it's actually private. Sure, that makes sense; I will rename it to 
*update*Metrics().

4) I made use of the *double-checked locking pattern* for getInstance(); is it 
still not thread-safe?

> Adding JMX Metrics for Timeline Collector and Reader
> 
>
> Key: YARN-8270
> URL: https://issues.apache.org/jira/browse/YARN-8270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineserver
>Reporter: Sushil Ks
>Assignee: Sushil Ks
>Priority: Major
> Attachments: YARN-8270.001.patch
>
>
> This Jira is for emitting JMX metrics for the ATSv2 Timeline Collector and 
> Timeline Reader. For the Timeline Collector, it captures success, failure, 
> and latency for *putEntities* and *putEntitiesAsync* from 
> *TimelineCollectorWebService*; for the Timeline Reader, it captures success, 
> failure, and latency for all the APIs that fetch TimelineEntities from 
> *TimelineReaderWebServices*. This would help in monitoring and measuring 
> the performance of ATSv2 at scale.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps

2018-06-01 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498085#comment-16498085
 ] 

genericqa commented on YARN-8319:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
11s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 48s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8319 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12926096/YARN-8319.addendum.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a950173da656 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6b21a59 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20920/testReport/ |
| Max. process+thread count | 304 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20920/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.

[jira] [Updated] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps

2018-06-01 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8319:
-
Attachment: YARN-8319.addendum.001.patch

> More YARN pages need to honor yarn.resourcemanager.display.per-user-apps
> 
>
> Key: YARN-8319
> URL: https://issues.apache.org/jira/browse/YARN-8319
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil Govindan
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8319.001.patch, YARN-8319.002.patch, 
> YARN-8319.003.patch, YARN-8319.addendum.001.patch
>
>
> When this config is on
>  - Per queue page on UI2 should filter app list by user
>  -- TODO: Verify the same with UI1 Per-queue page
>  - ATSv2 with UI2 should filter list of all users' flows and flow activities
>  - Per Node pages
>  -- Listing of apps and containers on a per-node basis should filter apps and 
> containers by user.
> To this end, because this is no longer just for resourcemanager, we should 
> also deprecate {{yarn.resourcemanager.display.per-user-apps}} in favor of 
> {{yarn.webapp.filter-app-list-by-user}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps

2018-06-01 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan reopened YARN-8319:
--

Reopening this Jira. For ATSv2, currently only the app owner can see the flows 
list; even the YARN admin can only see their own flows. The YARN admin should 
ideally have visibility into all flows. However, the ATSv2 ACL work is still 
in progress, so to complete the user-filtering story, an interim handling is 
needed, as sketched below.
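
For illustration, the general shape such an interim check could take (class, 
method, and parameter names are assumptions, not the actual addendum patch):

{code:java}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

public final class FlowVisibility {
  // Show an entity when filtering is off, when the caller is a YARN admin,
  // or when the caller owns the entity.
  static boolean shouldDisplay(UserGroupInformation callerUGI,
      String entityOwner, boolean filterByUser, AccessControlList adminAcl) {
    if (!filterByUser) {
      return true;                                // filtering disabled
    }
    if (adminAcl.isUserAllowed(callerUGI)) {
      return true;                                // admin sees all flows
    }
    return callerUGI.getShortUserName().equals(entityOwner);
  }
}
{code}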

> More YARN pages need to honor yarn.resourcemanager.display.per-user-apps
> 
>
> Key: YARN-8319
> URL: https://issues.apache.org/jira/browse/YARN-8319
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil Govindan
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8319.001.patch, YARN-8319.002.patch, 
> YARN-8319.003.patch
>
>
> When this config is on
>  - Per queue page on UI2 should filter app list by user
>  -- TODO: Verify the same with UI1 Per-queue page
>  - ATSv2 with UI2 should filter list of all users' flows and flow activities
>  - Per Node pages
>  -- Listing of apps and containers on a per-node basis should filter apps and 
> containers by user.
> To this end, because this is no longer just for resourcemanager, we should 
> also deprecate {{yarn.resourcemanager.display.per-user-apps}} in favor of 
> {{yarn.webapp.filter-app-list-by-user}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-06-01 Thread Billie Rinaldi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-7962:
-
Attachment: YARN-7962.7.patch

> Race Condition When Stopping DelegationTokenRenewer
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Critical
> Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, 
> YARN-7962.4.patch, YARN-7962.6.patch, YARN-7962.7.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-06-01 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498013#comment-16498013
 ] 

Billie Rinaldi commented on YARN-7962:
--

Patch 7 is the same as patch 3 without the serviceStart changes.

> Race Condition When Stopping DelegationTokenRenewer
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Critical
> Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, 
> YARN-7962.4.patch, YARN-7962.6.patch, YARN-7962.7.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7962) Race Condition When Stopping DelegationTokenRenewer

2018-06-01 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498004#comment-16498004
 ] 

Billie Rinaldi commented on YARN-7962:
--

Ah, I see what Wilfred is saying now. I don't think the locking is done 
correctly in patch 6. I think Wangda is right that isServiceStarted = false and 
renewerService.shutdown() both need to be performed while the lock is held in 
serviceStop, and Wilfred is right that the locking in serviceStart should not 
be changed. The ThreadPoolExecutor.shutdown documentation says that previously 
submitted tasks are still executed, but it still seems safer to include the 
shutdown in the lock. I'll attach patch 7 with my proposed changes.
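
For reference, a minimal sketch of that ordering, reusing the field names from 
the snippet quoted below (the actual ver.7 patch may differ):

{code:java}
@Override
protected void serviceStop() {
  if (renewalTimer != null) {
    renewalTimer.cancel();
  }
  appTokens.clear();
  allTokens.clear();
  serviceStateLock.writeLock().lock();
  try {
    // Flip the flag and shut down the pool while holding the write lock,
    // so no reader can observe isServiceStarted == true and then submit a
    // task to a terminated executor; late events land in pendingEventQueue.
    isServiceStarted = false;
    renewerService.shutdown();
  } finally {
    serviceStateLock.writeLock().unlock();
  }
}
{code}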

> Race Condition When Stopping DelegationTokenRenewer
> ---
>
> Key: YARN-7962
> URL: https://issues.apache.org/jira/browse/YARN-7962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Critical
> Attachments: YARN-7962.1.patch, YARN-7962.2.patch, YARN-7962.3.patch, 
> YARN-7962.4.patch, YARN-7962.6.patch
>
>
> [https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java]
> {code:java}
>   private ThreadPoolExecutor renewerService;
>   private void processDelegationTokenRenewerEvent(
>   DelegationTokenRenewerEvent evt) {
> serviceStateLock.readLock().lock();
> try {
>   if (isServiceStarted) {
> renewerService.execute(new DelegationTokenRenewerRunnable(evt));
>   } else {
> pendingEventQueue.add(evt);
>   }
> } finally {
>   serviceStateLock.readLock().unlock();
> }
>   }
>   @Override
>   protected void serviceStop() {
> if (renewalTimer != null) {
>   renewalTimer.cancel();
> }
> appTokens.clear();
> allTokens.clear();
> this.renewerService.shutdown();
> {code}
> {code:java}
> 2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.util.concurrent.RejectedExecutionException: Task 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2
>  rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
>   at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>   at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> What I think is going on here is that the {{serviceStop}} method is not 
> setting the {{isServiceStarted}} flag to 'false'.
> Please update so that the {{serviceStop}} method grabs the 
> {{serviceStateLock}} and sets {{isServiceStarted}} to _false_, before 
> shutting down the {{renewerService}} thread pool, to avoid this condition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8372) ApplicationAttemptNotFoundException should be handled correctly by Distributed Shell App Master

2018-06-01 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497823#comment-16497823
 ] 

Rohith Sharma K S commented on YARN-8372:
-

Thanks [~suma.shivaprasad] for the patch. The approach looks good to me. Would 
you take a look at the test and checkstyle errors?

> ApplicationAttemptNotFoundException should be handled correctly by 
> Distributed Shell App Master
> ---
>
> Key: YARN-8372
> URL: https://issues.apache.org/jira/browse/YARN-8372
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell
>Reporter: Charan Hebri
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8372.1.patch, YARN-8372.2.patch
>
>
> {noformat}
> try {
>   response = client.allocate(progress);
> } catch (ApplicationAttemptNotFoundException e) {
> handler.onShutdownRequest();
> LOG.info("Shutdown requested. Stopping callback.");
> return;{noformat}
> is a code snippet from AMRMClientAsyncImpl. The corresponding 
> onShutdownRequest call for the Distributed Shell App master,
> {noformat}
> @Override
> public void onShutdownRequest() {
>   done = true;
> }{noformat}
> Due to the above change, the current behavior is that whenever an application 
> attempt fails due to an NM restart (the NM where the DS AM is running), an 
> ApplicationAttemptNotFoundException is thrown and all containers for that 
> attempt, including the ones running on other NMs, are killed by the AM 
> and marked as COMPLETE. The subsequent attempt spawns new containers just 
> like a new attempt. This behavior is different from a MapReduce application, 
> where the containers are not killed.
> cc [~rohithsharma]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497708#comment-16497708
 ] 

Sunil Govindan commented on YARN-8220:
--

Attaching the v1 patch. This patch mainly covers the scripts, examples, 
Dockerfiles, etc. that help to run Tensorflow on YARN 
(distributed/standalone).

Thank you very much [~leftnoteasy] for helping to integrate TF in YARN with 
GPU/Docker.

 

Details of this work:
 # A script to auto-generate the native service spec file for Tensorflow jobs 
and submit the service to YARN automatically. This helps to run TF jobs on 
YARN without any complexity. A detailed example is available in the doc.
 # Support for running the latest Tensorflow 1.8 and CUDA 9 on YARN.
 # Distributed Tensorflow support. The user can run this simply by passing the 
{{--distributed}} option to the script; multiple *workers* can then run on 
different nodes and leverage the resources in YARN.
 # Dockerfiles are provided for various cases (GPU/CPU, different Tensorflow 
versions, etc.).
 # Various tests were done across TF versions and GPU configurations, and the 
results are published as part of the document in the patch.

Example:
{code:java}
python submit_tf_job.py --remote_conf_path hdfs:///tf-job-conf --input_spec 
example_tf_job_spec.json --docker_image gpu.cuda_9.0.tf_1.8.0 --job_name 
distributed-tf-gpu --user tf-user --domain tensorflow.site --distributed 
--kerberos
{code}
cc [~vinodkv] [~rohithsharma]

> Running Tensorflow on YARN with GPU and Docker - Examples
> -
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Critical
> Attachments: YARN-8220.001.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed 
> features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8220:
-
Attachment: YARN-8220.001.patch

> Running Tensorflow on YARN with GPU and Docker - Examples
> -
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Critical
> Attachments: YARN-8220.001.patch
>
>
> Tensorflow could be run on YARN and could leverage YARN's distributed 
> features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8220) Running Tensorflow on YARN with GPU and Docker - Examples

2018-06-01 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8220:
-
Summary: Running Tensorflow on YARN with GPU and Docker - Examples  (was: 
Tensorflow yarn spec file to add to native service examples)

> Running Tensorflow on YARN with GPU and Docker - Examples
> -
>
> Key: YARN-8220
> URL: https://issues.apache.org/jira/browse/YARN-8220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Critical
>
> Tensorflow could be run on YARN and could leverage YARN's distributed 
> features.
> This spec file will help to run Tensorflow on YARN with GPU/Docker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org