[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-02 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: hadoop_2.9.0.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2-gpu.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0-gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, but it treats GPUs as a 
> countable resource only. 
> However, GPU placement is also very important to deep learning jobs for 
> better efficiency.
>  For example, a 2-GPU job running on GPUs {0, 1} could be faster than one 
> running on GPUs {0, 7}, if GPUs 0 and 1 are under the same PCI-E switch 
> while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables 
> fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which encodes both GPU usage 
> and locality information for a node (up to 64 GPUs per node): a '1' in a bit 
> position means the corresponding GPU is available, and a '0' means it is not.
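
A minimal sketch of how such a bitmap could be interpreted (illustrative only; 
the class and method names below are not from the attached patch, and the 
assumption that bit i maps to GPU i is ours):

{code:java}
// Sketch: reading and updating a 64-bit GPU availability bitmap,
// where bit i is 1 iff GPU i on the node is free.
public final class GpuBitmapSketch {
  private GpuBitmapSketch() {}

  /** True if GPU gpuIndex (0-63) is marked available. */
  public static boolean isAvailable(long mask, int gpuIndex) {
    return ((mask >>> gpuIndex) & 1L) != 0L;
  }

  /** Number of available GPUs on the node. */
  public static int availableCount(long mask) {
    return Long.bitCount(mask);
  }

  /** Mark GPU gpuIndex as allocated by clearing its bit. */
  public static long allocate(long mask, int gpuIndex) {
    return mask & ~(1L << gpuIndex);
  }

  /** Available GPUs under one PCI-E switch, given a mask of that switch's
      GPU positions (e.g. 0x0FL for GPUs 0-3). */
  public static int availableUnderSwitch(long mask, long switchMask) {
    return Long.bitCount(mask & switchMask);
  }
}
{code}

With such a mask, a locality-aware allocator can prefer placements whose bits 
fall under a single PCI-E switch mask, matching the {0, 1} vs. {0, 7} example 
above.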



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8489) Need to support custom termination policy for native services

2018-07-02 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8489:


 Summary: Need to support custom termination policy for native 
services
 Key: YARN-8489
 URL: https://issues.apache.org/jira/browse/YARN-8489
 Project: Hadoop YARN
  Issue Type: Task
  Components: yarn-native-services
Reporter: Wangda Tan


The existing YARN service framework ties termination behavior to the restart 
policy. For example, ALWAYS means the service will never be terminated, and 
NEVER means the service terminates once all of its components have terminated. 

Some jobs/services need a different policy. For example, if the Tensorflow 
master component terminates (whether it succeeded or failed), we need to 
terminate the whole training job regardless of the states of the other 
components.
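
A minimal sketch of what a pluggable termination policy could look like 
(purely illustrative; this interface does not exist in YARN, and all names 
below are hypothetical):

{code:java}
// Sketch: a hypothetical hook for custom service-termination decisions.
public interface TerminationPolicySketch {
  /**
   * Decide whether the whole service should terminate, given that one
   * component has reached a terminal state.
   */
  boolean shouldTerminateService(String componentName, boolean succeeded);
}

// Example policy: terminate the service as soon as the "master" component
// exits, regardless of the states of the other components.
class MasterExitPolicy implements TerminationPolicySketch {
  @Override
  public boolean shouldTerminateService(String componentName,
      boolean succeeded) {
    return "master".equals(componentName);
  }
}
{code}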



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8489) Need to support custom termination policy for native services

2018-07-02 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530825#comment-16530825
 ] 

Wangda Tan commented on YARN-8489:
--

cc: [~gsaha], [~csingh], [~billie.rinaldi], [~eyang]

> Need to support custom termination policy for native services
> ---
>
> Key: YARN-8489
> URL: https://issues.apache.org/jira/browse/YARN-8489
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Priority: Major
>
> The existing YARN service framework ties termination behavior to the restart 
> policy. For example, ALWAYS means the service will never be terminated, and 
> NEVER means the service terminates once all of its components have terminated. 
> Some jobs/services need a different policy. For example, if the Tensorflow 
> master component terminates (whether it succeeded or failed), we need to 
> terminate the whole training job regardless of the states of the other 
> components.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-07-02 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8488:
-
Target Version/s: 3.2.0

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Priority: Major
>
> The existing YARN service has the following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add a "SUCCEEDED" state in order to support long-running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-07-02 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530821#comment-16530821
 ] 

Wangda Tan commented on YARN-8488:
--

cc: [~gsaha], [~csingh], [~billie.rinaldi], [~eyang]

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Priority: Major
>
> The existing YARN service has the following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add a "SUCCEEDED" state in order to support long-running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-07-02 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8488:


 Summary: Need to add "SUCCEED" state to YARN service
 Key: YARN-8488
 URL: https://issues.apache.org/jira/browse/YARN-8488
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Wangda Tan


The existing YARN service has the following states:

{code} 
public enum ServiceState {
  ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
  UPGRADING_AUTO_FINALIZE;
}
{code} 

Ideally we should add a "SUCCEEDED" state in order to support long-running 
applications like Tensorflow.
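
A minimal sketch of the proposed change (assuming SUCCEEDED is simply added 
as another terminal value; the final name and placement would be decided in 
the patch):

{code}
public enum ServiceState {
  ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
  UPGRADING_AUTO_FINALIZE, SUCCEEDED;
}
{code}

The framework could then move a service to SUCCEEDED when all of its 
components exit successfully, rather than leaving it in STOPPED or FAILED.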



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service

2018-07-02 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8488:
-
Component/s: yarn-native-services

> Need to add "SUCCEED" state to YARN service
> ---
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Priority: Major
>
> The existing YARN service has the following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add a "SUCCEEDED" state in order to support long-running 
> applications like Tensorflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8487) Remove the unused Variable in TestBroadcastAMRMProxyFederationPolicy#testNotifyOfResponseFromUnknownSubCluster

2018-07-02 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created YARN-8487:
---

 Summary: Remove the unused Variable in 
TestBroadcastAMRMProxyFederationPolicy#testNotifyOfResponseFromUnknownSubCluster
 Key: YARN-8487
 URL: https://issues.apache.org/jira/browse/YARN-8487
 Project: Hadoop YARN
  Issue Type: Bug
  Components: amrmproxy
Reporter: Mukul Kumar Singh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7971) add COOKIE when pass through headers in WebAppProxyServlet

2018-07-02 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530774#comment-16530774
 ] 

genericqa commented on YARN-7971:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
35s{color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7971 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12930052/YARN-7971.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a9edaffee52c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7296b64 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21168/testReport/ |
| Max. process+thread count | 303 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
|
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21168/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.

[jira] [Issue Comment Deleted] (YARN-7971) add COOKIE when pass through headers in WebAppProxyServlet

2018-07-02 Thread Fan Yunbo (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Yunbo updated YARN-7971:

Comment: was deleted

(was: add the patch)

> add COOKIE when pass through headers in WebAppProxyServlet
> --
>
> Key: YARN-7971
> URL: https://issues.apache.org/jira/browse/YARN-7971
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.4
>Reporter: Fan Yunbo
>Priority: Major
> Fix For: 2.6.6
>
> Attachments: YARN-7971.001.patch
>
>
> I am using Spark on YARN and have added some authentication filters to the 
> Spark web server.
> The filters need a query string for authentication, like
> {code:java}
> https://RM:8088/proxy/application_xxx_xxx?q1=xxx=xxx...
> {code}
> The filters add cookies to the response headers when the web server responds 
> to the request.
> However, the query string has to be added to the URL on every access, 
> because the app proxy servlet in YARN doesn't pass the cookies through in 
> the headers.
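
A minimal sketch of the requested behavior (illustrative only; this is not 
the actual WebAppProxyServlet code, and it assumes an Apache 
HttpComponents-style outgoing request):

{code:java}
// Sketch: forward the client's Cookie header onto the proxied request,
// alongside the headers the proxy already passes through.
import javax.servlet.http.HttpServletRequest;
import org.apache.http.client.methods.HttpUriRequest;

final class CookiePassThroughSketch {
  static void copyCookieHeader(HttpServletRequest incoming,
      HttpUriRequest outgoing) {
    String cookie = incoming.getHeader("Cookie");
    if (cookie != null) {
      outgoing.setHeader("Cookie", cookie);
    }
  }
}
{code}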



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7971) add COOKIE when pass through headers in WebAppProxyServlet

2018-07-02 Thread Fan Yunbo (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Yunbo updated YARN-7971:

Attachment: (was: YARN-7971.001.patch)

> add COOKIE when pass through headers in WebAppProxyServlet
> --
>
> Key: YARN-7971
> URL: https://issues.apache.org/jira/browse/YARN-7971
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.4
>Reporter: Fan Yunbo
>Priority: Major
>
> I am using Spark on YARN and have added some authentication filters to the 
> Spark web server.
> The filters need a query string for authentication, like
> {code:java}
> https://RM:8088/proxy/application_xxx_xxx?q1=xxx=xxx...
> {code}
> The filters add cookies to the response headers when the web server responds 
> to the request.
> However, the query string has to be added to the URL on every access, 
> because the app proxy servlet in YARN doesn't pass the cookies through in 
> the headers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.

2018-07-02 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530681#comment-16530681
 ] 

Íñigo Goiri commented on YARN-8193:
---

bq. Maybe there's some quick fix for it?

I haven't seen a healthy run for branch-2 in months.
There was some discussion about this certificate issue, but I'm not sure who 
can fix it.

> YARN RM hangs abruptly (stops allocating resources) when running successive 
> applications.
> -
>
> Key: YARN-8193
> URL: https://issues.apache.org/jira/browse/YARN-8193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Critical
> Fix For: 2.9.0, 3.2.0, 3.1.1
>
> Attachments: YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, 
> YARN-8193.002.patch
>
>
> When running massive queries successively, at some point the RM just hangs 
> and stops allocating resources. At the point the RM hangs, YARN throws a 
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There is sufficient space in yarn.nodemanager.local-dirs (this is not a node 
> health issue; the RM didn't report any node as unhealthy). There is no fixed 
> trigger for this (query or operation).
> The problem goes away on restarting the ResourceManager. No NM restart is 
> required. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8486) yarn.webapp.filter-entity-list-by-user should honor limit filter for TS reader flows api

2018-07-02 Thread Rohith Sharma K S (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-8486:
---

Assignee: Rohith Sharma K S

> yarn.webapp.filter-entity-list-by-user should honor limit filter for TS 
> reader flows api
> 
>
> Key: YARN-8486
> URL: https://issues.apache.org/jira/browse/YARN-8486
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Charan Hebri
>Assignee: Rohith Sharma K S
>Priority: Major
>
> Post YARN-8319, the flows API restricts entities per user. If a limit is 
> applied to the flows, the returned values are inconsistent: if the back end 
> returns 10 entities and none of them belong to user1, the flows API returns 
> an empty result. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8486) yarn.webapp.filter-entity-list-by-user should honor limit filter for TS reader flows api

2018-07-02 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-8486:
---

 Summary: yarn.webapp.filter-entity-list-by-user should honor limit 
filter for TS reader flows api
 Key: YARN-8486
 URL: https://issues.apache.org/jira/browse/YARN-8486
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Charan Hebri


Post YARN-8319, the flows API restricts entities per user. If a limit is 
applied to the flows, the returned values are inconsistent: if the back end 
returns 10 entities and none of them belong to user1, the flows API returns 
an empty result. 
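
A minimal sketch of the ordering problem (illustrative names only; this is 
not the timeline reader code): if the limit is applied before the per-user 
filter, a back-end page with no rows for the caller comes back empty even 
though matching flows exist:

{code:java}
import java.util.List;
import java.util.stream.Collectors;

final class FlowLimitSketch {
  // Apply the per-user filter first, then the limit. Applying the limit
  // first can truncate away every row belonging to the caller.
  static List<String> flowsForUser(List<String> backendRows, String user,
      int limit) {
    return backendRows.stream()
        .filter(row -> row.startsWith(user + "/"))  // filter by user first
        .limit(limit)                               // then honor the limit
        .collect(Collectors.toList());
  }
}
{code}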



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.

2018-07-02 Thread Xiao Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530626#comment-16530626
 ] 

Xiao Liang edited comment on YARN-8193 at 7/3/18 12:39 AM:
---

The build failed for a reason unrelated to the patch:
 npm ERR! Error: CERT_UNTRUSTED
 npm ERR! at SecurePair. (tls.js:1370:32)
 npm ERR! at SecurePair.EventEmitter.emit (events.js:92:17)
 npm ERR! at SecurePair.maybeInitFinished (tls.js:982:10)
 npm ERR! at CleartextStream.read [as _read] (tls.js:469:13)
 npm ERR! at CleartextStream.Readable.read (_stream_readable.js:320:10)
 npm ERR! at EncryptedStream.write [as _write] (tls.js:366:25)
 npm ERR! at doWrite (_stream_writable.js:223:10)
 npm ERR! at writeOrBuffer (_stream_writable.js:213:5)
 npm ERR! at EncryptedStream.Writable.write (_stream_writable.js:180:11)
 npm ERR! at write (_stream_readable.js:583:24)
 npm ERR! If you need help, you may report this log at:
 npm ERR! <
 [http://github.com/isaacs/npm/issues]
 >
 npm ERR! or email it to:
 npm ERR! 

npm ERR! System Linux 3.13.0-139-generic
 npm ERR! command "/usr/bin/nodejs" "/usr/bin/npm" "install" "-g" "bower"
 npm ERR! cwd /root
 npm ERR! node -v v0.10.25
 npm ERR! npm -v 1.3.10
 npm ERR! 
 npm ERR! Additional logging details can be found in:
 npm ERR! /root/npm-debug.log
 npm ERR! not ok code 0

 

Maybe there's some quick fix for it?


was (Author: surmountian):
The build failed for a reason unrelated to the patch:
npm ERR! Error: CERT_UNTRUSTED
npm ERR! at SecurePair. (tls.js:1370:32)
npm ERR! at SecurePair.EventEmitter.emit (events.js:92:17)
npm ERR! at SecurePair.maybeInitFinished (tls.js:982:10)
npm ERR! at CleartextStream.read [as _read] (tls.js:469:13)
npm ERR! at CleartextStream.Readable.read (_stream_readable.js:320:10)
npm ERR! at EncryptedStream.write [as _write] (tls.js:366:25)
npm ERR! at doWrite (_stream_writable.js:223:10)
npm ERR! at writeOrBuffer (_stream_writable.js:213:5)
npm ERR! at EncryptedStream.Writable.write (_stream_writable.js:180:11)
npm ERR! at write (_stream_readable.js:583:24)
npm ERR! If you need help, you may report this log at:
npm ERR! <
[http://github.com/isaacs/npm/issues]
>
npm ERR! or email it to:
npm ERR! 

npm ERR! System Linux 3.13.0-139-generic
npm ERR! command "/usr/bin/nodejs" "/usr/bin/npm" "install" "-g" "bower"
npm ERR! cwd /root
npm ERR! node -v v0.10.25
npm ERR! npm -v 1.3.10
npm ERR! 
npm ERR! Additional logging details can be found in:
npm ERR! /root/npm-debug.log
npm ERR! not ok code 0

> YARN RM hangs abruptly (stops allocating resources) when running successive 
> applications.
> -
>
> Key: YARN-8193
> URL: https://issues.apache.org/jira/browse/YARN-8193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Critical
> Fix For: 2.9.0, 3.2.0, 3.1.1
>
> Attachments: YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, 
> YARN-8193.002.patch
>
>
> When running massive queries successively, at some point the RM just hangs 
> and stops allocating resources. At the point the RM hangs, YARN throws a 
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There is sufficient space in yarn.nodemanager.local-dirs (this is not a node 
> health issue; the RM didn't report any node as unhealthy). There is no fixed 
> trigger for this (query or operation).
> The problem goes away on restarting the ResourceManager. No NM restart is 
> required. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.

2018-07-02 Thread Xiao Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530626#comment-16530626
 ] 

Xiao Liang commented on YARN-8193:
--

The build failed for a reason unrelated to the patch:
npm ERR! Error: CERT_UNTRUSTED
npm ERR! at SecurePair. (tls.js:1370:32)
npm ERR! at SecurePair.EventEmitter.emit (events.js:92:17)
npm ERR! at SecurePair.maybeInitFinished (tls.js:982:10)
npm ERR! at CleartextStream.read [as _read] (tls.js:469:13)
npm ERR! at CleartextStream.Readable.read (_stream_readable.js:320:10)
npm ERR! at EncryptedStream.write [as _write] (tls.js:366:25)
npm ERR! at doWrite (_stream_writable.js:223:10)
npm ERR! at writeOrBuffer (_stream_writable.js:213:5)
npm ERR! at EncryptedStream.Writable.write (_stream_writable.js:180:11)
npm ERR! at write (_stream_readable.js:583:24)
npm ERR! If you need help, you may report this log at:
npm ERR! <
[http://github.com/isaacs/npm/issues]
>
npm ERR! or email it to:
npm ERR! 

npm ERR! System Linux 3.13.0-139-generic
npm ERR! command "/usr/bin/nodejs" "/usr/bin/npm" "install" "-g" "bower"
npm ERR! cwd /root
npm ERR! node -v v0.10.25
npm ERR! npm -v 1.3.10
npm ERR! 
npm ERR! Additional logging details can be found in:
npm ERR! /root/npm-debug.log
npm ERR! not ok code 0

> YARN RM hangs abruptly (stops allocating resources) when running successive 
> applications.
> -
>
> Key: YARN-8193
> URL: https://issues.apache.org/jira/browse/YARN-8193
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Critical
> Fix For: 2.9.0, 3.2.0, 3.1.1
>
> Attachments: YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, 
> YARN-8193.002.patch
>
>
> When running massive queries successively, at some point the RM just hangs 
> and stops allocating resources. At the point the RM hangs, YARN throws a 
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There is sufficient space in yarn.nodemanager.local-dirs (this is not a node 
> health issue; the RM didn't report any node as unhealthy). There is no fixed 
> trigger for this (query or operation).
> The problem goes away on restarting the ResourceManager. No NM restart is 
> required. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8415) TimelineWebServices.getEntity should throw ForbiddenException instead of 404 when ACL checks fail

2018-07-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530554#comment-16530554
 ] 

Hudson commented on YARN-8415:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14515 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14515/])
YARN-8415. TimelineWebServices.getEntity should throw ForbiddenException 
(sunilg: rev fa9ef15ecd6dc30fb260e1c342a2b51505d39b6b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/RollingLevelDBTimelineStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java


> TimelineWebServices.getEntity should throw ForbiddenException instead of 404 
> when ACL checks fail
> -
>
> Key: YARN-8415
> URL: https://issues.apache.org/jira/browse/YARN-8415
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Suma Shivaprasad
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8415.1.patch, YARN-8415.2.patch, YARN-8415.3.patch
>
>
> {noformat}
> private TimelineEntity doGetEntity(
>     String entityType,
>     String entityId,
>     EnumSet<Field> fields,
>     UserGroupInformation callerUGI) throws YarnException, IOException {
>   TimelineEntity entity = null;
>   entity = store.getEntity(entityId, entityType, fields);
>   if (entity != null) {
>     addDefaultDomainIdIfAbsent(entity);
>     // check ACLs
>     if (!timelineACLsManager.checkAccess(
>         callerUGI, ApplicationAccessType.VIEW_APP, entity)) {
>       // Should differentiate an entity-get failure from an ACL-check
>       // failure here by throwing an exception.
>       entity = null;
>     }
>   }
>   return entity;
> }
> {noformat}
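
A minimal sketch of the requested behavior (illustrative; the committed patch 
may differ, and the ForbiddenException(String) constructor is an assumption 
here): instead of silently returning null, which surfaces as a 404, the ACL 
failure is reported explicitly:

{code:java}
// Sketch: inside doGetEntity above, replace the silent null with an
// explicit failure so the web layer can map it to 403 instead of 404.
if (!timelineACLsManager.checkAccess(
    callerUGI, ApplicationAccessType.VIEW_APP, entity)) {
  throw new ForbiddenException("User " + callerUGI.getShortUserName()
      + " is not allowed to view timeline entity " + entityId);
}
{code}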



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8485) Privileged container app launch is failing intermittently

2018-07-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530523#comment-16530523
 ] 

Hudson commented on YARN-8485:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14514 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14514/])
YARN-8485. Priviledged container app launch is failing intermittently. (skumpf: 
rev 53e267fa7232add3c21174382d91b2607aa6becf)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c


> Privileged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: 

[jira] [Commented] (YARN-6672) Add NM preemption of opportunistic containers when utilization goes high

2018-07-02 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530517#comment-16530517
 ] 

Haibo Chen commented on YARN-6672:
--

Thanks [~elgoiri] for the extensive reviews, and [~szegedim] for the 
additional review. I will commit this to the YARN-1011 branch shortly.

> Add NM preemption of opportunistic containers when utilization goes high
> 
>
> Key: YARN-6672
> URL: https://issues.apache.org/jira/browse/YARN-6672
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-6672-YARN-1011.00.patch, 
> YARN-6672-YARN-1011.01.patch, YARN-6672-YARN-1011.02.patch, 
> YARN-6672-YARN-1011.03.patch, YARN-6672-YARN-1011.04.patch, 
> YARN-6672-YARN-1011.05.patch, YARN-6672-YARN-1011.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8485) Privileged container app launch is failing intermittently

2018-07-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530514#comment-16530514
 ] 

Eric Yang commented on YARN-8485:
-

Thank you [~shaneku...@gmail.com] for the review and commit.

> Privileged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-06-28 

[jira] [Updated] (YARN-8415) TimelineWebServices.getEntity should throw ForbiddenException instead of 404 when ACL checks fail

2018-07-02 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8415:
-
Summary: TimelineWebServices.getEntity should throw ForbiddenException 
instead of 404 when ACL checks fail  (was: TimelineWebServices.getEntity should 
throw a ForbiddenException(403) instead of 404 when ACL checks fail)

> TimelineWebServices.getEntity should throw ForbiddenException instead of 404 
> when ACL checks fail
> -
>
> Key: YARN-8415
> URL: https://issues.apache.org/jira/browse/YARN-8415
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8415.1.patch, YARN-8415.2.patch, YARN-8415.3.patch
>
>
> {noformat}
> private TimelineEntity doGetEntity(
>     String entityType,
>     String entityId,
>     EnumSet<Field> fields,
>     UserGroupInformation callerUGI) throws YarnException, IOException {
>   TimelineEntity entity = null;
>   entity = store.getEntity(entityId, entityType, fields);
>   if (entity != null) {
>     addDefaultDomainIdIfAbsent(entity);
>     // check ACLs
>     if (!timelineACLsManager.checkAccess(
>         callerUGI, ApplicationAccessType.VIEW_APP, entity)) {
>       // Should differentiate an entity-get failure from an ACL-check
>       // failure here by throwing an exception.
>       entity = null;
>     }
>   }
>   return entity;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8485) Privileged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530510#comment-16530510
 ] 

Shane Kumpf commented on YARN-8485:
---

Thanks to [~yeshavora] for reporting this, [~eyang] for the contribution, and 
[~gsaha] for the review! I committed this to trunk and branch-3.1.

> Privileged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> 

[jira] [Updated] (YARN-8485) Privileged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8485:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-3611

> Privileged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
> 

[jira] [Updated] (YARN-8485) Privileged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8485:
--
Affects Version/s: 3.1.1
   3.2.0
 Target Version/s: 3.2.0, 3.1.1
   Labels: Docker  (was: )

> Privileged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-06-28 

[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530487#comment-16530487
 ] 

genericqa commented on YARN-8485:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 46m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
60m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 
41s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}109m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8485 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/1293/YARN-8485.002.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 156dcaa2cc8d 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ab2f834 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21167/testReport/ |
| Max. process+thread count | 334 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21167/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> 

[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530482#comment-16530482
 ] 

Shane Kumpf commented on YARN-8485:
---

{code}by checking /usr/bin/sudo is good enough{code}
I agree this should be enough for now and is the least risky change. We can 
open a follow-on effort to make this configurable if we find an operating 
system where this is needed. +1 on the latest patch, pending pre-commit.

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 

[jira] [Comment Edited] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530482#comment-16530482
 ] 

Shane Kumpf edited comment on YARN-8485 at 7/2/18 9:58 PM:
---

{quote}by checking /usr/bin/sudo is good enough{quote}
I agree this should be enough for now and is the least risky change. We can 
open a follow-on effort to make this configurable if we find an operating 
system where this is needed. +1 on the latest patch, pending pre-commit.


was (Author: shaneku...@gmail.com):
{code}by checking /usr/bin/sudo is good enough{code}
I agree this should be enough for now and is the least risky change. We can 
open a follow-on effort to make this configurable if we find an operating 
system where this is needed. +1 on the latest patch, pending pre-commit.

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 

[jira] [Commented] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router

2018-07-02 Thread Tanuj Nayak (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530467#comment-16530467
 ] 

Tanuj Nayak commented on YARN-8435:
---

This looks good, [~NeoMatrix], and it also still seems to work functionally.

> NPE when the same client simultaneously contact for the first time Yarn Router
> --
>
> Key: YARN-8435
> URL: https://issues.apache.org/jira/browse/YARN-8435
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: router
>Affects Versions: 2.9.0, 3.0.2
>Reporter: rangjiaheng
>Priority: Critical
> Attachments: YARN-8435.v1.patch, YARN-8435.v2.patch, 
> YARN-8435.v3.patch, YARN-8435.v4.patch, YARN-8435.v5.patch, YARN-8435.v6.patch
>
>
> When two client processes (with the same user name and the same hostname) 
> connect to the YARN Router at the same time, to submit an application, kill an 
> application, and so on, a java.lang.NullPointerException may be thrown from the 
> YARN Router.
>  
>  
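A race of this shape typically comes from a check-then-create on a shared map of 
per-user state: both processes observe a missing entry, and one of them then reads 
a value that is still null or only partially initialized. A minimal sketch of the 
usual remedy, creating the per-user pipeline atomically (the class and method names 
below are illustrative, not the Router's actual API):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PipelineCache {
  // Illustrative stand-in for a per-user request pipeline in the Router.
  static class Pipeline {
    final String user;
    Pipeline(String user) { this.user = user; }
  }

  private final Map<String, Pipeline> pipelines = new ConcurrentHashMap<>();

  // computeIfAbsent creates the entry exactly once, even when two clients
  // with the same user name arrive simultaneously, so no caller can observe
  // a null or half-built pipeline.
  Pipeline getOrCreate(String user) {
    return pipelines.computeIfAbsent(user, Pipeline::new);
  }
}
{code}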



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8155) Improve ATSv2 client logging in RM and NM publisher

2018-07-02 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530457#comment-16530457
 ] 

Rohith Sharma K S commented on YARN-8155:
-

[~abmodi] it looks like the branch-2 patch is not running in Jenkins; rather, it 
is being run as a trunk patch. The branch-2 patch naming format may need to change.

> Improve ATSv2 client logging in RM and NM publisher
> ---
>
> Key: YARN-8155
> URL: https://issues.apache.org/jira/browse/YARN-8155
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rohith Sharma K S
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: YARN-8155-branch-2.v1.patch, YARN-8155.001.patch, 
> YARN-8155.002.patch, YARN-8155.003.patch, YARN-8155.004.patch, 
> YARN-8155.005.patch, YARN-8155.006.patch
>
>
> We see that NM logs are filled with large stack traces of NotFoundException 
> if the collector is removed from one of the NMs while other NMs are still 
> publishing entities.
>  
> This Jira is to improve the logging in the NM so that we log an informative 
> message.
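In practice, the improvement amounts to logging one informative WARN line for an 
expected condition and keeping the full stack trace at DEBUG. A hedged sketch of 
that pattern (the class name and the failing call are illustrative, not the actual 
publisher code):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TimelinePublisherSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(TimelinePublisherSketch.class);

  void putEntity(String entityId) {
    try {
      publish(entityId); // illustrative call to a timeline collector
    } catch (RuntimeException e) {
      // One informative WARN line instead of a full stack trace; the trace
      // remains available at DEBUG for deeper investigation.
      LOG.warn("Failed to publish entity {} to the timeline collector: {}",
          entityId, e.getMessage());
      LOG.debug("Full stack trace for entity {}", entityId, e);
    }
  }

  private void publish(String entityId) {
    // Stand-in for the real publish path, which can fail when the
    // collector has been removed from a node.
    throw new RuntimeException("collector for application not found");
  }
}
{code}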



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530442#comment-16530442
 ] 

Eric Yang commented on YARN-8485:
-

[~gsaha], checking /usr/bin/sudo is good enough.  /bin/sudo was a side 
effect of RHEL 7+ distros having /bin as a symlink to /usr/bin.  Therefore, 
/usr/bin/sudo is the right place to check.  It works in the RHEL family as well 
as the Debian family.
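For illustration, the check proposed in patch 002 reduces to testing one fixed 
absolute path rather than resolving sudo through PATH. A minimal sketch of that 
idea, written in Java for readability (the real check lives in the native 
container-executor, and the helper name below is hypothetical):
{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SudoCheckSketch {
  // Fixed absolute path, per the discussion above; this avoids resolving
  // "sudo" through PATH, where a rogue binary could shadow the real one.
  private static final Path SUDO = Paths.get("/usr/bin/sudo");

  // Hypothetical helper mirroring the intent of the container-executor
  // check: true only if /usr/bin/sudo exists and is executable.
  static boolean sudoAvailable() {
    return Files.isRegularFile(SUDO) && Files.isExecutable(SUDO);
  }

  public static void main(String[] args) {
    System.out.println("sudo available: " + sudoAvailable());
  }
}
{code}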

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  

[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530428#comment-16530428
 ] 

Gour Saha commented on YARN-8485:
-

bq. This would ensure we don't accidentally call a rogue sudo command
I actually agree with this, since a rogue user could add a rogue sudo script to 
the PATH and pass this check. +1 to the get_docker_binary style OR explicitly 
checking both /bin/sudo and /usr/bin/sudo, to keep the patch simple for now. We 
should fail if both paths fail.

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error 

[jira] [Commented] (YARN-8155) Improve ATSv2 client logging in RM and NM publisher

2018-07-02 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530421#comment-16530421
 ] 

genericqa commented on YARN-8155:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 
30s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m 49s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8155 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928076/YARN-8155.006.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 484a069e4f99 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ab2f834 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21166/testReport/ |
| Max. process+thread count | 425 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21166/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   

[jira] [Commented] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states

2018-07-02 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530411#comment-16530411
 ] 

Sunil Govindan commented on YARN-8459:
--

Thanks [~leftnoteasy]. The latest change seems fine; it covers the case mentioned 
by [~Tao Yang].

I will commit this patch by end of day if there are no other objections.

> Improve logs of Capacity Scheduler to better debug invalid states
> -
>
> Key: YARN-8459
> URL: https://issues.apache.org/jira/browse/YARN-8459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8459.001.patch, YARN-8459.002.patch, 
> YARN-8459.003.patch, YARN-8459.004.patch
>
>
> Improve logs in CS to better debug invalid states



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530397#comment-16530397
 ] 

Eric Yang commented on YARN-8485:
-

[~shaneku...@gmail.com] thank you for the review.  A rogue sudo could be a real 
threat with the relaxed security in patch 001.  It looks like most Linux distros 
have agreed on /usr/bin/sudo as the path for the sudo binary.  It is probably 
safer to use the standard path than to introduce another config late in the 3.1.1 
release.  Hence, patch 002 provides the required fix without compromising security.

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error 

[jira] [Updated] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8485:

Attachment: YARN-8485.002.patch

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-06-28 21:21:15,669 INFO  

[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530377#comment-16530377
 ] 

Shane Kumpf commented on YARN-8485:
---

Thanks for the patch [~eyang]! This did work for my test, but I'm wondering 
whether we might want to treat sudo similarly to {{get_docker_binary()}} in 
{{docker-util.c}}. This would ensure we don't accidentally call a rogue sudo 
command, since it would be set in {{container-executor.cfg}}. Also, I noticed 
CentOS 7 does have {{/usr/bin/sudo}} (which is also what Debian uses), so that 
might be a good fallback if the user hasn't set the sudo binary path in 
{{container-executor.cfg}}, but I don't have a strong preference there.
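The {{get_docker_binary()}} analogy boils down to: use the path configured in 
{{container-executor.cfg}} when present, otherwise fall back to a standard 
location. A sketch of that lookup order, in Java for readability (the 
{{sudo.binary}} key is hypothetical, container-executor.cfg defines no such key 
today, and the real logic would live in the native code):
{code:java}
import java.util.Properties;

public class SudoBinaryLookup {
  // Mirrors the get_docker_binary() lookup order: a configured path wins,
  // otherwise fall back to the standard location discussed above.
  static String getSudoBinary(Properties containerExecutorCfg) {
    // "sudo.binary" is a hypothetical key used only for this sketch.
    return containerExecutorCfg.getProperty("sudo.binary", "/usr/bin/sudo");
  }

  public static void main(String[] args) {
    Properties cfg = new Properties(); // pretend this came from container-executor.cfg
    System.out.println(getSudoBinary(cfg)); // prints /usr/bin/sudo
  }
}
{code}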

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: 

[jira] [Commented] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states

2018-07-02 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530366#comment-16530366
 ] 

genericqa commented on YARN-8459:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 
55s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8459 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929980/YARN-8459.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ed01fc5eb0c8 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1804a31 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21163/testReport/ |
| Max. process+thread count | 920 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21163/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   

[jira] [Commented] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-07-02 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530340#comment-16530340
 ] 

genericqa commented on YARN-8473:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m  0s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.application.TestApplication |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8473 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929983/YARN-8473.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bc21c589a317 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1804a31 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21164/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21164/testReport/ |
| Max. process+thread count | 334 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 

[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530330#comment-16530330
 ] 

genericqa commented on YARN-8485:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
38m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 
41s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8485 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929986/YARN-8485.001.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 7bf3fe745bd4 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1804a31 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21165/testReport/ |
| Max. process+thread count | 336 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21165/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   

[jira] [Commented] (YARN-8155) Improve ATSv2 client logging in RM and NM publisher

2018-07-02 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530315#comment-16530315
 ] 

Rohith Sharma K S commented on YARN-8155:
-

Kicked off Jenkins for a rerun. I will commit it once Jenkins reports the result.

> Improve ATSv2 client logging in RM and NM publisher
> ---
>
> Key: YARN-8155
> URL: https://issues.apache.org/jira/browse/YARN-8155
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rohith Sharma K S
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: YARN-8155-branch-2.v1.patch, YARN-8155.001.patch, 
> YARN-8155.002.patch, YARN-8155.003.patch, YARN-8155.004.patch, 
> YARN-8155.005.patch, YARN-8155.006.patch
>
>
> We see that NM logs are filled with large stack traces of NotFoundException 
> if the collector is removed from one of the NMs while other NMs are still 
> publishing entities.
>  
> This Jira is to improve the logging in the NM so that we log an informative 
> message.
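
To make the intended improvement concrete, here is a minimal sketch of the pattern, assuming an slf4j logger and an illustrative NotFoundException stand-in; none of the names below are taken from the actual patch:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class EntityPublisher {
  private static final Logger LOG =
      LoggerFactory.getLogger(EntityPublisher.class);

  // Illustrative stand-in for the ATSv2 "collector not found" failure.
  static class NotFoundException extends RuntimeException {
    NotFoundException(String msg) { super(msg); }
  }

  void publish(String appId) {
    try {
      doPublish(appId);
    } catch (NotFoundException e) {
      // One informative line instead of a full stack trace on every attempt;
      // the trace is still available at DEBUG level for deep debugging.
      LOG.warn("Collector for application {} no longer exists, dropping"
          + " entities: {}", appId, e.getMessage());
      LOG.debug("Full stack trace for dropped publish", e);
    }
  }

  private void doPublish(String appId) {
    throw new NotFoundException("collector for " + appId + " was removed");
  }
}
{code}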



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8465) Dshell docker container gets marked as lost after NM restart

2018-07-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530249#comment-16530249
 ] 

Hudson commented on YARN-8465:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14511 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14511/])
YARN-8465. Fixed docker container status for node manager restart.   
(eyang: rev 5cc2541a163591181b80bf2ec42c1e7e7f8929f5)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerExecutionException.java


> Dshell docker container gets marked as lost after NM restart
> 
>
> Key: YARN-8465
> URL: https://issues.apache.org/jira/browse/YARN-8465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Yesha Vora
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8465.001.patch
>
>
> scenario:
> 1) launch dshell application
> {code}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar
>   -shell_command "sleep 500" -num_containers 2 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xx/httpd:0.1 -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar{code}
> 2) wait for app to be in stable state ( 
> container_e01_1529968198450_0001_01_02 is running on host7 and 
> container_e01_1529968198450_0001_01_03 is running on host5)
> 3) restart NM (host7)
> Here, the dshell application fails with the error below
> {code}18/06/25 23:35:30 INFO distributedshell.Client: Got application report 
> from ASM for, appId=1, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, 
> service:  }, appDiagnostics=, appMasterHost=host9/xxx, appQueue=default, 
> appMasterRpcPort=-1, appStartTime=1529969211776, yarnAppState=RUNNING, 
> distributedFinalState=UNDEFINED, 
> appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, 
> appUser=hbase
> 18/06/25 23:35:31 INFO distributedshell.Client: Got application report from 
> ASM for, appId=1, clientToAMToken=null, appDiagnostics=Application Failure: 
> desired = 2, completed = 2, allocated = 2, failed = 1, diagnostics = 
> [2018-06-25 23:35:28.000]Container exited with a non-zero exit code 154
> [2018-06-25 23:35:28.001]Container exited with a non-zero exit code 154
> , appMasterHost=host9/xxx, appQueue=default, appMasterRpcPort=-1, 
> appStartTime=1529969211776, yarnAppState=FINISHED, 
> distributedFinalState=FAILED, 
> appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, 
> appUser=hbase
> 18/06/25 23:35:31 INFO distributedshell.Client: Application did finished 
> unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring 
> loop
> 18/06/25 23:35:31 ERROR distributedshell.Client: Application failed to 
> complete successfully{code}
> Here, the docker container is marked as LOST after completion
> {code}
> 2018-06-25 23:35:27,970 WARN  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:signalContainer(1034)) - Signal docker 
> container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Liveliness check failed for PID: 423695. Container may have already 
> completed.
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.executeLivelinessCheck(DockerLinuxContainerRuntime.java:1208)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:1026)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:159)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:755)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:905)
> at 
> 

[jira] [Assigned] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-8485:
---

Assignee: Eric Yang

Converted the sudo check to be based on the PATH variable.
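
For context, the check itself lives in the native container-executor, but the idea is simply to resolve sudo through the PATH entries instead of a hard-coded, distribution-specific location. A minimal Java sketch of that lookup; the findInPath helper is purely illustrative:

{code:java}
import java.io.File;

public class SudoLocator {
  /**
   * Walks the entries of the PATH environment variable and returns the
   * first executable file with the given name, or null if none is found.
   */
  public static File findInPath(String binary) {
    String path = System.getenv("PATH");
    if (path == null) {
      return null;
    }
    for (String dir : path.split(File.pathSeparator)) {
      File candidate = new File(dir, binary);
      if (candidate.isFile() && candidate.canExecute()) {
        return candidate;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // sudo may live in /usr/bin on some distributions and elsewhere on
    // others, so resolving through PATH avoids a hard-coded location.
    File sudo = findInPath("sudo");
    System.out.println(sudo != null ? sudo.getAbsolutePath() : "sudo not found");
  }
}
{code}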

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-06-28 

[jira] [Updated] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8485:

Attachment: YARN-8485.001.patch

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Priority: Major
> Attachments: YARN-8485.001.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
> 

[jira] [Updated] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state

2018-07-02 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-8473:
-
Attachment: YARN-8473.001.patch

> Containers being launched as app tears down can leave containers in NEW state
> -
>
> Key: YARN-8473
> URL: https://issues.apache.org/jira/browse/YARN-8473
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.4
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-8473.001.patch
>
>
> I saw a case where containers were stuck on a nodemanager in the NEW state 
> because they tried to launch just as an application was tearing down.  The 
> container sent an INIT_CONTAINER event to the ApplicationImpl which then 
> executed an invalid transition since that event is not handled/expected when 
> the application is in the process of tearing down.
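
The usual remedy for this class of bug is to register an explicit transition for the unexpected event in the teardown states, so the event is consumed (and the late container is told to clean up) rather than tripping an invalid-transition error. A loose, self-contained sketch of the idea; the state and event names below are illustrative, not the actual ApplicationImpl code:

{code:java}
public class AppTeardownSketch {
  enum State { RUNNING, FINISHING, FINISHED }
  enum Event { INIT_CONTAINER, APP_COMPLETED }

  private State state = State.FINISHING;

  void handle(Event event) {
    switch (state) {
      case FINISHING:
        if (event == Event.INIT_CONTAINER) {
          // Consume the late INIT_CONTAINER instead of treating it as an
          // invalid transition: tell the container to clean itself up so
          // it does not sit in the NEW state forever.
          System.out.println("App tearing down; cleaning up late container");
        } else if (event == Event.APP_COMPLETED) {
          state = State.FINISHED;
        }
        break;
      default:
        // Other states are elided from this sketch.
        break;
    }
  }
}
{code}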



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states

2018-07-02 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530233#comment-16530233
 ] 

Wangda Tan commented on YARN-8459:
--

Attached patch (004), which moves the re-reservation message to the debug log 
and addresses comments from [~bibinchundatt] / [~Tao Yang].

Please review and let me know your thoughts.

> Improve logs of Capacity Scheduler to better debug invalid states
> -
>
> Key: YARN-8459
> URL: https://issues.apache.org/jira/browse/YARN-8459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8459.001.patch, YARN-8459.002.patch, 
> YARN-8459.003.patch, YARN-8459.004.patch
>
>
> Improve logs in CS to better debug invalid states



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states

2018-07-02 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8459:
-
Attachment: YARN-8459.004.patch

> Improve logs of Capacity Scheduler to better debug invalid states
> -
>
> Key: YARN-8459
> URL: https://issues.apache.org/jira/browse/YARN-8459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8459.001.patch, YARN-8459.002.patch, 
> YARN-8459.003.patch, YARN-8459.004.patch
>
>
> Improve logs in CS to better debug invalid states



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8465) Dshell docker container gets marked as lost after NM restart

2018-07-02 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530231#comment-16530231
 ] 

Shane Kumpf commented on YARN-8465:
---

Thanks [~eyang]!

> Dshell docker container gets marked as lost after NM restart
> 
>
> Key: YARN-8465
> URL: https://issues.apache.org/jira/browse/YARN-8465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Yesha Vora
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8465.001.patch
>
>
> scenario:
> 1) launch dshell application
> {code}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar
>   -shell_command "sleep 500" -num_containers 2 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xx/httpd:0.1 -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar{code}
> 2) wait for app to be in stable state ( 
> container_e01_1529968198450_0001_01_02 is running on host7 and 
> container_e01_1529968198450_0001_01_03 is running on host5)
> 3) restart NM (host7)
> Here, the dshell application fails with the error below
> {code}18/06/25 23:35:30 INFO distributedshell.Client: Got application report 
> from ASM for, appId=1, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, 
> service:  }, appDiagnostics=, appMasterHost=host9/xxx, appQueue=default, 
> appMasterRpcPort=-1, appStartTime=1529969211776, yarnAppState=RUNNING, 
> distributedFinalState=UNDEFINED, 
> appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, 
> appUser=hbase
> 18/06/25 23:35:31 INFO distributedshell.Client: Got application report from 
> ASM for, appId=1, clientToAMToken=null, appDiagnostics=Application Failure: 
> desired = 2, completed = 2, allocated = 2, failed = 1, diagnostics = 
> [2018-06-25 23:35:28.000]Container exited with a non-zero exit code 154
> [2018-06-25 23:35:28.001]Container exited with a non-zero exit code 154
> , appMasterHost=host9/xxx, appQueue=default, appMasterRpcPort=-1, 
> appStartTime=1529969211776, yarnAppState=FINISHED, 
> distributedFinalState=FAILED, 
> appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, 
> appUser=hbase
> 18/06/25 23:35:31 INFO distributedshell.Client: Application did finished 
> unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring 
> loop
> 18/06/25 23:35:31 ERROR distributedshell.Client: Application failed to 
> complete successfully{code}
> Here, the docker container is marked as LOST after completion
> {code}
> 2018-06-25 23:35:27,970 WARN  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:signalContainer(1034)) - Signal docker 
> container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Liveliness check failed for PID: 423695. Container may have already 
> completed.
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.executeLivelinessCheck(DockerLinuxContainerRuntime.java:1208)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:1026)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:159)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:755)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:905)
> at 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:284)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:721)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-06-25 23:35:27,975 WARN  nodemanager.LinuxContainerExecutor 
> 

[jira] [Commented] (YARN-8465) Dshell docker container gets marked as lost after NM restart

2018-07-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530221#comment-16530221
 ] 

Eric Yang commented on YARN-8465:
-

+1 works on my test cluster.  I will commit this shortly.

> Dshell docker container gets marked as lost after NM restart
> 
>
> Key: YARN-8465
> URL: https://issues.apache.org/jira/browse/YARN-8465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Yesha Vora
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8465.001.patch
>
>
> scenario:
> 1) launch dshell application
> {code}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar
>   -shell_command "sleep 500" -num_containers 2 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xx/httpd:0.1 -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar{code}
> 2) wait for app to be in stable state ( 
> container_e01_1529968198450_0001_01_02 is running on host7 and 
> container_e01_1529968198450_0001_01_03 is running on host5)
> 3) restart NM (host7)
> Here, the dshell application fails with the error below
> {code}18/06/25 23:35:30 INFO distributedshell.Client: Got application report 
> from ASM for, appId=1, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, 
> service:  }, appDiagnostics=, appMasterHost=host9/xxx, appQueue=default, 
> appMasterRpcPort=-1, appStartTime=1529969211776, yarnAppState=RUNNING, 
> distributedFinalState=UNDEFINED, 
> appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, 
> appUser=hbase
> 18/06/25 23:35:31 INFO distributedshell.Client: Got application report from 
> ASM for, appId=1, clientToAMToken=null, appDiagnostics=Application Failure: 
> desired = 2, completed = 2, allocated = 2, failed = 1, diagnostics = 
> [2018-06-25 23:35:28.000]Container exited with a non-zero exit code 154
> [2018-06-25 23:35:28.001]Container exited with a non-zero exit code 154
> , appMasterHost=host9/xxx, appQueue=default, appMasterRpcPort=-1, 
> appStartTime=1529969211776, yarnAppState=FINISHED, 
> distributedFinalState=FAILED, 
> appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, 
> appUser=hbase
> 18/06/25 23:35:31 INFO distributedshell.Client: Application did finished 
> unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring 
> loop
> 18/06/25 23:35:31 ERROR distributedshell.Client: Application failed to 
> complete successfully{code}
> Here, the docker container is marked as LOST after completion
> {code}
> 2018-06-25 23:35:27,970 WARN  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:signalContainer(1034)) - Signal docker 
> container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Liveliness check failed for PID: 423695. Container may have already 
> completed.
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.executeLivelinessCheck(DockerLinuxContainerRuntime.java:1208)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:1026)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:159)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:755)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:905)
> at 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:284)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:721)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-06-25 23:35:27,975 WARN  nodemanager.LinuxContainerExecutor 
> 

[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530218#comment-16530218
 ] 

Eric Yang commented on YARN-8485:
-

The sudo binary path on Debian is different from the one on Redhat/CentOS, 
which caused the sudo check to fail.

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Priority: Major
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-06-28 21:21:15,669 INFO  

[jira] [Updated] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8485:

Environment: Debian

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Priority: Major
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, container launch fails with 'Privileged containers are disabled' even 
> though Docker privileged containers are enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> docker error code=11, error message='Privileged containers are disabled'
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) -
> 2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell output: main : command 
> provided 4
> 2018-06-28 21:21:15,669 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - main : run as user is hrt_qa
> 2018-06-28 

[jira] [Commented] (YARN-8180) Remove yarn.federation.blacklist-subclusters from yarn federation doc

2018-07-02 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530169#comment-16530169
 ] 

Abhishek Modi commented on YARN-8180:
-

[~giovanni.fumarola] Updated the Jira title and description.

> Remove yarn.federation.blacklist-subclusters from yarn federation doc
> -
>
> Key: YARN-8180
> URL: https://issues.apache.org/jira/browse/YARN-8180
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Reporter: Shen Yinjie
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8180.001.patch
>
>
> Property "yarn.federation.blacklist-subclusters" was added in yarn-federation 
> doc by mistake and is not applicable. This Jira is to remove this property 
> from the doc.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8180) Remove yarn.federation.blacklist-subclusters from yarn federation doc

2018-07-02 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8180:

Description: 
Property "yarn.federation.blacklist-subclusters" was added in yarn-federation 
doc by mistake and is not applicable. This Jira is to remove this property from 
the doc.

 

 

  was:
Property "yarn.federation.blacklist-subclusters" is defined in yarn-fedeartion 
doc,but it has not been defined and implemented in Java code.

In FederationClientInterceptor#submitApplication()
{code:java}
List<SubClusterId> blacklist = new ArrayList<>();

for (int i = 0; i < numSubmitRetries; ++i) {
  SubClusterId subClusterId = policyFacade.getHomeSubcluster(
      request.getApplicationSubmissionContext(), blacklist);
{code}
 

 


> Remove yarn.federation.blacklist-subclusters from yarn federation doc
> -
>
> Key: YARN-8180
> URL: https://issues.apache.org/jira/browse/YARN-8180
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Reporter: Shen Yinjie
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8180.001.patch
>
>
> Property "yarn.federation.blacklist-subclusters" was added in yarn-federation 
> doc by mistake and is not applicable. This Jira is to remove this property 
> from the doc.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8180) Remove yarn.federation.blacklist-subclusters from yarn federation doc

2018-07-02 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8180:

Summary: Remove yarn.federation.blacklist-subclusters from yarn federation 
doc  (was: YARN Federation has not implemented blacklist sub-cluster for AM 
routing)

> Remove yarn.federation.blacklist-subclusters from yarn federation doc
> -
>
> Key: YARN-8180
> URL: https://issues.apache.org/jira/browse/YARN-8180
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Reporter: Shen Yinjie
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8180.001.patch
>
>
> Property "yarn.federation.blacklist-subclusters" is defined in 
> yarn-fedeartion doc,but it has not been defined and implemented in Java code.
> In FederationClientInterceptor#submitApplication()
> {code:java}
> List<SubClusterId> blacklist = new ArrayList<>();
> for (int i = 0; i < numSubmitRetries; ++i) {
>   SubClusterId subClusterId = policyFacade.getHomeSubcluster(
>       request.getApplicationSubmissionContext(), blacklist);
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router

2018-07-02 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530033#comment-16530033
 ] 

genericqa commented on YARN-8435:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 54s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
17s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8435 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12929955/YARN-8435.v6.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d3b20c4cd676 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5d748bd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21162/testReport/ |
| Max. process+thread count | 724 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21162/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> NPE when the same client simultaneously contact for the first time Yarn Router
> 

[jira] [Commented] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router

2018-07-02 Thread rangjiaheng (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529943#comment-16529943
 ] 

rangjiaheng commented on YARN-8435:
---

Thanks [~tanujnay] for testing; that is a very good suggestion, and I have 
fixed it in the new patch.

Thanks [~giovanni.fumarola] for the review; the new patch YARN-8435.v6.patch 
solves that problem, along with a similar bug that is now fixed:

A client request could find that _userPipelineMap_ contains a user key and yet 
get nothing out, because another client request initialized and then expired 
the first user's key in between.

One further suggestion is to set a large yarn.router.pipeline.cache-max-size 
value, such as 250, on a large hadoop cluster, to reduce re-initialization 
caused by LRU expiration.
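
The failure described above is a classic check-then-act race: containsKey() and the following get() are each safe on their own, but an LRU eviction can slip in between them. A minimal sketch of the atomic alternative using ConcurrentHashMap#computeIfAbsent; the class and names here are illustrative, not the actual Router code (which uses an LRU cache):

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PipelineCache {
  // Illustrative stand-in for the per-user interceptor pipeline.
  static class Pipeline {
    final String user;
    Pipeline(String user) { this.user = user; }
  }

  private final ConcurrentMap<String, Pipeline> userPipelineMap =
      new ConcurrentHashMap<>();

  // Broken pattern: containsKey() followed by get() is not atomic, so a
  // concurrent eviction between the two calls can make get() return null.
  public Pipeline getPipelineRacy(String user) {
    if (!userPipelineMap.containsKey(user)) {
      userPipelineMap.put(user, new Pipeline(user));
    }
    return userPipelineMap.get(user);   // may be null under concurrency
  }

  // Safe pattern: computeIfAbsent performs the lookup and the creation as
  // a single atomic operation, so every caller gets a non-null pipeline.
  public Pipeline getPipeline(String user) {
    return userPipelineMap.computeIfAbsent(user, Pipeline::new);
  }
}
{code}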

 

> NPE when the same client simultaneously contact for the first time Yarn Router
> --
>
> Key: YARN-8435
> URL: https://issues.apache.org/jira/browse/YARN-8435
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: router
>Affects Versions: 2.9.0, 3.0.2
>Reporter: rangjiaheng
>Priority: Critical
> Attachments: YARN-8435.v1.patch, YARN-8435.v2.patch, 
> YARN-8435.v3.patch, YARN-8435.v4.patch, YARN-8435.v5.patch, YARN-8435.v6.patch
>
>
> When two client processes (with the same user name and the same hostname) 
> connect to the Yarn Router at the same time, to submit an application, kill 
> an application, ... and so on, a java.lang.NullPointerException may be 
> thrown from the Yarn Router.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router

2018-07-02 Thread rangjiaheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rangjiaheng updated YARN-8435:
--
Attachment: YARN-8435.v6.patch

> NPE when the same client simultaneously contact for the first time Yarn Router
> --
>
> Key: YARN-8435
> URL: https://issues.apache.org/jira/browse/YARN-8435
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: router
>Affects Versions: 2.9.0, 3.0.2
>Reporter: rangjiaheng
>Priority: Critical
> Attachments: YARN-8435.v1.patch, YARN-8435.v2.patch, 
> YARN-8435.v3.patch, YARN-8435.v4.patch, YARN-8435.v5.patch, YARN-8435.v6.patch
>
>
> When two client processes (with the same user name and the same hostname) 
> connect to the Yarn Router at the same time, to submit an application, kill 
> an application, ... and so on, a java.lang.NullPointerException may be 
> thrown from the Yarn Router.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states

2018-07-02 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529637#comment-16529637
 ] 

Tao Yang commented on YARN-8459:


Hi, [~leftnoteasy]. 
Can we improve the skip-queue log in ParentQueue#assignContainers? In 
async-scheduling mode there are too many such debug logs (thousands every 
second), which may generate several new log files every minute when there is 
no pending request on the root queue. I think this log could be printed at 
periodic intervals instead; a sketch of one way to do that follows the code 
block.
{code:java}
if (!super.hasPendingResourceRequest(candidates.getPartition(),
clusterResource, schedulingMode)) {
  if (LOG.isDebugEnabled()) {
LOG.debug("Skip this queue=" + getQueuePath()
+ ", because it doesn't need more resource, schedulingMode="
+ schedulingMode.name() + " node-partition=" + candidates
.getPartition());
  }
 ...
   }
{code}
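
One way to get "periodic" behaviour without losing the message entirely is to remember when the log was last emitted and skip it until an interval elapses. A rough sketch, with the interval and all names invented for illustration:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class PeriodicLogHelper {
  private final long intervalMs;
  private final AtomicLong lastLogTime = new AtomicLong(0);

  public PeriodicLogHelper(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  /**
   * Returns true at most once per interval, so a hot code path can guard
   * a debug message with it instead of logging thousands of times a second.
   */
  public boolean shouldLog() {
    long now = System.currentTimeMillis();
    long last = lastLogTime.get();
    return now - last >= intervalMs && lastLogTime.compareAndSet(last, now);
  }
}

// Usage in the hot path, assuming a one-minute interval:
//   if (LOG.isDebugEnabled() && skipQueueLogLimiter.shouldLog()) {
//     LOG.debug("Skip this queue=" + getQueuePath() + " ...");
//   }
{code}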

> Improve logs of Capacity Scheduler to better debug invalid states
> -
>
> Key: YARN-8459
> URL: https://issues.apache.org/jira/browse/YARN-8459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8459.001.patch, YARN-8459.002.patch, 
> YARN-8459.003.patch
>
>
> Improve logs in CS to better debug invalid states



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-02 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: hadoop-2.9.0-gpu-port.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2-gpu.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0-gpu-port.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important to deep learning jobs for 
> better efficiency.
>  For example, a 2-GPU job running on gpus {0, 1} could be faster than one 
> running on gpus {0, 7}, if GPUs 0 and 1 are under the same PCI-E switch 
> while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the yarn Resource, which indicates both GPU 
> usage and locality information on a node (up to 64 GPUs per node). '1' means 
> available and '0' otherwise in the corresponding bit position.
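
To make the bitmap encoding concrete, here is a small illustrative sketch of the bit arithmetic such a layout allows: counting available GPUs and testing whether two GPU indices sit under the same PCI-E switch, assuming purely for illustration that each switch groups four consecutive GPUs (the real grouping comes from the hardware topology, not from the patch):

{code:java}
public class GpuBitmap {
  // Illustrative assumption: every 4 consecutive GPU indices share one
  // PCI-E switch.
  private static final int GPUS_PER_SWITCH = 4;

  /** Number of available GPUs, i.e. bits set to '1' in the 64-bit map. */
  public static int availableGpus(long bitmap) {
    return Long.bitCount(bitmap);
  }

  /** True if the GPU at the given index (0..63) is marked available. */
  public static boolean isAvailable(long bitmap, int gpu) {
    return (bitmap >>> gpu & 1L) == 1L;
  }

  /** True if both GPU indices fall under the same (assumed) switch. */
  public static boolean sameSwitch(int gpuA, int gpuB) {
    return gpuA / GPUS_PER_SWITCH == gpuB / GPUS_PER_SWITCH;
  }

  public static void main(String[] args) {
    long bitmap = 0b10000011L;       // GPUs 0, 1 and 7 available
    System.out.println(availableGpus(bitmap));   // 3
    System.out.println(sameSwitch(0, 1));        // true  -> good placement
    System.out.println(sameSwitch(0, 7));        // false -> crosses switches
  }
}
{code}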



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org