[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Qingcha updated YARN-7481:
-------------------------------
    Attachment: hadoop_2.9.0.patch

> Gpu locality support for Better AI scheduling
> ---------------------------------------------
>
>                 Key: YARN-7481
>                 URL: https://issues.apache.org/jira/browse/YARN-7481
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: api, RM, yarn
>    Affects Versions: 2.7.2
>            Reporter: Chen Qingcha
>            Priority: Major
>             Fix For: 2.7.2
>
>         Attachments: GPU locality support for Job scheduling.pdf, hadoop-2.7.2-gpu.patch, hadoop-2.7.2.gpu-port.patch, hadoop-2.9.0-gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. YARN-3926 also supports GPU scheduling, but it treats GPUs only as a countable resource.
> However, GPU placement is also very important for deep learning jobs. For example, a 2-GPU job running on GPUs {0, 1} can be faster than one running on GPUs {0, 7}, if GPUs 0 and 1 are under the same PCI-e switch while 0 and 7 are not.
> We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables fine-grained GPU placement. A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage and locality information in a node (up to 64 GPUs per node): a '1' in a bit position means the corresponding GPU is available, '0' otherwise.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
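The bitmap scheme described above can be sketched as follows. This is an illustrative sketch only, not code from the patch: the class name, the 4-GPUs-per-PCI-e-switch topology, and the pair-allocation strategy are all assumptions.

```java
// Sketch of a 64-bit GPU availability bitmap: bit i set means GPU i is free.
// The 4-GPUs-per-switch topology below is a hypothetical example.
public class GpuBitmap {
    // Assume GPUs are grouped 4 per PCI-e switch (an assumption, not the patch).
    static boolean sameSwitch(int a, int b) {
        return a / 4 == b / 4;
    }

    // True if both GPUs are marked available ('1') in the bitmap.
    static boolean bothFree(long bitmap, int a, int b) {
        long mask = (1L << a) | (1L << b);
        return (bitmap & mask) == mask;
    }

    // Prefer a free pair under one switch; fall back to any free pair.
    // Returns a bitmap of the chosen pair, or 0 if fewer than two GPUs are free.
    static long allocatePair(long bitmap) {
        for (int a = 0; a < 64; a++)
            for (int b = a + 1; b < 64; b++)
                if (bothFree(bitmap, a, b) && sameSwitch(a, b))
                    return (1L << a) | (1L << b);  // locality-aware placement
        for (int a = 0; a < 64; a++)
            for (int b = a + 1; b < 64; b++)
                if (bothFree(bitmap, a, b))
                    return (1L << a) | (1L << b);  // no co-located pair left
        return 0L;
    }
}
```

With GPUs 0, 1 and 7 free, the sketch picks {0, 1} (same switch); with only 0 and 7 free it falls back to {0, 7}, matching the example in the description.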
[jira] [Created] (YARN-8489) Need to support custom termination policy for native services
Wangda Tan created YARN-8489:
--------------------------------

             Summary: Need to support custom termination policy for native services
                 Key: YARN-8489
                 URL: https://issues.apache.org/jira/browse/YARN-8489
             Project: Hadoop YARN
          Issue Type: Task
          Components: yarn-native-services
            Reporter: Wangda Tan

The existing YARN service derives its termination behavior from the component restart policy. For example, ALWAYS means the service will never be terminated, and NEVER means the service is terminated once all components have terminated.

Some jobs/services need a different policy. For example, if the TensorFlow master component terminates (regardless of whether it succeeded or failed), we need to terminate the whole training job, regardless of the states of the other components.
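The requested pluggable policy could look roughly like the sketch below. This is a hypothetical illustration, not existing YARN native-services API: the interface, the ComponentState enum, and the component name "master" are all assumptions.

```java
import java.util.Map;

// Hypothetical sketch of a pluggable service-termination policy.
public class TerminationPolicyDemo {
    public enum ComponentState { RUNNING, SUCCEEDED, FAILED }

    public interface TerminationPolicy {
        // Decide whether the whole service should stop, given component states.
        boolean shouldTerminateService(Map<String, ComponentState> components);
    }

    // TensorFlow-style policy from the description: stop the whole training
    // job as soon as the master component leaves RUNNING, whether it
    // succeeded or failed.
    public static final TerminationPolicy TERMINATE_ON_MASTER_EXIT =
        components -> components.getOrDefault("master", ComponentState.RUNNING)
                != ComponentState.RUNNING;
}
```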
[jira] [Commented] (YARN-8489) Need to support custom termination policy for native services
[ https://issues.apache.org/jira/browse/YARN-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530825#comment-16530825 ]

Wangda Tan commented on YARN-8489:
----------------------------------

cc: [~gsaha], [~csingh], [~billie.rinaldi], [~eyang]
[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8488:
-----------------------------
    Target Version/s: 3.2.0

> Need to add "SUCCEED" state to YARN service
> -------------------------------------------
>
>                 Key: YARN-8488
>                 URL: https://issues.apache.org/jira/browse/YARN-8488
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: yarn-native-services
>            Reporter: Wangda Tan
>            Priority: Major
>
> The existing YARN service has the following states:
> {code}
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code}
> Ideally we should add a "SUCCEEDED" state in order to support long-running applications like TensorFlow.
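The proposed change amounts to one more enum value. The sketch below shows the states quoted in the description plus the suggested SUCCEEDED value; the isTerminal() helper is an illustrative assumption, not part of the existing API.

```java
// The existing ServiceState values (from the issue description) plus the
// proposed SUCCEEDED terminal state.
public enum ServiceState {
    ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
    UPGRADING_AUTO_FINALIZE,
    SUCCEEDED;  // proposed: every component finished successfully

    // Hypothetical helper: a terminal state means the service will make no
    // further transitions.
    public boolean isTerminal() {
        return this == STOPPED || this == FAILED || this == SUCCEEDED;
    }
}
```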
[jira] [Commented] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530821#comment-16530821 ]

Wangda Tan commented on YARN-8488:
----------------------------------

cc: [~gsaha], [~csingh], [~billie.rinaldi], [~eyang]
[jira] [Created] (YARN-8488) Need to add "SUCCEED" state to YARN service
Wangda Tan created YARN-8488:
--------------------------------

             Summary: Need to add "SUCCEED" state to YARN service
                 Key: YARN-8488
                 URL: https://issues.apache.org/jira/browse/YARN-8488
             Project: Hadoop YARN
          Issue Type: Task
            Reporter: Wangda Tan

The existing YARN service has the following states:
{code}
public enum ServiceState {
  ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
  UPGRADING_AUTO_FINALIZE;
}
{code}
Ideally we should add a "SUCCEEDED" state in order to support long-running applications like TensorFlow.
[jira] [Updated] (YARN-8488) Need to add "SUCCEED" state to YARN service
[ https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8488:
-----------------------------
    Component/s: yarn-native-services
[jira] [Created] (YARN-8487) Remove the unused Variable in TestBroadcastAMRMProxyFederationPolicy#testNotifyOfResponseFromUnknownSubCluster
Mukul Kumar Singh created YARN-8487:
---------------------------------------

             Summary: Remove the unused Variable in TestBroadcastAMRMProxyFederationPolicy#testNotifyOfResponseFromUnknownSubCluster
                 Key: YARN-8487
                 URL: https://issues.apache.org/jira/browse/YARN-8487
             Project: Hadoop YARN
          Issue Type: Bug
          Components: amrmproxy
            Reporter: Mukul Kumar Singh
[jira] [Commented] (YARN-7971) add COOKIE when pass through headers in WebAppProxyServlet
[ https://issues.apache.org/jira/browse/YARN-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530774#comment-16530774 ]

genericqa commented on YARN-7971:
---------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 27m 53s | trunk passed |
| +1 | compile | 0m 22s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 24s | trunk passed |
| +1 | shadedclient | 11m 29s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 35s | trunk passed |
| +1 | javadoc | 0m 19s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 24s | the patch passed |
| +1 | compile | 0m 20s | the patch passed |
| +1 | javac | 0m 20s | the patch passed |
| +1 | checkstyle | 0m 10s | the patch passed |
| +1 | mvnsite | 0m 23s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 42s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 39s | the patch passed |
| +1 | javadoc | 0m 16s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 35s | hadoop-yarn-server-web-proxy in the patch passed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 58m 2s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7971 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12930052/YARN-7971.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a9edaffee52c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7296b64 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21168/testReport/ |
| Max. process+thread count | 303 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21168/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT |
[jira] [Issue Comment Deleted] (YARN-7971) add COOKIE when pass through headers in WebAppProxyServlet
[ https://issues.apache.org/jira/browse/YARN-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fan Yunbo updated YARN-7971:
----------------------------
    Comment: was deleted

(was: add the patch)

> add COOKIE when pass through headers in WebAppProxyServlet
> ----------------------------------------------------------
>
>                 Key: YARN-7971
>                 URL: https://issues.apache.org/jira/browse/YARN-7971
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.6.4
>            Reporter: Fan Yunbo
>            Priority: Major
>             Fix For: 2.6.6
>
>         Attachments: YARN-7971.001.patch
>
> I am using Spark on YARN, and I add some authentication filters in the Spark web server. The filters need a query string for authentication, like:
> {code:java}
> https://RM:8088/proxy/application_xxx_xxx?q1=xxx&q2=xxx...
> {code}
> The filters add cookies to the headers when the web server responds to the request. However, the query string has to be added to the URL every time I access the web server, because the app proxy servlet in YARN doesn't pass the cookies through in the headers.
[jira] [Updated] (YARN-7971) add COOKIE when pass through headers in WebAppProxyServlet
[ https://issues.apache.org/jira/browse/YARN-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fan Yunbo updated YARN-7971:
----------------------------
    Attachment: (was: YARN-7971.001.patch)
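The fix being requested amounts to forwarding the Cookie headers when the proxy builds the backend request. The sketch below models request headers as plain maps rather than the servlet API; the class and method names are illustrative, not the actual WebAppProxyServlet code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Sketch: copy every Cookie header from the client request onto the proxied
// request, so the backend's authentication filter sees its cookie and the
// query string is only needed on the first visit.
public class CookieForwarder {
    public static void copyCookies(Map<String, List<String>> clientHeaders,
                                   Map<String, List<String>> proxiedHeaders) {
        List<String> cookies =
            clientHeaders.getOrDefault("Cookie", Collections.emptyList());
        if (!cookies.isEmpty()) {
            proxiedHeaders.computeIfAbsent("Cookie", k -> new ArrayList<>())
                          .addAll(cookies);
        }
    }
}
```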
[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530681#comment-16530681 ]

Íñigo Goiri commented on YARN-8193:
-----------------------------------

bq. Maybe there's some quick fix for it?
I haven't seen a healthy run for branch-2 in months. There was some discussion on this certificate issue, but I'm not sure who can fix it.

> YARN RM hangs abruptly (stops allocating resources) when running successive applications.
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>             Fix For: 2.9.0, 3.2.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, YARN-8193.002.patch
>
> When running massive queries successively, at some point the RM just hangs and stops allocating resources. At the point the RM hangs, YARN throws a NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There is sufficient space given to yarn.nodemanager.local-dirs (not a node health issue; the RM didn't report any node as unhealthy), and there is no fixed trigger for this (query or operation).
> The problem goes away on restarting the ResourceManager. No NM restart is required.
[jira] [Assigned] (YARN-8486) yarn.webapp.filter-entity-list-by-user should honor limit filter for TS reader flows api
[ https://issues.apache.org/jira/browse/YARN-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S reassigned YARN-8486:
---------------------------------------
    Assignee: Rohith Sharma K S

> yarn.webapp.filter-entity-list-by-user should honor limit filter for TS reader flows api
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-8486
>                 URL: https://issues.apache.org/jira/browse/YARN-8486
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Charan Hebri
>            Assignee: Rohith Sharma K S
>            Priority: Major
>
> Since YARN-8319, the flows API restricts entities per user. If a limit is applied to the flows query, the returned values are inconsistent: if the back end returns 10 rows and none of them contain data for user1, the flows API returns an empty result.
[jira] [Created] (YARN-8486) yarn.webapp.filter-entity-list-by-user should honor limit filter for TS reader flows api
Rohith Sharma K S created YARN-8486:
---------------------------------------

             Summary: yarn.webapp.filter-entity-list-by-user should honor limit filter for TS reader flows api
                 Key: YARN-8486
                 URL: https://issues.apache.org/jira/browse/YARN-8486
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Charan Hebri

Since YARN-8319, the flows API restricts entities per user. If a limit is applied to the flows query, the returned values are inconsistent: if the back end returns 10 rows and none of them contain data for user1, the flows API returns an empty result.
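The inconsistency described above is an ordering problem: applying the row limit before the per-user filter can return an empty page even though matching flows exist. The toy model below illustrates it, with each flow row reduced to its owner's user name; all names here are illustrative, not timeline-reader API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model of the limit-vs-user-filter ordering bug.
public class LimitFilterDemo {
    // Buggy order: take `limit` rows from the back end first, then filter by
    // user -- if the first `limit` rows all belong to other users, the
    // result is empty even though matching flows exist.
    public static List<String> limitThenFilter(List<String> rowOwners,
                                               String user, int limit) {
        return rowOwners.stream().limit(limit)
                        .filter(user::equals)
                        .collect(Collectors.toList());
    }

    // Fixed order: filter by user first, then honor the limit.
    public static List<String> filterThenLimit(List<String> rowOwners,
                                               String user, int limit) {
        return rowOwners.stream().filter(user::equals)
                        .limit(limit)
                        .collect(Collectors.toList());
    }
}
```

With ten rows owned by user2 followed by one owned by user1 and a limit of 10, the buggy order returns nothing for user1 while the fixed order returns the matching row.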
[jira] [Comment Edited] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530626#comment-16530626 ]

Xiao Liang edited comment on YARN-8193 at 7/3/18 12:39 AM:
-----------------------------------------------------------

The build failed for a reason not related to the patch:
{noformat}
npm ERR! Error: CERT_UNTRUSTED
npm ERR!     at SecurePair. (tls.js:1370:32)
npm ERR!     at SecurePair.EventEmitter.emit (events.js:92:17)
npm ERR!     at SecurePair.maybeInitFinished (tls.js:982:10)
npm ERR!     at CleartextStream.read [as _read] (tls.js:469:13)
npm ERR!     at CleartextStream.Readable.read (_stream_readable.js:320:10)
npm ERR!     at EncryptedStream.write [as _write] (tls.js:366:25)
npm ERR!     at doWrite (_stream_writable.js:223:10)
npm ERR!     at writeOrBuffer (_stream_writable.js:213:5)
npm ERR!     at EncryptedStream.Writable.write (_stream_writable.js:180:11)
npm ERR!     at write (_stream_readable.js:583:24)
npm ERR! If you need help, you may report this log at:
npm ERR!     <http://github.com/isaacs/npm/issues>
npm ERR! or email it to:
npm ERR!
npm ERR! System Linux 3.13.0-139-generic
npm ERR! command "/usr/bin/nodejs" "/usr/bin/npm" "install" "-g" "bower"
npm ERR! cwd /root
npm ERR! node -v v0.10.25
npm ERR! npm -v 1.3.10
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR!     /root/npm-debug.log
npm ERR! not ok code 0
{noformat}
Maybe there's some quick fix for it?
[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530626#comment-16530626 ]

Xiao Liang commented on YARN-8193:
----------------------------------

The build failed for a reason not related to the patch:
{noformat}
npm ERR! Error: CERT_UNTRUSTED
npm ERR!     at SecurePair. (tls.js:1370:32)
npm ERR!     at SecurePair.EventEmitter.emit (events.js:92:17)
npm ERR!     at SecurePair.maybeInitFinished (tls.js:982:10)
npm ERR!     at CleartextStream.read [as _read] (tls.js:469:13)
npm ERR!     at CleartextStream.Readable.read (_stream_readable.js:320:10)
npm ERR!     at EncryptedStream.write [as _write] (tls.js:366:25)
npm ERR!     at doWrite (_stream_writable.js:223:10)
npm ERR!     at writeOrBuffer (_stream_writable.js:213:5)
npm ERR!     at EncryptedStream.Writable.write (_stream_writable.js:180:11)
npm ERR!     at write (_stream_readable.js:583:24)
npm ERR! If you need help, you may report this log at:
npm ERR!     <http://github.com/isaacs/npm/issues>
npm ERR! or email it to:
npm ERR!
npm ERR! System Linux 3.13.0-139-generic
npm ERR! command "/usr/bin/nodejs" "/usr/bin/npm" "install" "-g" "bower"
npm ERR! cwd /root
npm ERR! node -v v0.10.25
npm ERR! npm -v 1.3.10
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR!     /root/npm-debug.log
npm ERR! not ok code 0
{noformat}
[jira] [Commented] (YARN-8415) TimelineWebServices.getEntity should throw ForbiddenException instead of 404 when ACL checks fail
[ https://issues.apache.org/jira/browse/YARN-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530554#comment-16530554 ]

Hudson commented on YARN-8415:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14515 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14515/])
YARN-8415. TimelineWebServices.getEntity should throw ForbiddenException (sunilg: rev fa9ef15ecd6dc30fb260e1c342a2b51505d39b6b)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/TimelineWebServices.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/RollingLevelDBTimelineStore.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java

> TimelineWebServices.getEntity should throw ForbiddenException instead of 404 when ACL checks fail
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8415
>                 URL: https://issues.apache.org/jira/browse/YARN-8415
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Sumana Sathish
>            Assignee: Suma Shivaprasad
>            Priority: Major
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8415.1.patch, YARN-8415.2.patch, YARN-8415.3.patch
>
> {noformat}
> private TimelineEntity doGetEntity(
>     String entityType,
>     String entityId,
>     EnumSet<Field> fields,
>     UserGroupInformation callerUGI) throws YarnException, IOException {
>   TimelineEntity entity = null;
>   entity = store.getEntity(entityId, entityType, fields);
>   if (entity != null) {
>     addDefaultDomainIdIfAbsent(entity);
>     // check ACLs
>     if (!timelineACLsManager.checkAccess(
>         callerUGI, ApplicationAccessType.VIEW_APP, entity)) {
>       entity = null; // Should differentiate an entity-get failure from an
>                      // ACL check failure here by throwing an exception.
>     }
>   }
>   return entity;
> }
> {noformat}
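The behavior change in the snippet above can be sketched as follows: on an ACL failure, throw instead of silently returning null (which the web layer reported as a 404). ForbiddenException here is a local stand-in for the class Hadoop uses, and the store lookup is simulated; this is an illustration of the pattern, not the committed patch.

```java
// Sketch: fail loudly on ACL denial so callers can return 403 instead of 404.
public class AclCheckDemo {
    public static class ForbiddenException extends RuntimeException {
        public ForbiddenException(String msg) { super(msg); }
    }

    public static String doGetEntity(String entityId, boolean aclAllowed) {
        String entity = "entity:" + entityId;  // simulated store.getEntity(...)
        if (!aclAllowed) {
            // Before: entity = null -> caller could not tell "not found"
            // from "not allowed". After: throw, so the caller returns 403.
            throw new ForbiddenException(
                "The user is not allowed to view the timeline entity " + entityId);
        }
        return entity;
    }
}
```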
[jira] [Commented] (YARN-8485) Privileged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530523#comment-16530523 ]

Hudson commented on YARN-8485:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14514 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14514/])
YARN-8485. Privileged container app launch is failing intermittently. (skumpf: rev 53e267fa7232add3c21174382d91b2607aa6becf)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c

> Privileged container app launch is failing intermittently
> ----------------------------------------------------------
>
>                 Key: YARN-8485
>                 URL: https://issues.apache.org/jira/browse/YARN-8485
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>    Affects Versions: 3.2.0, 3.1.1
>         Environment: Debian
>            Reporter: Yesha Vora
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
> A privileged application fails intermittently:
> {code:java}
> yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 30" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here, the container launch fails with 'Privileged containers are disabled' even though Docker privileged containers are enabled in the cluster:
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - All checks pass. Launching privileged container for : container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from container-launch with container ID: container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Container id: container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container failed
> 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell error output: check privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled for user:
[jira] [Commented] (YARN-6672) Add NM preemption of opportunistic containers when utilization goes high
[ https://issues.apache.org/jira/browse/YARN-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530517#comment-16530517 ] Haibo Chen commented on YARN-6672: -- Thanks [~elgoiri] for the extensive reviews, and [~szegedim] for additional review. I will commit it to YARN-1011 branch shortly. > Add NM preemption of opportunistic containers when utilization goes high > > > Key: YARN-6672 > URL: https://issues.apache.org/jira/browse/YARN-6672 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > Attachments: YARN-6672-YARN-1011.00.patch, > YARN-6672-YARN-1011.01.patch, YARN-6672-YARN-1011.02.patch, > YARN-6672-YARN-1011.03.patch, YARN-6672-YARN-1011.04.patch, > YARN-6672-YARN-1011.05.patch, YARN-6672-YARN-1011.06.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530514#comment-16530514 ] Eric Yang commented on YARN-8485: - Thank you [~shaneku...@gmail.com] for the review and commit. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.2.0, 3.1.1 > Environment: Debian >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8485.001.patch, YARN-8485.002.patch > > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass. 
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled > for user: hrt_qa > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, > docker error code=11, error message='Privileged containers are disabled' > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-06-28
[jira] [Updated] (YARN-8415) TimelineWebServices.getEntity should throw ForbiddenException instead of 404 when ACL checks fail
[ https://issues.apache.org/jira/browse/YARN-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8415: - Summary: TimelineWebServices.getEntity should throw ForbiddenException instead of 404 when ACL checks fail (was: TimelineWebServices.getEntity should throw a ForbiddenException(403) instead of 404 when ACL checks fail) > TimelineWebServices.getEntity should throw ForbiddenException instead of 404 > when ACL checks fail > - > > Key: YARN-8415 > URL: https://issues.apache.org/jira/browse/YARN-8415 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sumana Sathish >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8415.1.patch, YARN-8415.2.patch, YARN-8415.3.patch > > > {noformat} > private TimelineEntity doGetEntity( > String entityType, > String entityId, > EnumSet fields, > UserGroupInformation callerUGI) throws YarnException, IOException { > TimelineEntity entity = null; > entity = > store.getEntity(entityId, entityType, fields); > if (entity != null) { > addDefaultDomainIdIfAbsent(entity); > // check ACLs > if (!timelineACLsManager.checkAccess( > callerUGI, ApplicationAccessType.VIEW_APP, entity)) { > entity = null; //Should differentiate from an entity get failure > vs ACL check failure here by throwing an Exception.* > } > } > return entity; > } > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
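The distinction the YARN-8415 summary asks for — an ACL failure on an entity that exists should surface as 403 Forbidden, while a genuinely missing entity stays 404 — can be sketched in a minimal, self-contained form. This is an illustration only, not the actual Hadoop code: the class, the owner-only ACL, and the `ForbiddenException` stand-in for the JAX-RS-style exception are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal illustration of the behavior YARN-8415 proposes: throw on an
 * ACL failure instead of returning null, so callers can tell "no such
 * entity" (404) apart from "not allowed to see it" (403).
 */
public class TimelineAclDemo {
  /** Stand-in for a JAX-RS-style 403 exception. */
  static class ForbiddenException extends RuntimeException {
    ForbiddenException(String msg) { super(msg); }
  }

  // entityId -> owner; a toy stand-in for the timeline store.
  private final Map<String, String> store = new HashMap<>();

  TimelineAclDemo() {
    store.put("app_1", "alice");
  }

  /** Returns the owner, or null (maps to 404) if absent; throws (403) on ACL failure. */
  String getEntity(String entityId, String callerUser) {
    String owner = store.get(entityId);
    if (owner == null) {
      return null; // entity truly absent -> 404 Not Found
    }
    // Simplistic owner-only ACL. The quoted snippet instead set
    // entity = null here, collapsing both cases into a 404.
    if (!owner.equals(callerUser)) {
      throw new ForbiddenException("User " + callerUser
          + " is not allowed to view entity " + entityId);
    }
    return owner;
  }

  public static void main(String[] args) {
    TimelineAclDemo demo = new TimelineAclDemo();
    System.out.println(demo.getEntity("app_1", "alice"));   // owner can read
    System.out.println(demo.getEntity("missing", "alice")); // null -> 404
    try {
      demo.getEntity("app_1", "bob");
    } catch (ForbiddenException e) {
      System.out.println("403: " + e.getMessage());
    }
  }
}
```

With the throw in place, the web layer can map the exception to a 403 response rather than reporting every denied read as a missing entity.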
[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530510#comment-16530510 ] Shane Kumpf commented on YARN-8485: --- Thanks to [~yeshavora] for reporting this, [~eyang] for the contribution, and [~gsaha] for the review! I committed this to trunk and branch-3.1. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.2.0, 3.1.1 > Environment: Debian >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8485.001.patch, YARN-8485.002.patch > > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass. 
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled > for user: hrt_qa > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, > docker error code=11, error message='Privileged containers are disabled' > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor >
[jira] [Updated] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-8485: -- Issue Type: Sub-task (was: Bug) Parent: YARN-3611 > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.2.0, 3.1.1 > Environment: Debian >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8485.001.patch, YARN-8485.002.patch > > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass. 
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled > for user: hrt_qa > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, > docker error code=11, error message='Privileged containers are disabled' > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor >
[jira] [Updated] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf updated YARN-8485: -- Affects Version/s: 3.1.1 3.2.0 Target Version/s: 3.2.0, 3.1.1 Labels: Docker (was: ) > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.2.0, 3.1.1 > Environment: Debian >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-8485.001.patch, YARN-8485.002.patch > > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass. 
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled > for user: hrt_qa > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, > docker error code=11, error message='Privileged containers are disabled' > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-06-28
[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530487#comment-16530487 ] genericqa commented on YARN-8485: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 46m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 60m 53s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 42s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 41s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}109m 36s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8485 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/1293/YARN-8485.002.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 156dcaa2cc8d 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ab2f834 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21167/testReport/ | | Max. process+thread count | 334 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21167/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services > Environment: Debian >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8485.001.patch, YARN-8485.002.patch > > > Privileged application fails intermittently > {code:java} > yarn jar >
[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530482#comment-16530482 ] Shane Kumpf commented on YARN-8485: --- {code}by checking /usr/bin/sudo is good enough{code} I agree this should be enough for now and is the least risky change. We can open a follow on effort to make this configurable if we find an operating system where this is needed. +1 on the latest patch, pending pre-commit. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services > Environment: Debian >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8485.001.patch, YARN-8485.002.patch > > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass. 
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled > for user: hrt_qa > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, > docker error code=11, error message='Privileged containers are disabled' >
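The fix discussed above checks for the presence of `/usr/bin/sudo`. The real check lives in the native container-executor (written in C); purely as an illustration of the shape of that check — gate on an executable sudo binary at the expected path before relying on a sudo-based privilege lookup — a Java sketch follows. The class and method names are hypothetical; only the `/usr/bin/sudo` path comes from the discussion.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Illustrative sketch (not the actual container-executor code) of the
 * check described in the YARN-8485 review: treat the presence of an
 * executable sudo binary as the precondition for the sudo-based
 * privilege lookup, instead of failing intermittently inside it.
 */
public class SudoPresenceCheck {
  /** True iff the given path is a regular, executable file. */
  static boolean sudoAvailable(String path) {
    Path p = Paths.get(path);
    return Files.isRegularFile(p) && Files.isExecutable(p);
  }

  public static void main(String[] args) {
    // Environment-dependent: prints true only where /usr/bin/sudo exists.
    System.out.println(sudoAvailable("/usr/bin/sudo"));
  }
}
```

Keeping the check to a fixed path is the least risky change; as the comment notes, making the path configurable can be a follow-on if some operating system needs it.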
[jira] [Comment Edited] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530482#comment-16530482 ] Shane Kumpf edited comment on YARN-8485 at 7/2/18 9:58 PM: --- {quote}by checking /usr/bin/sudo is good enough{quote} I agree this should be enough for now and is the least risky change. We can open a follow on effort to make this configurable if we find an operating system where this is needed. +1 on the latest patch, pending pre-commit. was (Author: shaneku...@gmail.com): {code}by checking /usr/bin/sudo is good enough{code} I agree this should be enough for now and is the least risky change. We can open a follow on effort to make this configurable if we find an operating system where this is needed. +1 on the latest patch, pending pre-commit. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services > Environment: Debian >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8485.001.patch, YARN-8485.002.patch > > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass. 
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28
[jira] [Commented] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router
[ https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530467#comment-16530467 ] Tanuj Nayak commented on YARN-8435: --- This looks good, [~NeoMatrix], and it still appears to work functionally. > NPE when the same client simultaneously contact for the first time Yarn Router > -- > > Key: YARN-8435 > URL: https://issues.apache.org/jira/browse/YARN-8435 > Project: Hadoop YARN > Issue Type: Bug > Components: router >Affects Versions: 2.9.0, 3.0.2 >Reporter: rangjiaheng >Priority: Critical > Attachments: YARN-8435.v1.patch, YARN-8435.v2.patch, > YARN-8435.v3.patch, YARN-8435.v4.patch, YARN-8435.v5.patch, YARN-8435.v6.patch > > > When two client processes (with the same user name and the same hostname) simultaneously contact the YARN Router for the first time, to submit an application, kill an application, and so on, a java.lang.NullPointerException may be thrown from the Router. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
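The race described above (two threads for the same user both taking the first-contact initialization path) is typically fixed by making the create-if-missing step atomic. A minimal sketch of that pattern, using illustrative class and field names that are not taken from the actual YARN-8435 patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the race class: two simultaneous first contacts
// both see a missing cache entry, and one can observe the other's
// half-initialized state (NPE). computeIfAbsent runs the factory at most
// once per key, so both callers receive the same fully built instance.
public class PipelineCache {
    private final Map<String, Pipeline> pipelines = new ConcurrentHashMap<>();

    public Pipeline getOrCreate(String user) {
        // Atomic create-if-missing: concurrent first contacts for the same
        // user can never observe a null or partially built pipeline.
        return pipelines.computeIfAbsent(user, Pipeline::new);
    }

    static class Pipeline {
        final String user;
        Pipeline(String user) { this.user = user; }
    }

    public static void main(String[] args) {
        PipelineCache cache = new PipelineCache();
        Pipeline a = cache.getOrCreate("alice");
        Pipeline b = cache.getOrCreate("alice");
        System.out.println(a == b);  // prints "true": same instance per user
    }
}
```

The same effect can be had with synchronized double-checked lookup, but `computeIfAbsent` keeps the critical section implicit and per-key.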
[jira] [Commented] (YARN-8155) Improve ATSv2 client logging in RM and NM publisher
[ https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530457#comment-16530457 ] Rohith Sharma K S commented on YARN-8155: - [~abmodi] it looks like the branch-2 patch is not being run in Jenkins; rather, it is being run as a trunk patch. We may need to change the branch-2 patch format. > Improve ATSv2 client logging in RM and NM publisher > --- > > Key: YARN-8155 > URL: https://issues.apache.org/jira/browse/YARN-8155 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Rohith Sharma K S >Assignee: Abhishek Modi >Priority: Major > Fix For: 3.2.0, 3.1.1, 3.0.4 > > Attachments: YARN-8155-branch-2.v1.patch, YARN-8155.001.patch, > YARN-8155.002.patch, YARN-8155.003.patch, YARN-8155.004.patch, > YARN-8155.005.patch, YARN-8155.006.patch > > > We see that NM logs are filled with large stack traces of NotFoundException if the collector is removed from one NM while other NMs are still publishing entities. > > This Jira is to improve the logging in the NM so that we log an informative message. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
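The improvement the description asks for amounts to condensing a per-entity stack trace into one informative WARN line, with the full trace demoted to a debug level. A sketch of that pattern, where the class, method, and exception names are illustrative assumptions rather than the actual publisher code:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch only: log a removed-collector failure as a single
// informative line instead of flooding the NM log with stack traces.
public class EntityPublisher {
    private static final Logger LOG =
        Logger.getLogger(EntityPublisher.class.getName());

    /** Hypothetical stand-in for the REST 404 error from the collector. */
    static class NotFoundException extends RuntimeException {
        NotFoundException(String msg) { super(msg); }
    }

    void publish(String entityId) {
        try {
            doPut(entityId);
        } catch (NotFoundException e) {
            // One informative WARN line per failure...
            LOG.warning("Collector not found while publishing entity "
                + entityId + ": " + e.getMessage());
            // ...full stack trace only at a debug level for deep digging.
            LOG.log(Level.FINE, "Full stack trace for " + entityId, e);
        }
    }

    void doPut(String entityId) {
        // Simulates publishing to a collector that has been removed.
        throw new NotFoundException("collector removed for " + entityId);
    }

    public static void main(String[] args) {
        new EntityPublisher().publish("entity_42");
    }
}
```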
[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530442#comment-16530442 ] Eric Yang commented on YARN-8485: - [~gsaha], checking /usr/bin/sudo is good enough. /bin/sudo works only as a side effect of RHEL7+ distros making /bin a symlink to /usr/bin. Therefore, /usr/bin/sudo is the right place to check, and it works for the RHEL family as well as the Debian family. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485
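Eric's point is that only the canonical absolute path should be trusted, never a `sudo` resolved through PATH. The actual check lives in container-executor's native code; the Java sketch below is an illustration only, and the class and method names are assumptions:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: verify sudo at its canonical absolute path instead of resolving
// "sudo" through PATH, so a rogue binary earlier in PATH is never used.
public class SudoCheck {
    static final Path SUDO_PATH = Paths.get("/usr/bin/sudo");

    static boolean sudoAvailable() {
        // Checks existence and the execute bit in one call, with no
        // PATH lookup involved.
        return Files.isExecutable(SUDO_PATH);
    }

    public static void main(String[] args) {
        System.out.println("sudo at " + SUDO_PATH + ": " + sudoAvailable());
    }
}
```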
[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530428#comment-16530428 ] Gour Saha commented on YARN-8485: - bq. This would ensure we don't accidentally call a rogue sudo command I actually agree with this, since a rogue user could add a rogue sudo script to the PATH and pass this check. +1 to the get_docker_binary style, OR explicitly checking both /bin/sudo and /usr/bin/sudo to keep the patch simple for now. We should fail if both paths fail. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485
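Gour's simpler alternative, probing both well-known absolute locations and failing the launch when neither exists, could look like the sketch below. As before, the real logic belongs in container-executor's native code; names here are illustrative, not from the patch:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Sketch of the two-path fallback: prefer /usr/bin/sudo, accept /bin/sudo,
// and report failure when neither is an executable file.
public class SudoLocator {
    private static final Path[] CANDIDATES = {
        Paths.get("/usr/bin/sudo"), Paths.get("/bin/sudo")
    };

    static Optional<Path> findSudo() {
        for (Path p : CANDIDATES) {
            if (Files.isExecutable(p)) {
                return Optional.of(p);  // first usable candidate wins
            }
        }
        return Optional.empty();  // both paths failed: abort the launch
    }

    public static void main(String[] args) {
        System.out.println(
            findSudo().map(Path::toString).orElse("no usable sudo found"));
    }
}
```

On RHEL 7+ both candidates resolve to the same file (since /bin is a symlink to /usr/bin), so the fallback only matters on layouts where they differ.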
[jira] [Commented] (YARN-8155) Improve ATSv2 client logging in RM and NM publisher
[ https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530421#comment-16530421 ] genericqa commented on YARN-8155: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 30s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 70m 49s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8155 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12928076/YARN-8155.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 484a069e4f99 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / ab2f834 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21166/testReport/ | | Max. process+thread count | 425 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21166/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT
[jira] [Commented] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states
[ https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530411#comment-16530411 ] Sunil Govindan commented on YARN-8459: -- Thanks [~leftnoteasy]. The latest change seems fine; it covers the case mentioned by [~Tao Yang]. I will commit this patch by end of day if there are no other objections. > Improve logs of Capacity Scheduler to better debug invalid states > - > > Key: YARN-8459 > URL: https://issues.apache.org/jira/browse/YARN-8459 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: YARN-8459.001.patch, YARN-8459.002.patch, > YARN-8459.003.patch, YARN-8459.004.patch > > > Improve logs in CS to better debug invalid states -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530397#comment-16530397 ] Eric Yang commented on YARN-8485: - [~shaneku...@gmail.com] thank you for the review. A rogue sudo could be a real threat with the relaxed security of patch 001. It looks like most Linux distros have agreed on /usr/bin/sudo as the path for the sudo binary. It is probably safer to use the standard path than to introduce another config this late in the 3.1.1 release. Hence, patch 002 provides the required fix without compromising security. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485
[jira] [Updated] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8485: Attachment: YARN-8485.002.patch > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485
[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530377#comment-16530377 ] Shane Kumpf commented on YARN-8485: --- Thanks for the patch [~eyang]! This did work for my test, but I'm wondering whether we might want to treat sudo similarly to {{get_docker_binary()}} in {{docker-util.c}}. This would ensure we don't accidentally call a rogue sudo command, since it would be set in {{container-executor.cfg}}. Also, I noticed CentOS 7 does have {{/usr/bin/sudo}} (which is also what Debian uses), so that might be a good fallback if the user hasn't set the sudo binary path in {{container-executor.cfg}}, but I don't have a strong preference there. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485
[jira] [Commented] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states
[ https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530366#comment-16530366 ] genericqa commented on YARN-8459: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 55s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}124m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8459 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929980/YARN-8459.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ed01fc5eb0c8 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1804a31 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21163/testReport/ | | Max. process+thread count | 920 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21163/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT
[jira] [Commented] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state
[ https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530340#comment-16530340 ] genericqa commented on YARN-8473: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 8s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 0s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 79m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.application.TestApplication | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8473 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929983/YARN-8473.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux bc21c589a317 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1804a31 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/21164/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21164/testReport/ | | Max. process+thread count | 334 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Commented] (YARN-8485) Privileged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530330#comment-16530330 ] genericqa commented on YARN-8485: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 38m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 41s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 54s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8485 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929986/YARN-8485.001.patch | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 7bf3fe745bd4 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1804a31 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21165/testReport/ | | Max. process+thread count | 336 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21165/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services > Environment: Debian >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8485.001.patch > > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar >
[jira] [Commented] (YARN-8155) Improve ATSv2 client logging in RM and NM publisher
[ https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530315#comment-16530315 ] Rohith Sharma K S commented on YARN-8155: - Kicked off Jenkins for a rerun. I will commit it once Jenkins reports the result. > Improve ATSv2 client logging in RM and NM publisher > --- > > Key: YARN-8155 > URL: https://issues.apache.org/jira/browse/YARN-8155 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Rohith Sharma K S >Assignee: Abhishek Modi >Priority: Major > Fix For: 3.2.0, 3.1.1, 3.0.4 > > Attachments: YARN-8155-branch-2.v1.patch, YARN-8155.001.patch, > YARN-8155.002.patch, YARN-8155.003.patch, YARN-8155.004.patch, > YARN-8155.005.patch, YARN-8155.006.patch > > > We see that NM logs are filled with larger stack trace of NotFoundException > if collector is removed from one of the NM and other NMs are still publishing > the entities. > > This Jira is to improve the logging in NM so that we log with informative > message. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8465) Dshell docker container gets marked as lost after NM restart
[ https://issues.apache.org/jira/browse/YARN-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530249#comment-16530249 ] Hudson commented on YARN-8465: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14511 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14511/]) YARN-8465. Fixed docker container status for node manager restart. (eyang: rev 5cc2541a163591181b80bf2ec42c1e7e7f8929f5) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerExecutionException.java > Dshell docker container gets marked as lost after NM restart > > > Key: YARN-8465 > URL: https://issues.apache.org/jira/browse/YARN-8465 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.2.0, 3.1.1 >Reporter: Yesha Vora >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8465.001.patch > > > scenario: > 1) launch dshell application > {code} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar > -shell_command "sleep 500" -num_containers 2 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xx/httpd:0.1 -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar{code} > 2) wait for app to be in stable state ( > container_e01_1529968198450_0001_01_02 is running on host7 and > 
container_e01_1529968198450_0001_01_03 is running on host5) > 3) restart NM (host7) > Here, dshell application fails with below error > {code}18/06/25 23:35:30 INFO distributedshell.Client: Got application report > from ASM for, appId=1, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, > service: }, appDiagnostics=, appMasterHost=host9/xxx, appQueue=default, > appMasterRpcPort=-1, appStartTime=1529969211776, yarnAppState=RUNNING, > distributedFinalState=UNDEFINED, > appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, > appUser=hbase > 18/06/25 23:35:31 INFO distributedshell.Client: Got application report from > ASM for, appId=1, clientToAMToken=null, appDiagnostics=Application Failure: > desired = 2, completed = 2, allocated = 2, failed = 1, diagnostics = > [2018-06-25 23:35:28.000]Container exited with a non-zero exit code 154 > [2018-06-25 23:35:28.001]Container exited with a non-zero exit code 154 > , appMasterHost=host9/xxx, appQueue=default, appMasterRpcPort=-1, > appStartTime=1529969211776, yarnAppState=FINISHED, > distributedFinalState=FAILED, > appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, > appUser=hbase > 18/06/25 23:35:31 INFO distributedshell.Client: Application did finished > unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring > loop > 18/06/25 23:35:31 ERROR distributedshell.Client: Application failed to > complete successfully{code} > Here, the docker container marked as LOST after completion > {code} > 2018-06-25 23:35:27,970 WARN runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:signalContainer(1034)) - Signal docker > container failed. Exception: > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Liveliness check failed for PID: 423695. Container may have already > completed. 
> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.executeLivelinessCheck(DockerLinuxContainerRuntime.java:1208) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:1026) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:159) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:755) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:905) > at >
[jira] [Assigned] (YARN-8485) Privileged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-8485: --- Assignee: Eric Yang Convert the sudo check to be based on the path variable. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services > Environment: Debian >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8485.001.patch > > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass.
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled > for user: hrt_qa > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, > docker error code=11, error message='Privileged containers are disabled' > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-06-28
[jira] [Updated] (YARN-8485) Privileged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8485: Attachment: YARN-8485.001.patch > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services > Environment: Debian >Reporter: Yesha Vora >Priority: Major > Attachments: YARN-8485.001.patch > > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass. 
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled > for user: hrt_qa > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, > docker error code=11, error message='Privileged containers are disabled' > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor >
[jira] [Updated] (YARN-8473) Containers being launched as app tears down can leave containers in NEW state
[ https://issues.apache.org/jira/browse/YARN-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8473: - Attachment: YARN-8473.001.patch > Containers being launched as app tears down can leave containers in NEW state > - > > Key: YARN-8473 > URL: https://issues.apache.org/jira/browse/YARN-8473 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.4 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Major > Attachments: YARN-8473.001.patch > > > I saw a case where containers were stuck on a nodemanager in the NEW state > because they tried to launch just as an application was tearing down. The > container sent an INIT_CONTAINER event to the ApplicationImpl which then > executed an invalid transition since that event is not handled/expected when > the application is in the process of tearing down. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
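The invalid transition described in YARN-8473 can be avoided by explicitly tolerating INIT_CONTAINER once the application has begun tearing down. A minimal sketch of such a guard, using simplified state names that only approximate the NodeManager's actual ApplicationImpl state machine (this is an illustration, not the attached patch):

```java
import java.util.EnumSet;

public class AppInitGuard {
    // Simplified stand-ins for the NM ApplicationImpl states; the real
    // class has more states and uses a generated state-machine table.
    enum AppState { NEW, RUNNING, FINISHING, RESOURCES_CLEANING_UP, FINISHED }

    private static final EnumSet<AppState> TEARDOWN_STATES =
        EnumSet.of(AppState.FINISHING, AppState.RESOURCES_CLEANING_UP, AppState.FINISHED);

    /**
     * Decide whether an INIT_CONTAINER event should start a container.
     * During teardown the event is swallowed instead of triggering an
     * invalid transition that leaves the container stuck in NEW.
     */
    static boolean acceptInitContainer(AppState state) {
        return !TEARDOWN_STATES.contains(state);
    }

    public static void main(String[] args) {
        System.out.println(acceptInitContainer(AppState.RUNNING));   // true
        System.out.println(acceptInitContainer(AppState.FINISHING)); // false
    }
}
```

A complete fix would also need to finish the rejected container (e.g. fail it with a diagnostic) so it does not linger in the NEW state.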
[jira] [Commented] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states
[ https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530233#comment-16530233 ] Wangda Tan commented on YARN-8459: -- Attached patch (004) which moved re-reservation to debug log, and addressed comments from [~bibinchundatt] / [~Tao Yang]. Please review and let me know your thoughts. > Improve logs of Capacity Scheduler to better debug invalid states > - > > Key: YARN-8459 > URL: https://issues.apache.org/jira/browse/YARN-8459 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: YARN-8459.001.patch, YARN-8459.002.patch, > YARN-8459.003.patch, YARN-8459.004.patch > > > Improve logs in CS to better debug invalid states -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states
[ https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-8459: - Attachment: YARN-8459.004.patch > Improve logs of Capacity Scheduler to better debug invalid states > - > > Key: YARN-8459 > URL: https://issues.apache.org/jira/browse/YARN-8459 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: YARN-8459.001.patch, YARN-8459.002.patch, > YARN-8459.003.patch, YARN-8459.004.patch > > > Improve logs in CS to better debug invalid states -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8465) Dshell docker container gets marked as lost after NM restart
[ https://issues.apache.org/jira/browse/YARN-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530231#comment-16530231 ] Shane Kumpf commented on YARN-8465: --- Thanks [~eyang]! > Dshell docker container gets marked as lost after NM restart > > > Key: YARN-8465 > URL: https://issues.apache.org/jira/browse/YARN-8465 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.2.0, 3.1.1 >Reporter: Yesha Vora >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8465.001.patch > > > scenario: > 1) launch dshell application > {code} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar > -shell_command "sleep 500" -num_containers 2 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xx/httpd:0.1 -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar{code} > 2) wait for app to be in stable state ( > container_e01_1529968198450_0001_01_02 is running on host7 and > container_e01_1529968198450_0001_01_03 is running on host5) > 3) restart NM (host7) > Here, dshell application fails with below error > {code}18/06/25 23:35:30 INFO distributedshell.Client: Got application report > from ASM for, appId=1, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, > service: }, appDiagnostics=, appMasterHost=host9/xxx, appQueue=default, > appMasterRpcPort=-1, appStartTime=1529969211776, yarnAppState=RUNNING, > distributedFinalState=UNDEFINED, > appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, > appUser=hbase > 18/06/25 23:35:31 INFO distributedshell.Client: Got application report from > ASM for, appId=1, clientToAMToken=null, appDiagnostics=Application Failure: > desired = 2, completed = 2, allocated = 2, failed = 1, diagnostics = > [2018-06-25 23:35:28.000]Container exited with a non-zero exit code 154 > [2018-06-25 
23:35:28.001]Container exited with a non-zero exit code 154 > , appMasterHost=host9/xxx, appQueue=default, appMasterRpcPort=-1, > appStartTime=1529969211776, yarnAppState=FINISHED, > distributedFinalState=FAILED, > appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, > appUser=hbase > 18/06/25 23:35:31 INFO distributedshell.Client: Application did finished > unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring > loop > 18/06/25 23:35:31 ERROR distributedshell.Client: Application failed to > complete successfully{code} > Here, the docker container marked as LOST after completion > {code} > 2018-06-25 23:35:27,970 WARN runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:signalContainer(1034)) - Signal docker > container failed. Exception: > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Liveliness check failed for PID: 423695. Container may have already > completed. > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.executeLivelinessCheck(DockerLinuxContainerRuntime.java:1208) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:1026) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:159) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:755) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:905) > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:284) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:721) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-06-25 23:35:27,975 WARN nodemanager.LinuxContainerExecutor >
[jira] [Commented] (YARN-8465) Dshell docker container gets marked as lost after NM restart
[ https://issues.apache.org/jira/browse/YARN-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530221#comment-16530221 ] Eric Yang commented on YARN-8465: - +1 works on my test cluster. I will commit this shortly. > Dshell docker container gets marked as lost after NM restart > > > Key: YARN-8465 > URL: https://issues.apache.org/jira/browse/YARN-8465 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.2.0, 3.1.1 >Reporter: Yesha Vora >Assignee: Shane Kumpf >Priority: Major > Labels: Docker > Attachments: YARN-8465.001.patch > > > scenario: > 1) launch dshell application > {code} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar > -shell_command "sleep 500" -num_containers 2 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xx/httpd:0.1 -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar{code} > 2) wait for app to be in stable state ( > container_e01_1529968198450_0001_01_02 is running on host7 and > container_e01_1529968198450_0001_01_03 is running on host5) > 3) restart NM (host7) > Here, dshell application fails with below error > {code}18/06/25 23:35:30 INFO distributedshell.Client: Got application report > from ASM for, appId=1, clientToAMToken=Token { kind: YARN_CLIENT_TOKEN, > service: }, appDiagnostics=, appMasterHost=host9/xxx, appQueue=default, > appMasterRpcPort=-1, appStartTime=1529969211776, yarnAppState=RUNNING, > distributedFinalState=UNDEFINED, > appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, > appUser=hbase > 18/06/25 23:35:31 INFO distributedshell.Client: Got application report from > ASM for, appId=1, clientToAMToken=null, appDiagnostics=Application Failure: > desired = 2, completed = 2, allocated = 2, failed = 1, diagnostics = > [2018-06-25 23:35:28.000]Container exited with a non-zero exit code 154 > [2018-06-25 
23:35:28.001]Container exited with a non-zero exit code 154 > , appMasterHost=host9/xxx, appQueue=default, appMasterRpcPort=-1, > appStartTime=1529969211776, yarnAppState=FINISHED, > distributedFinalState=FAILED, > appTrackingUrl=https://host4:8090/proxy/application_1529968198450_0001/, > appUser=hbase > 18/06/25 23:35:31 INFO distributedshell.Client: Application did finished > unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring > loop > 18/06/25 23:35:31 ERROR distributedshell.Client: Application failed to > complete successfully{code} > Here, the docker container marked as LOST after completion > {code} > 2018-06-25 23:35:27,970 WARN runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:signalContainer(1034)) - Signal docker > container failed. Exception: > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Liveliness check failed for PID: 423695. Container may have already > completed. > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.executeLivelinessCheck(DockerLinuxContainerRuntime.java:1208) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:1026) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:159) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:755) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:905) > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:284) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:721) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2018-06-25 23:35:27,975 WARN nodemanager.LinuxContainerExecutor >
[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530218#comment-16530218 ] Eric Yang commented on YARN-8485: - The sudo binary path on Debian is different from the one on Redhat/CentOS, which caused the sudo check to fail. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services > Environment: Debian >Reporter: Yesha Vora >Priority: Major > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass. 
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled > for user: hrt_qa > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, > docker error code=11, error message='Privileged containers are disabled' > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-06-28 21:21:15,669 INFO
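Eric Yang's diagnosis above (a distribution-dependent sudo location breaking the privilege check) suggests probing a list of candidate paths rather than one hard-coded location. A hedged sketch of that idea; the candidate list and class name are illustrative, and the actual check lives in the C container-executor, not in Java:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class BinaryLocator {
    /** Return the first candidate path accepted by the existence check. */
    static Optional<String> firstExisting(List<String> candidates,
                                          Predicate<String> exists) {
        return candidates.stream().filter(exists).findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical candidate locations for the sudo binary.
        List<String> candidates =
            Arrays.asList("/usr/bin/sudo", "/bin/sudo", "/usr/local/bin/sudo");
        // Probe the real filesystem; on most Linux distributions at least
        // one of these exists, but none is guaranteed.
        Optional<String> sudo =
            firstExisting(candidates, p -> new File(p).canExecute());
        System.out.println(sudo.orElse("sudo not found"));
    }
}
```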
[jira] [Updated] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8485: Environment: Debian > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services > Environment: Debian >Reporter: Yesha Vora >Priority: Major > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass. 
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled > for user: hrt_qa > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, > docker error code=11, error message='Privileged containers are disabled' > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - > 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell output: main : command > provided 4 > 2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - main : run as user is hrt_qa > 2018-06-28
[jira] [Commented] (YARN-8180) Remove yarn.federation.blacklist-subclusters from yarn federation doc
[ https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530169#comment-16530169 ] Abhishek Modi commented on YARN-8180: - [~giovanni.fumarola] Updated the Jira title and description. > Remove yarn.federation.blacklist-subclusters from yarn federation doc > - > > Key: YARN-8180 > URL: https://issues.apache.org/jira/browse/YARN-8180 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Reporter: Shen Yinjie >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-8180.001.patch > > > Property "yarn.federation.blacklist-subclusters" was added in yarn-federation > doc by mistake and is not applicable. This Jira is to remove this property > from the doc. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8180) Remove yarn.federation.blacklist-subclusters from yarn federation doc
[ https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-8180: Description: Property "yarn.federation.blacklist-subclusters" was added in yarn-federation doc by mistake and is not applicable. This Jira is to remove this property from the doc. was: Property "yarn.federation.blacklist-subclusters" is defined in yarn-fedeartion doc,but it has not been defined and implemented in Java code. In FederationClientInterceptor#submitApplication() {code:java} List blacklist = new ArrayList(); for (int i = 0; i < numSubmitRetries; ++i) { SubClusterId subClusterId = policyFacade.getHomeSubcluster( request.getApplicationSubmissionContext(), blacklist); {code} > Remove yarn.federation.blacklist-subclusters from yarn federation doc > - > > Key: YARN-8180 > URL: https://issues.apache.org/jira/browse/YARN-8180 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Reporter: Shen Yinjie >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-8180.001.patch > > > Property "yarn.federation.blacklist-subclusters" was added in yarn-federation > doc by mistake and is not applicable. This Jira is to remove this property > from the doc. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8180) Remove yarn.federation.blacklist-subclusters from yarn federation doc
[ https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-8180: Summary: Remove yarn.federation.blacklist-subclusters from yarn federation doc (was: YARN Federation has not implemented blacklist sub-cluster for AM routing) > Remove yarn.federation.blacklist-subclusters from yarn federation doc > - > > Key: YARN-8180 > URL: https://issues.apache.org/jira/browse/YARN-8180 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation >Reporter: Shen Yinjie >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-8180.001.patch > > > Property "yarn.federation.blacklist-subclusters" is defined in > yarn-fedeartion doc,but it has not been defined and implemented in Java code. > In FederationClientInterceptor#submitApplication() > {code:java} > List blacklist = new ArrayList(); > for (int i = 0; i < numSubmitRetries; ++i) { > SubClusterId subClusterId = policyFacade.getHomeSubcluster( > request.getApplicationSubmissionContext(), blacklist); > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router
[ https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530033#comment-16530033 ] genericqa commented on YARN-8435: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 54s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 17s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 52m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8435 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929955/YARN-8435.v6.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d3b20c4cd676 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 5d748bd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21162/testReport/ | | Max. process+thread count | 724 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21162/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > NPE when the same client simultaneously contact for the first time Yarn Router >
[jira] [Commented] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router
[ https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529943#comment-16529943 ] rangjiaheng commented on YARN-8435: --- Thanks [~tanujnay] for testing; that is a very good suggestion, and I have addressed it in the new patch. Thanks [~giovanni.fumarola] for the review; the new patch YARN-8435.v6.patch solves that problem, along with a similar bug: a client request can find that _userPipelineMap_ contains its user key but then get nothing back, because another client request's initialization causes the first user key to be evicted in between. For a large Hadoop cluster, one suggestion is to set yarn.router.pipeline.cache-max-size to a large value such as 250, to reduce re-initialization caused by LRU eviction. > NPE when the same client simultaneously contact for the first time Yarn Router > -- > > Key: YARN-8435 > URL: https://issues.apache.org/jira/browse/YARN-8435 > Project: Hadoop YARN > Issue Type: Bug > Components: router >Affects Versions: 2.9.0, 3.0.2 >Reporter: rangjiaheng >Priority: Critical > Attachments: YARN-8435.v1.patch, YARN-8435.v2.patch, > YARN-8435.v3.patch, YARN-8435.v4.patch, YARN-8435.v5.patch, YARN-8435.v6.patch > > > When two client processes (with the same user name and the same hostname) begin > to connect to the Yarn Router at the same time, to submit an application, kill an > application, and so on, a java.lang.NullPointerException may be thrown > from the Yarn Router. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
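The race described in the comment above (an entry observed via containsKey is evicted before the subsequent get) is a classic check-then-act bug. A minimal sketch of the atomic get-or-create alternative, with hypothetical class and field names standing in for the router's real per-user interceptor pipeline:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PipelineCache {
    /** Hypothetical stand-in for the router's per-user interceptor pipeline. */
    static class Pipeline {
        final String user;
        Pipeline(String user) { this.user = user; }
    }

    // In the real router this map is LRU-bounded; eviction would still need
    // to be coordinated with lookups, which is the second bug noted above.
    private final ConcurrentMap<String, Pipeline> userPipelineMap =
        new ConcurrentHashMap<>();

    /**
     * Atomic get-or-create: two threads racing on the same user observe
     * exactly one Pipeline instance, unlike a containsKey-then-get sequence.
     */
    Pipeline getOrCreate(String user) {
        return userPipelineMap.computeIfAbsent(user, Pipeline::new);
    }

    public static void main(String[] args) {
        PipelineCache cache = new PipelineCache();
        // Both lookups return the same instance even under contention.
        System.out.println(cache.getOrCreate("alice") == cache.getOrCreate("alice")); // true
    }
}
```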
[jira] [Updated] (YARN-8435) NPE when the same client simultaneously contact for the first time Yarn Router
[ https://issues.apache.org/jira/browse/YARN-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rangjiaheng updated YARN-8435:
------------------------------
Attachment: YARN-8435.v6.patch

> NPE when the same client simultaneously contact for the first time Yarn Router
> ------------------------------------------------------------------------------
>
> Key: YARN-8435
> URL: https://issues.apache.org/jira/browse/YARN-8435
> Project: Hadoop YARN
> Issue Type: Bug
> Components: router
> Affects Versions: 2.9.0, 3.0.2
> Reporter: rangjiaheng
> Priority: Critical
> Attachments: YARN-8435.v1.patch, YARN-8435.v2.patch, YARN-8435.v3.patch, YARN-8435.v4.patch, YARN-8435.v5.patch, YARN-8435.v6.patch
>
> When two client processes (with the same user name and the same hostname) begin to connect to YARN Router at the same time, to submit an application, kill an application, and so on, a java.lang.NullPointerException may be thrown from YARN Router.
[jira] [Commented] (YARN-8459) Improve logs of Capacity Scheduler to better debug invalid states
[ https://issues.apache.org/jira/browse/YARN-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529637#comment-16529637 ]

Tao Yang commented on YARN-8459:
--------------------------------

Hi [~leftnoteasy], can we improve the skip-queue log in ParentQueue#assignContainers? In async-scheduling mode there are too many of these debug logs (thousands every second), and they may generate several new log files every minute when there is no pending request on the root queue. I think this log could be printed at periodic intervals instead.

{code:java}
if (!super.hasPendingResourceRequest(candidates.getPartition(),
    clusterResource, schedulingMode)) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Skip this queue=" + getQueuePath()
        + ", because it doesn't need more resource, schedulingMode="
        + schedulingMode.name()
        + " node-partition=" + candidates.getPartition());
  }
  ...
}
{code}

> Improve logs of Capacity Scheduler to better debug invalid states
> -----------------------------------------------------------------
>
> Key: YARN-8459
> URL: https://issues.apache.org/jira/browse/YARN-8459
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.1.0
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Major
> Attachments: YARN-8459.001.patch, YARN-8459.002.patch, YARN-8459.003.patch
>
> Improve logs in CS to better debug invalid states
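One way to print such a log "at periodic intervals" is a small per-key throttle that lets a message through at most once per interval. This is only a hypothetical sketch of the idea, not the YARN-8459 patch; the ThrottledLogger class and its method names are invented for illustration:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThrottledLogger {
    private final long intervalMs;
    // Last time a message fired, per key (e.g. per queue path).
    private final Map<String, Long> lastLogged = new ConcurrentHashMap<>();

    public ThrottledLogger(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /**
     * Returns true at most roughly once per interval for a given key.
     * Best-effort check-then-act: a rare race between threads may let two
     * messages through in one interval, which is acceptable for throttling.
     */
    public boolean shouldLog(String key, long nowMs) {
        Long last = lastLogged.get(key);
        if (last == null || nowMs - last >= intervalMs) {
            lastLogged.put(key, nowMs);
            return true;
        }
        return false;
    }
}
{code}

The skip-queue debug statement above could then be guarded with something like shouldLog(getQueuePath(), now) in addition to LOG.isDebugEnabled(), so a queue with no pending requests logs once per interval instead of on every async-scheduling pass.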
[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling
[ https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Qingcha updated YARN-7481:
-------------------------------
Attachment: hadoop-2.9.0-gpu-port.patch

> Gpu locality support for Better AI scheduling
> ---------------------------------------------
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: api, RM, yarn
> Affects Versions: 2.7.2
> Reporter: Chen Qingcha
> Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, hadoop-2.7.2-gpu.patch, hadoop-2.7.2.gpu-port.patch, hadoop-2.9.0-gpu-port.patch
>
> Original Estimate: 1,344h
> Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. YARN-3926 also supports GPU scheduling, but it treats GPUs only as a countable resource. GPU placement, however, is also very important to deep learning jobs for efficiency. For example, a 2-GPU job running on GPUs {0, 1} can be faster than one running on GPUs {0, 7} if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not. We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables fine-grained GPU placement. A 64-bit bitmap is added to the YARN Resource, indicating both GPU usage and locality information in a node (up to 64 GPUs per node): a '1' in a bit position means the corresponding GPU is available, and a '0' means it is not.
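A 64-GPU availability bitmap of the kind described above maps naturally onto a Java long with plain bit operations. This is a hypothetical sketch of how such a bitmap could be queried for locality; GpuBitmap, pickLocal, and the per-switch masks are invented names, not the API of the attached patch:

{code:java}
public class GpuBitmap {
    // '1' at bit position i means GPU i is available (up to 64 GPUs per node).
    static boolean isAvailable(long bitmap, int gpu) {
        return (bitmap & (1L << gpu)) != 0;
    }

    // Number of available GPUs on the node.
    static int availableCount(long bitmap) {
        return Long.bitCount(bitmap);
    }

    /**
     * Try to pick k available GPUs that all sit under the same PCI-E switch,
     * given a mask of the GPU indices attached to that switch (a hypothetical
     * topology input). Returns the chosen GPUs as a bitmap, or 0 if the
     * switch cannot satisfy the request.
     */
    static long pickLocal(long bitmap, long switchMask, int k) {
        long candidates = bitmap & switchMask;
        if (Long.bitCount(candidates) < k) {
            return 0;
        }
        long picked = 0;
        for (int i = 0; i < 64 && Long.bitCount(picked) < k; i++) {
            if ((candidates & (1L << i)) != 0) {
                picked |= 1L << i;
            }
        }
        return picked;
    }
}
{code}

With per-switch masks such as 0x0F for GPUs 0-3 and 0xF0 for GPUs 4-7, a scheduler could try pickLocal against each switch first and fall back to arbitrary available GPUs only when no single switch can host the whole request, which is exactly the {0, 1}-versus-{0, 7} preference from the description.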