[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528959#comment-16528959 ]

genericqa commented on YARN-8193:
---------------------------------

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 2m 13s | Docker failed to build yetus/hadoop:17213a0. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8193 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929803/YARN-8193-branch-2.9.0-001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21160/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>             Fix For: 2.9.0, 3.2.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch,
>                      YARN-8193.002.patch
>
>
> When running massive queries successively, at some point the RM hangs and
> stops allocating resources. At the point the RM hangs, YARN throws a
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There is sufficient space in yarn.nodemanager.local-dirs (this is not a node
> health issue; the RM did not report any node as unhealthy). There is no
> fixed trigger (query or operation) for this.
> This problem goes away on restarting the ResourceManager. No NM restart is
> required.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
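For readers following the NPE above: a locality wait factor is essentially a ratio of outstanding requests to cluster size, clamped to [0, 1], and a null request or an empty node set is where such a computation can blow up. A minimal, hypothetical sketch of one way to guard that calculation — names, signature, and fallback value are illustrative, not the actual YARN source or the fix in the attached patches:

```java
// Hypothetical helper, NOT YARN source code: sketches a locality-wait-factor
// computation that tolerates a missing request count or an empty cluster
// instead of throwing NullPointerException.
public final class LocalityWaitFactor {

  /**
   * Returns min(requestedContainers / clusterNodes, 1.0), falling back to the
   * maximum wait factor when the inputs are absent or degenerate.
   */
  public static float compute(Integer requestedContainers, int clusterNodes) {
    if (requestedContainers == null || clusterNodes <= 0) {
      return 1.0f; // defensive fallback: wait the maximum number of misses
    }
    return Math.min((float) requestedContainers / clusterNodes, 1.0f);
  }
}
```

The point of the sketch is only that the allocator-side computation should degrade to a sane default rather than propagate a null, which matches the symptom described (RM hangs until restart, NMs unaffected).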
[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528957#comment-16528957 ]

Wangda Tan commented on YARN-8193:
----------------------------------

[~elgoiri], Jenkins will be triggered after the patch is submitted.
[jira] [Commented] (YARN-8302) ATS v2 should handle HBase connection issue properly
[ https://issues.apache.org/jira/browse/YARN-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528937#comment-16528937 ]

Billie Rinaldi commented on YARN-8302:
--------------------------------------

These unit tests appear to be broken (not by this patch) unless hbase.profile=2.0 is used.

> ATS v2 should handle HBase connection issue properly
> ----------------------------------------------------
>
>                 Key: YARN-8302
>                 URL: https://issues.apache.org/jira/browse/YARN-8302
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: ATSv2
>    Affects Versions: 3.1.0
>            Reporter: Yesha Vora
>            Assignee: Billie Rinaldi
>            Priority: Major
>         Attachments: YARN-8302.1.patch
>
>
> An ATS v2 call times out with the error below when it cannot connect to the
> HBase instance.
> {code}
> bash-4.2$ curl -i -k -s -1 -H 'Content-Type: application/json' -H 'Accept: application/json' --max-time 5 --negotiate -u : 'https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/YARN_CONTAINER?fields=ALL&_=1526425686092'
> curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
> {code}
> {code:title=ATS log}
> 2018-05-15 23:10:03,623 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=7, retries=7, started=8165 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:13,651 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=8, retries=8, started=18192 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:23,730 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=9, retries=9, started=28272 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> 2018-05-15 23:10:33,788 INFO client.RpcRetryingCallerImpl (RpcRetryingCallerImpl.java:callWithRetries(134)) - Call exception, tries=10, retries=10, started=38330 ms ago, cancelled=false, msg=Call to xxx/xxx:17020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: xxx/xxx:17020, details=row 'prod.timelineservice.app_flow,,99' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx,17020,1526348294182, seqNum=-1
> {code}
> There are two issues here:
> 1) Check why ATS cannot connect to HBase.
> 2) In case of a connection error, the ATS call should not time out; it
> should fail with a proper error.
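For issue (2), the usual lever is to bound the HBase client's retry and timeout budget so a down HBase surfaces to the reader as a fast, explicit failure rather than a silent REST timeout. A sketch of the relevant hbase-site.xml settings on the timeline reader side — the property names are standard HBase client keys, but the values here are illustrative, not recommendations:

```xml
<!-- Illustrative hbase-site.xml fragment: cap client retries so an
     unreachable HBase fails fast instead of retrying past the caller's
     HTTP timeout. Values are examples only. -->
<configuration>
  <property>
    <name>hbase.client.retries.number</name>
    <value>3</value>
  </property>
  <property>
    <name>hbase.client.pause</name>
    <value>1000</value> <!-- ms between retries -->
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>10000</value> <!-- ms per RPC attempt -->
  </property>
</configuration>
```

In the log above the client is still on tries=10 after ~38 s, well past curl's 5 s --max-time, which is why the caller only ever sees a timeout.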
[jira] [Created] (YARN-8485) Privileged container app launch is failing intermittently
Yesha Vora created YARN-8485:
--------------------------------

             Summary: Privileged container app launch is failing intermittently
                 Key: YARN-8485
                 URL: https://issues.apache.org/jira/browse/YARN-8485
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn-native-services
            Reporter: Yesha Vora

A privileged application fails intermittently:

{code:java}
yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar -shell_command "sleep 30" -num_containers 1 -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}

Here, the container launch fails with 'Privileged containers are disabled' even though privileged Docker containers are enabled in the cluster:

{code:java|title=nm log}
2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - All checks pass. Launching privileged container for : container_e01_1530220647587_0001_01_02
2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container container_e01_1530220647587_0001_01_02 is : 29
2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from container-launch with container ID: container_e01_1530220647587_0001_01_02 and exit code: 29
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Container id: container_e01_1530220647587_0001_01_02
2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exit code: 29
2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container failed
2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell error output: check privileges failed for user: hrt_qa, error code: 0
2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled for user: hrt_qa
2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, docker error code=11, error message='Privileged containers are disabled'
2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) -
2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Shell output: main : command provided 4
2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : run as user is hrt_qa
2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - main : requested yarn user is hrt_qa
2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Creating script paths...
2018-06-28 21:21:15,669 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(541)) - Creating local dirs...
2018-06-28
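When a launch fails with 'Privileged containers are disabled' like this, the first place to check is the [docker] section of container-executor.cfg on each NodeManager, since that file is evaluated per-node and a mismatch between NMs is one plausible explanation for intermittent behavior. A sketch of the relevant knobs — the key names are the standard container-executor.cfg ones, but the values are illustrative and exact requirements vary by Hadoop version:

```properties
# Illustrative container-executor.cfg fragment (values are examples only).
[docker]
  module.enabled=true
  docker.binary=/usr/bin/docker
  # Must be true on every NodeManager for privileged launches to pass the
  # container-executor check; verify it is consistent across all nodes.
  docker.privileged-containers.enabled=true
  docker.trusted.registries=local,library
```

Depending on the version, privileged containers may also be gated by a user ACL on the YARN side; confirming that the submitting user (hrt_qa here) is allowed on every node is part of the same checklist.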
[jira] [Commented] (YARN-6672) Add NM preemption of opportunistic containers when utilization goes high
[ https://issues.apache.org/jira/browse/YARN-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528834#comment-16528834 ]

Miklos Szegedi commented on YARN-6672:
--------------------------------------

I checked and I would like to add an additional +1 on the latest patch. Thank you for the contribution, [~haibochen], and [~elgoiri] for the review.

> Add NM preemption of opportunistic containers when utilization goes high
> ------------------------------------------------------------------------
>
>                 Key: YARN-6672
>                 URL: https://issues.apache.org/jira/browse/YARN-6672
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Major
>         Attachments: YARN-6672-YARN-1011.00.patch,
>                      YARN-6672-YARN-1011.01.patch, YARN-6672-YARN-1011.02.patch,
>                      YARN-6672-YARN-1011.03.patch, YARN-6672-YARN-1011.04.patch,
>                      YARN-6672-YARN-1011.05.patch, YARN-6672-YARN-1011.06.patch
[jira] [Commented] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528824#comment-16528824 ]

Íñigo Goiri commented on YARN-8193:
-----------------------------------

I don't think Yetus will do a very good job running the unit tests for [^YARN-8193-branch-2.9.0-001.patch], so there is no point in waiting for it. [^YARN-8193-branch-2.9.0-001.patch] looks pretty much the same as [^YARN-8193.002.patch] but using the allocator. +1
[jira] [Commented] (YARN-5748) Backport YARN-5718 to branch-2
[ https://issues.apache.org/jira/browse/YARN-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528763#comment-16528763 ]

genericqa commented on YARN-5748:
---------------------------------

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 18m 36s | Docker mode activated. |
|| || || || Prechecks ||
|  0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-2 Compile Tests ||
|  0 | mvndep | 0m 34s | Maven dependency ordering for branch |
| +1 | mvninstall | 13m 44s | branch-2 passed |
| +1 | compile | 12m 58s | branch-2 passed |
| +1 | checkstyle | 1m 26s | branch-2 passed |
| +1 | mvnsite | 3m 3s | branch-2 passed |
| +1 | javadoc | 2m 21s | branch-2 passed |
|| || || || Patch Compile Tests ||
|  0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 15s | the patch passed |
| +1 | compile | 12m 14s | the patch passed |
| +1 | javac | 12m 14s | the patch passed |
| -0 | checkstyle | 1m 7s | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 269 unchanged - 0 fixed = 271 total (was 269) |
| +1 | mvnsite | 2m 38s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | javadoc | 1m 53s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 41s | hadoop-yarn-api in the patch passed. |
| -1 | unit | 62m 42s | hadoop-yarn-common in the patch failed. |
| +1 | unit | 64m 14s | hadoop-yarn-server-resourcemanager in the patch passed. |
| -1 | asflicense | 0m 41s | The patch generated 1 ASF License warnings. |
|    |            | 203m 46s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:f667ef1 |
| JIRA Issue | YARN-5748 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861866/YARN-5748-branch-2.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 4c5fdee32f43 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / f951d92 |
| maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_181 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/21159/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt |
| unit |
[jira] [Commented] (YARN-8434) Nodemanager not registering to active RM in federation
[ https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528727#comment-16528727 ]

Bibin A Chundatt commented on YARN-8434:
----------------------------------------

[~goirix], could you help review this?

> Nodemanager not registering to active RM in federation
> ------------------------------------------------------
>
>                 Key: YARN-8434
>                 URL: https://issues.apache.org/jira/browse/YARN-8434
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Blocker
>         Attachments: YARN-8434.001.patch, YARN-8434.002.patch
>
>
> FederationRMFailoverProxyProvider does not handle connecting to the active RM.
[jira] [Commented] (YARN-5748) Backport YARN-5718 to branch-2
[ https://issues.apache.org/jira/browse/YARN-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528660#comment-16528660 ]

He Xiaoqiao commented on YARN-5748:
-----------------------------------

[~djp], [~iwasakims], is work on this issue still in progress?

> Backport YARN-5718 to branch-2
> ------------------------------
>
>                 Key: YARN-5748
>                 URL: https://issues.apache.org/jira/browse/YARN-5748
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Junping Du
>            Assignee: Masatake Iwasaki
>            Priority: Major
>         Attachments: YARN-5748-branch-2.001.patch,
>                      YARN-5748-branch-2.002.patch
>
>
> In YARN-5718 we identified several unnecessary configs that override HDFS
> client behavior in several YARN components (FSRMStore, TimelineClient,
> NodeLabelStore, etc.) and cause job failures in some cases (NN HA, etc.),
> which is definitely a bug. In YARN-5718 we proposed removing the config,
> since it was never supposed to work; that change is already committed to
> trunk, as the alpha stage has more flexibility for incompatible changes. In
> branch-2 we want to play it a bit safer and have more discussion.
> There are several options here:
> 1. Don't fix anything and let the bug remain.
> 2. Fix the bug but keep the configuration, or mark it deprecated and add an
> explanation that this configuration is no longer supposed to work.
> 3. Exactly like YARN-5718: fix the bug and remove the unnecessary
> configuration.
> This ticket is filed for more discussion.