[jira] [Updated] (YARN-7818) Remove warning when a container is killed with an expected exit code
[ https://issues.apache.org/jira/browse/YARN-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shane Kumpf updated YARN-7818:
--
    Description:
steps:
1) Run Dshell Application
{code:java}
yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar /usr/hdp/3.0.0.0-751/hadoop-yarn/hadoop-yarn-applications-distributedshell-*.jar -keep_containers_across_application_attempts -timeout 90 -shell_command "sleep 110" -num_containers 4{code}
2) Find out host where AM is running.
3) Find Containers launched by application
4) Restart NM where AM is running
5) Validate that new attempt is not started and containers launched before restart are in RUNNING state.

In this test, step#5 fails because containers failed to launch with error 143
{code:java}
2018-01-24 09:48:30,547 INFO container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e04_1516787230461_0001_01_03 transitioned from RUNNING to KILLING
2018-01-24 09:48:30,547 INFO launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container container_e04_1516787230461_0001_01_03
2018-01-24 09:48:30,552 WARN privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell execution returned exit code: 143. Privileged Execution Operation Stderr:
Stdout: main : command provided 1
main : run as user is hrt_qa
main : requested yarn user is hrt_qa
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
Full command array for failed execution:
[/usr/hdp/3.0.0.0-751/hadoop-yarn/bin/container-executor, hrt_qa, hrt_qa, 1, application_1516787230461_0001, container_e04_1516787230461_0001_01_03, /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1516787230461_0001/container_e04_1516787230461_0001_01_03, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/launch_container.sh, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.tokens, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.pid, /grid/0/hadoop/yarn/local, /grid/0/hadoop/yarn/log, cgroups=none]
2018-01-24 09:48:30,553 WARN runtime.DefaultLinuxContainerRuntime (DefaultLinuxContainerRuntime.java:launchContainer(127)) - Launch container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=143:
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:152)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:549)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:465)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:285)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:95)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=143:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
    at org.apache.hadoop.util.Shell.run(Shell.java:902)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
    ... 10 more
2018-01-24 09:48:30,553 WARN nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:launchContainer(557)) - Exit code from container
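
For context on the exit code in the log above: by the usual shell convention, a process terminated by signal N reports exit status 128 + N, so 143 corresponds to SIGTERM (15), which is consistent with the container moving from RUNNING to KILLING just before the warning. The sketch below only illustrates that arithmetic and one possible way to decide whether a warning is worth logging. The class and method names are hypothetical; they are not taken from the YARN code base or from any patch attached to this issue.

```java
public class ExitCodeSketch {
    // Shell convention: a process killed by signal N exits with 128 + N.
    static final int SIGNAL_BASE = 128;
    static final int SIGKILL = 9;
    static final int SIGTERM = 15;

    // Returns the signal number encoded in an exit code, or 0 if none.
    // (Hypothetical helper, for illustration only.)
    static int signalFromExitCode(int exitCode) {
        return exitCode > SIGNAL_BASE ? exitCode - SIGNAL_BASE : 0;
    }

    // Assumption: a kill requested by the NodeManager shows up as
    // SIGTERM (143) or SIGKILL (137), so those codes are "expected".
    static boolean isExpectedKillExitCode(int exitCode) {
        int signal = signalFromExitCode(exitCode);
        return signal == SIGTERM || signal == SIGKILL;
    }

    public static void main(String[] args) {
        int exitCode = 143; // value reported in the log above
        System.out.println("signal = " + signalFromExitCode(exitCode));
        System.out.println("expected kill = " + isExpectedKillExitCode(exitCode));
    }
}
```

Under these assumptions, a caller could skip the WARN-level message when `isExpectedKillExitCode` returns true and keep it for any other non-zero exit code.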
[jira] [Updated] (YARN-7818) Remove warning when a container is killed with an expected exit code
[ https://issues.apache.org/jira/browse/YARN-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shane Kumpf updated YARN-7818:
--
    Summary: Remove warning when a container is killed with an expected exit code  (was: Remove warning for expected exit code)

> Remove warning when a container is killed with an expected exit code
>
>
> Key: YARN-7818
> URL: https://issues.apache.org/jira/browse/YARN-7818
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Yesha Vora
> Assignee: Shane Kumpf
> Priority: Major
>
> steps:
> 1) Run Dshell Application
> {code}
> yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar
> /usr/hdp/3.0.0.0-751/hadoop-yarn/hadoop-yarn-applications-distributedshell-*.jar
> -keep_containers_across_application_attempts -timeout 90 -shell_command
> "sleep 110" -num_containers 4{code}
> 2) Find out host where AM is running.
> 3) Find Containers launched by application
> 4) Restart NM where AM is running
> 5) Validate that new attempt is not started and containers launched before
> restart are in RUNNING state.
> In this test, step#5 fails because containers failed to launch with error 143
> {code}
> 2018-01-24 09:48:30,547 INFO container.ContainerImpl
> (ContainerImpl.java:handle(2108)) - Container
> container_e04_1516787230461_0001_01_03 transitioned from RUNNING to
> KILLING
> 2018-01-24 09:48:30,547 INFO launcher.ContainerLaunch
> (ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container
> container_e04_1516787230461_0001_01_03
> 2018-01-24 09:48:30,552 WARN privileged.PrivilegedOperationExecutor
> (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell
> execution returned exit code: 143. Privileged Execution Operation Stderr:
> Stdout: main : command provided 1
> main : run as user is hrt_qa
> main : requested yarn user is hrt_qa
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file
> /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> Full command array for failed execution:
> [/usr/hdp/3.0.0.0-751/hadoop-yarn/bin/container-executor, hrt_qa, hrt_qa, 1,
> application_1516787230461_0001, container_e04_1516787230461_0001_01_03,
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1516787230461_0001/container_e04_1516787230461_0001_01_03,
>
> /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/launch_container.sh,
>
> /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.tokens,
>
> /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_03/container_e04_1516787230461_0001_01_03.pid,
> /grid/0/hadoop/yarn/local, /grid/0/hadoop/yarn/log, cgroups=none]
> 2018-01-24 09:48:30,553 WARN runtime.DefaultLinuxContainerRuntime
> (DefaultLinuxContainerRuntime.java:launchContainer(127)) - Launch container
> failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
> ExitCodeException exitCode=143:
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:152)
> at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:549)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:465)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:285)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:95)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at