[ 
https://issues.apache.org/jira/browse/YARN-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-7818:
------------------------------
    Description: 
Steps to reproduce:
1) Run the distributed shell (Dshell) application:
{code:java}
yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar /usr/hdp/3.0.0.0-751/hadoop-yarn/hadoop-yarn-applications-distributedshell-*.jar \
  -keep_containers_across_application_attempts \
  -timeout 900000 \
  -shell_command "sleep 110" \
  -num_containers 4{code}
2) Find the host where the AM is running.
3) Find the containers launched by the application.
4) Restart the NM on the host where the AM is running.
5) Validate that no new application attempt is started and that the containers launched before the restart are still in the RUNNING state.

In this test, step 5 fails because the containers fail to launch with exit code 143:
{code:java}
2018-01-24 09:48:30,547 INFO  container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e04_1516787230461_0001_01_000003 transitioned from RUNNING to KILLING
2018-01-24 09:48:30,547 INFO  launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container container_e04_1516787230461_0001_01_000003
2018-01-24 09:48:30,552 WARN  privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell execution returned exit code: 143. Privileged Execution Operation Stderr:

Stdout: main : command provided 1
main : run as user is hrt_qa
main : requested yarn user is hrt_qa
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_000003/container_e04_1516787230461_0001_01_000003.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...

Full command array for failed execution:
[/usr/hdp/3.0.0.0-751/hadoop-yarn/bin/container-executor, hrt_qa, hrt_qa, 1, application_1516787230461_0001, container_e04_1516787230461_0001_01_000003, /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1516787230461_0001/container_e04_1516787230461_0001_01_000003, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_000003/launch_container.sh, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_000003/container_e04_1516787230461_0001_01_000003.tokens, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_000003/container_e04_1516787230461_0001_01_000003.pid, /grid/0/hadoop/yarn/local, /grid/0/hadoop/yarn/log, cgroups=none]
2018-01-24 09:48:30,553 WARN  runtime.DefaultLinuxContainerRuntime (DefaultLinuxContainerRuntime.java:launchContainer(127)) - Launch container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=143:
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:152)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:549)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:465)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:285)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:95)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=143:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
        at org.apache.hadoop.util.Shell.run(Shell.java:902)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
        ... 10 more
2018-01-24 09:48:30,553 WARN  nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:launchContainer(557)) - Exit code from container container_e04_1516787230461_0001_01_000003 is : 143
2018-01-24 09:48:30,582 INFO  containermanager.ContainerManagerImpl (ContainerManagerImpl.java:stopContainerInternal(1365)) - Stopping container with container Id: container_e04_1516787230461_0001_01_000005
2018-01-24 09:48:31,093 INFO  impl.TimelineV2ClientImpl (TimelineV2ClientImpl.java:setTimelineCollectorInfo(172)) - Updated timeline service address to xxxxxx:40757
2018-01-24 09:48:32,675 INFO  container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e04_1516787230461_0001_01_000003 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL{code}
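
For context on the value 143 in the log above: by shell convention, a process terminated by a signal is reported with an exit status of 128 plus the signal number, and SIGTERM is signal 15, so 143 is the status of a process that was stopped with SIGTERM. That is consistent with the RUNNING to KILLING transition logged just before the warning. The snippet below is only a minimal sketch of that convention, not code from YARN: a plain sleep process stands in for the container, and the variable names are illustrative.

```shell
#!/bin/sh
# Hypothetical stand-in for a running container process.
sleep 30 &
pid=$!

# Terminate it with SIGTERM, then collect its exit status.
kill -TERM "$pid"
code=0
wait "$pid" || code=$?

echo "exit code: $code"                  # prints "exit code: 143"
echo "signal number: $((code - 128))"    # prints "signal number: 15"
```

Subtracting 128 from the reported status recovers the signal number, which is a quick way to tell a deliberate kill from a genuine launch failure when reading NM logs.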

  was:
steps:
1) Run Dshell Application
{code}
yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar /usr/hdp/3.0.0.0-751/hadoop-yarn/hadoop-yarn-applications-distributedshell-*.jar \
  -keep_containers_across_application_attempts \
  -timeout 900000 \
  -shell_command "sleep 110" \
  -num_containers 4{code}
2) Find out host where AM is running. 
3) Find Containers launched by application
4) Restart NM where AM is running
5) Validate that new attempt is not started and containers launched before 
restart are in RUNNING state.

In this test, step#5 fails because containers failed to launch with error 143
{code}
2018-01-24 09:48:30,547 INFO  container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e04_1516787230461_0001_01_000003 transitioned from RUNNING to KILLING
2018-01-24 09:48:30,547 INFO  launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(668)) - Cleaning up container container_e04_1516787230461_0001_01_000003
2018-01-24 09:48:30,552 WARN  privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(174)) - Shell execution returned exit code: 143. Privileged Execution Operation Stderr:

Stdout: main : command provided 1
main : run as user is hrt_qa
main : requested yarn user is hrt_qa
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_000003/container_e04_1516787230461_0001_01_000003.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...

Full command array for failed execution:
[/usr/hdp/3.0.0.0-751/hadoop-yarn/bin/container-executor, hrt_qa, hrt_qa, 1, application_1516787230461_0001, container_e04_1516787230461_0001_01_000003, /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1516787230461_0001/container_e04_1516787230461_0001_01_000003, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_000003/launch_container.sh, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_000003/container_e04_1516787230461_0001_01_000003.tokens, /grid/0/hadoop/yarn/local/nmPrivate/application_1516787230461_0001/container_e04_1516787230461_0001_01_000003/container_e04_1516787230461_0001_01_000003.pid, /grid/0/hadoop/yarn/local, /grid/0/hadoop/yarn/log, cgroups=none]
2018-01-24 09:48:30,553 WARN  runtime.DefaultLinuxContainerRuntime (DefaultLinuxContainerRuntime.java:launchContainer(127)) - Launch container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=143:
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:124)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:152)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:549)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:465)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:285)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:95)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: ExitCodeException exitCode=143:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
        at org.apache.hadoop.util.Shell.run(Shell.java:902)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
        ... 10 more
2018-01-24 09:48:30,553 WARN  nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:launchContainer(557)) - Exit code from container container_e04_1516787230461_0001_01_000003 is : 143
2018-01-24 09:48:30,582 INFO  containermanager.ContainerManagerImpl (ContainerManagerImpl.java:stopContainerInternal(1365)) - Stopping container with container Id: container_e04_1516787230461_0001_01_000005
2018-01-24 09:48:31,093 INFO  impl.TimelineV2ClientImpl (TimelineV2ClientImpl.java:setTimelineCollectorInfo(172)) - Updated timeline service address to ctr-e137-1514896590304-35201-01-000005.hwx.site:40757
2018-01-24 09:48:32,675 INFO  container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e04_1516787230461_0001_01_000003 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL{code}


> Remove warning when a container is killed with an expected exit code
> --------------------------------------------------------------------
>
>                 Key: YARN-7818
>                 URL: https://issues.apache.org/jira/browse/YARN-7818
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Shane Kumpf
>            Priority: Major


