Yesha Vora created YARN-8231:
--------------------------------
Summary: Dshell application fails when one of the Docker containers
gets killed
Key: YARN-8231
URL: https://issues.apache.org/jira/browse/YARN-8231
Project: Hadoop YARN
Issue Type: Bug
Components: yarn-native-services
Reporter: Yesha Vora
1) Launch dshell application
{code}
yarn jar hadoop-yarn-applications-distributedshell-*.jar \
  -shell_command "sleep 300" \
  -num_containers 2 \
  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
  -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest \
  -keep_containers_across_application_attempts \
  -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
{code}
2) Kill container_1524681858728_0012_01_000002
Expected behavior:
The application should start a new instance and finish successfully.
Actual behavior:
The application failed as soon as the container was killed.
{code:title=AM log}
18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: Got response from RM
for container ask, completedCnt=1
18/04/27 23:05:12 INFO distributedshell.ApplicationMaster:
appattempt_1524681858728_0012_000001 got container status for
containerID=container_1524681858728_0012_01_000002, state=COMPLETE,
exitStatus=137, diagnostics=[2018-04-27 23:05:09.310]Container killed on
request. Exit code is 137
[2018-04-27 23:05:09.331]Container exited with a non-zero exit code 137.
[2018-04-27 23:05:09.332]Killed by external signal
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Got response from RM
for container ask, completedCnt=1
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster:
appattempt_1524681858728_0012_000001 got container status for
containerID=container_1524681858728_0012_01_000003, state=COMPLETE,
exitStatus=0, diagnostics=
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Container completed
successfully., containerId=container_1524681858728_0012_01_000003
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application
completed. Stopping running containers
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application
completed. Signalling finish to RM
18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Diagnostics.,
total=2, completed=2, allocated=2, failed=1
18/04/27 23:08:46 INFO impl.AMRMClientImpl: Waiting for application to be
successfully unregistered.{code}
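For reference, exitStatus=137 in the AM log is consistent with the "Killed by external signal" diagnostic. A minimal sketch of the arithmetic, assuming the usual convention that a process terminated by signal N reports exit status 128 + N (the helper name exit_signal is hypothetical and not part of YARN):

```shell
# Hypothetical helper: map an exit status to the signal number that
# terminated the process, assuming the common 128 + N convention.
# Prints 0 when the status does not indicate termination by a signal.
exit_signal() {
  status=$1
  if [ "$status" -gt 128 ]; then
    echo $((status - 128))
  else
    echo 0
  fi
}

# Exit status reported for container_1524681858728_0012_01_000002
exit_signal 137
```

Under that assumption, 137 corresponds to signal 9 (SIGKILL), whereas container_1524681858728_0012_01_000003 reported exitStatus=0 and completed normally.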