[
https://issues.apache.org/jira/browse/YARN-8181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444491#comment-16444491
]
Shane Kumpf commented on YARN-8181:
-----------------------------------
[~sajavadi] - Thanks for the report and your interest in this feature! The
documentation is available here:
[http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/DockerContainers.html]
Regarding the behavior above, the container completed successfully and very
quickly. I expect the image isn't privileged/trusted (and the ENTRYPOINT/CMD in
your Dockerfile is something like {{bash}}).
As a result of being a non-privileged/untrusted image, the MR launcher script
is not executed in the container so the PI mapper/reducers never actually run
here. Instead, whatever is set in the Dockerfile will be executed in the
container. If the Dockerfile is set up to use a command that will not keep the
container alive, the container completes very quickly, as you saw.
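One way to check this guess is to look at the ENTRYPOINT and CMD baked into the image. The snippet below is only a sketch: it assumes the image is named {{hadoop-ubuntu:latest}} as in the report and that the {{docker}} CLI is on the PATH of the NodeManager host. If it shows something like {{bash}}, that would be consistent with the log, since a shell started without a TTY or stdin exits right away with code 0.

```shell
#!/bin/sh
# Image name taken from the report; adjust if yours differs.
image="hadoop-ubuntu:latest"

if command -v docker >/dev/null 2>&1; then
  # Print the default ENTRYPOINT and CMD recorded in the image metadata.
  docker inspect \
    --format 'ENTRYPOINT={{json .Config.Entrypoint}} CMD={{json .Config.Cmd}}' \
    "$image" || echo "could not inspect ${image} (is it present locally?)"
else
  echo "docker not found; skipping check"
fi
```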
Can you try the following to add this image as privileged/trusted and rerun the
pi job?
# Add {{docker.privileged-containers.registries}} to
{{container-executor.cfg}} under the {{[docker]}} section with the value
{{local}} (if the configuration already exists, append {{local}} to the list).
# Tag the {{hadoop-ubuntu}} image so that it is in the {{local}}
namespace with {{docker tag hadoop-ubuntu:latest local/hadoop-ubuntu:latest}}.
# Change {{YARN_CONTAINER_RUNTIME_DOCKER_IMAGE}}'s value to
{{local/hadoop-ubuntu:latest}}.
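Put together, the three steps above might look like the following sketch. The {{docker tag}} command and the image names come from the steps, and the {{hadoop jar}} invocation is copied from the original report with only the image name changed. The {{run}} helper and the {{DRY_RUN}} variable are hypothetical, added only so that the script prints the commands by default instead of executing them.

```shell
#!/bin/sh
# Step 1 is a config change, not a command. In container-executor.cfg:
#   [docker]
#     docker.privileged-containers.registries=local
# (if the key already exists, append "local" to its list instead)

# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Step 2: put the image into the "local" namespace.
run docker tag hadoop-ubuntu:latest local/hadoop-ubuntu:latest

# Step 3: point the job at the re-tagged image.
image="local/hadoop-ubuntu:latest"
vars="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=${image}"

# Same pi job as in the report, now using the new image name.
run hadoop jar \
  /home/ubuntu/hadoop-3.1.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar \
  pi -Dyarn.app.mapreduce.am.env="$vars" -Dmapreduce.map.env="$vars" \
  -Dmapreduce.reduce.env="$vars" 2 10
```

Running it as is only prints the commands; setting {{DRY_RUN=0}} executes them.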
Let me know if that works and I'll open an issue to update the documentation
with similar pointers.
> Docker container run_time
> -------------------------
>
> Key: YARN-8181
> URL: https://issues.apache.org/jira/browse/YARN-8181
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Seyyed Ahmad Javadi
> Priority: Major
>
> Hi All,
> I want to use the Docker container runtime but could not solve the problem I
> am facing. I am following the guide below, and the NM log is as follows. I do
> not see any Docker containers being created. It works when I use the default
> LCE. Please also find how I submit the job at the end.
> Do you have any guide on how I can make the Docker runtime work?
> Could you please let me know how I can use the LCE binary to make sure my
> Docker setup is correct?
> I confirmed that "docker run" works fine. I really like this developing
> feature and would like to contribute to it. Many thanks in advance.
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/DockerContainers.html]
> {code:java}
> NM LOG:
> ...
> 2018-04-19 11:49:24,568 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
> Auth successful for appattempt_1524151293356_0005_000001 (auth:SIMPLE)
> 2018-04-19 11:49:24,580 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Start request for container_1524151293356_0005_01_000001 by user ubuntu
> 2018-04-19 11:49:24,584 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Creating a new application reference for app application_1524151293356_0005
> 2018-04-19 11:49:24,584 INFO
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=ubuntu
> IP=130.245.127.176 OPERATION=Start Container Request
> TARGET=ContainerManageImpl RESULT=SUCCESS
> APPID=application_1524151293356_0005
> CONTAINERID=container_1524151293356_0005_01_000001
> 2018-04-19 11:49:24,585 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Application application_1524151293356_0005 transitioned from NEW to INITING
> 2018-04-19 11:49:24,585 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Adding container_1524151293356_0005_01_000001 to application
> application_1524151293356_0005
> 2018-04-19 11:49:24,585 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Application application_1524151293356_0005 transitioned from INITING to
> RUNNING
> 2018-04-19 11:49:24,588 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_1524151293356_0005_01_000001 transitioned from NEW to
> LOCALIZING
> 2018-04-19 11:49:24,588 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event CONTAINER_INIT for appId application_1524151293356_0005
> 2018-04-19 11:49:24,589 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Created localizer for container_1524151293356_0005_01_000001
> 2018-04-19 11:49:24,616 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Writing credentials to the nmPrivate file
> /tmp/hadoop-ubuntu/nm-local-dir/nmPrivate/container_1524151293356_0005_01_000001.tokens
> 2018-04-19 11:49:28,090 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_1524151293356_0005_01_000001 transitioned from
> LOCALIZING to SCHEDULED
> 2018-04-19 11:49:28,090 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
> Starting container [container_1524151293356_0005_01_000001]
> 2018-04-19 11:49:28,212 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_1524151293356_0005_01_000001 transitioned from SCHEDULED
> to RUNNING
> 2018-04-19 11:49:28,212 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Starting resource-monitoring for container_1524151293356_0005_01_000001
> 2018-04-19 11:49:29,401 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Container container_1524151293356_0005_01_000001 succeeded
> 2018-04-19 11:49:29,401 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_1524151293356_0005_01_000001 transitioned from RUNNING
> to EXITED_WITH_SUCCESS
> 2018-04-19 11:49:29,401 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Cleaning up container container_1524151293356_0005_01_000001
> 2018-04-19 11:49:29,520 INFO
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Removing
> Docker container : container_1524151293356_0005_01_000001
> 2018-04-19 11:49:34,517 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Could not get pid for container_1524151293356_0005_01_000001. Waited for
> 5000 ms.
> 2018-04-19 11:49:34,517 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Unable to obtain pid, but docker container request detected. Attempting to
> reap container container_1524151293356_0005_01_000001
> 2018-04-19 11:49:36,927 INFO
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting
> absolute path :
> /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1524151293356_0005/container_1524151293356_0005_01_000001
> 2018-04-19 11:49:36,928 INFO
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=ubuntu
> OPERATION=Container Finished - Succeeded TARGET=ContainerImpl
> RESULT=SUCCESS APPID=application_1524151293356_0005
> CONTAINERID=container_1524151293356_0005_01_000001
> 2018-04-19 11:49:36,929 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_1524151293356_0005_01_000001 transitioned from
> EXITED_WITH_SUCCESS to DONE
> 2018-04-19 11:49:36,938 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Removing container_1524151293356_0005_01_000001 from application
> application_1524151293356_0005
> 2018-04-19 11:49:36,938 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Stopping resource-monitoring for container_1524151293356_0005_01_000001
> 2018-04-19 11:49:36,938 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event CONTAINER_STOP for appId application_1524151293356_0005
> 2018-04-19 11:49:37,941 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed
> completed containers from NM context: [container_1524151293356_0005_01_000001]
> 2018-04-19 11:49:50,966 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Application application_1524151293356_0005 transitioned from RUNNING to
> APPLICATION_RESOURCES_CLEANINGUP
> 2018-04-19 11:49:50,967 INFO
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting
> absolute path :
> /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1524151293356_0005
> 2018-04-19 11:49:50,967 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event APPLICATION_STOP for appId application_1524151293356_0005
> 2018-04-19 11:49:50,967 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Application application_1524151293356_0005 transitioned from
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-19 11:49:50,967 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
> Scheduling Log Deletion for application: application_1524151293356_0005,
> with delay of 10800 seconds
> {code}
> {code:java}
> vars="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-ubuntu:latest"
> #vars="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-ubuntu:latest,YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=false,YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=host"
> #vars="YARN_CONTAINER_RUNTIME_TYPE=default"
> hadoop jar
> /home/ubuntu/hadoop-3.1.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar
> pi -Dyarn.app.mapreduce.am.env=$vars -Dmapreduce.map.env=$vars
> -Dmapreduce.reduce.env=$vars 2 10
> {code}
> {code:java}
> Number of Maps = 2
> Samples per Map = 10
> Wrote input for Map #0
> Wrote input for Map #1
> Starting Job
> 2018-04-19 11:49:22,786 INFO client.RMProxy: Connecting to ResourceManager at
> bay1-vm1/130.245.127.176:8032
> 2018-04-19 11:49:23,435 INFO mapreduce.JobResourceUploader: Disabling Erasure
> Coding for path:
> /tmp/hadoop-yarn/staging/ubuntu/.staging/job_1524151293356_0005
> 2018-04-19 11:49:23,601 INFO input.FileInputFormat: Total input files to
> process : 2
> 2018-04-19 11:49:23,756 INFO mapreduce.JobSubmitter: number of splits:2
> 2018-04-19 11:49:23,824 INFO Configuration.deprecation:
> yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead,
> use yarn.system-metrics-publisher.enabled
> 2018-04-19 11:49:24,015 INFO mapreduce.JobSubmitter: Submitting tokens for
> job: job_1524151293356_0005
> 2018-04-19 11:49:24,017 INFO mapreduce.JobSubmitter: Executing with tokens: []
> 2018-04-19 11:49:24,262 INFO conf.Configuration: resource-types.xml not found
> 2018-04-19 11:49:24,262 INFO resource.ResourceUtils: Unable to find
> 'resource-types.xml'.
> 2018-04-19 11:49:24,350 INFO impl.YarnClientImpl: Submitted application
> application_1524151293356_0005
> 2018-04-19 11:49:24,398 INFO mapreduce.Job: The url to track the job:
> http://bay1-vm1:8088/proxy/application_1524151293356_0005/
> 2018-04-19 11:49:24,399 INFO mapreduce.Job: Running job:
> job_1524151293356_0005
> 2018-04-19 11:49:50,658 INFO mapreduce.Job: Job job_1524151293356_0005
> running in uber mode : false
> 2018-04-19 11:49:50,660 INFO mapreduce.Job: map 0% reduce 0%
> 2018-04-19 11:49:50,676 INFO mapreduce.Job: Job job_1524151293356_0005 failed
> with state FAILED due to: Application application_1524151293356_0005 failed 2
> times due to AM Container for appattempt_1524151293356_0005_000002 exited
> with exitCode: 0
> Failing this attempt.Diagnostics: For more detailed output, check the
> application tracking page:
> http://bay1-vm1:8088/cluster/app/application_1524151293356_0005 Then click on
> links to logs of each attempt.
> . Failing the application.
> 2018-04-19 11:49:50,702 INFO mapreduce.Job: Counters: 0
> Job job_1524151293356_0005 failed!
> runtime in seconds: 34
> {code}
>