[
https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876144#comment-16876144
]
Peter Bacsko commented on YARN-9660:
------------------------------------
cc [[email protected]] [~eyang] [~snemeth] - what do you guys think?
I believe some of these failures could be detected and a hint printed to the
user. The hard-coded {{/bin/bash}} could be made overridable in
{{UnixShellScriptBuilder}}. We have options here.
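As a rough illustration of the detection idea, here is a minimal sketch that maps the error fragments quoted in the issue description to user-facing hints. The class name {{DockerLaunchHints}}, the {{Rule}} helper and the hint wording are all hypothetical; nothing like this exists in YARN today as far as this comment is concerned, and the real diagnostics plumbing would look different.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of YARN: matches known Docker launch
// failures by looking for fragments of the error text in the diagnostics.
class DockerLaunchHints {

    /** One known failure: every fragment must occur in the diagnostics. */
    static final class Rule {
        final String hint;
        final List<String> fragments;

        Rule(String hint, String... fragments) {
            this.hint = hint;
            this.fragments = Arrays.asList(fragments);
        }

        boolean matches(String diagnostics) {
            for (String fragment : fragments) {
                if (!diagnostics.contains(fragment)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Fragments are copied from the error messages in the issue description;
    // the hint texts are assumptions made for this sketch.
    static final List<Rule> RULES = Arrays.asList(
        new Rule("The Docker daemon seems to use the systemd cgroups handler;"
                + " switch it to cgroupfs.",
            "cgroup-parent for systemd cgroup should be a valid slice"),
        new Rule("The image does not provide bash, which the launch script needs.",
            "bash", "executable file not found in $PATH"),
        new Rule("The image does not provide the find command; install it in the image.",
            "find: command not found"));

    /** Returns the hint of the first matching rule, if any. */
    static Optional<String> hintFor(String diagnostics) {
        for (Rule rule : RULES) {
            if (rule.matches(diagnostics)) {
                return Optional.of(rule.hint);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "launch_container.sh: line 44: find: command not found";
        System.out.println(hintFor(sample).orElse("no known cause"));
    }
}
```

Plain substring matching is brittle if Docker changes its wording, so this would only ever be a best-effort hint next to the original error, not a replacement for it.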
> Enhance documentation of Docker on YARN support
> -----------------------------------------------
>
> Key: YARN-9660
> URL: https://issues.apache.org/jira/browse/YARN-9660
> Project: Hadoop YARN
> Issue Type: Bug
> Components: documentation, nodemanager
> Reporter: Peter Bacsko
> Priority: Major
>
> Right now, using Docker on YARN has some hard requirements. If these
> requirements are not met, then launching the containers will fail and an
> error message will be printed. Depending on how familiar the user is with
> Docker, it might or might not be easy for them to understand what went wrong
> and how to fix the underlying problem.
> It would be important to explicitly document these requirements along with
> the error messages.
> *#1: CGroups handler cannot be systemd*
> If the Docker daemon runs with the systemd cgroups handler, we receive the
> following error upon launching a container:
> {noformat}
> Container id: container_1561638268473_0006_01_000002
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon:
> cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
> Solution: switch to cgroupfs. Doing so can be OS-specific, but we can
> document a {{systemctl}} example.
>
> *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container*
> Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}.
> That is because all commands under {{/bin}} are linked to {{/bin/busybox}}
> and there's only {{/bin/sh}}.
> If we try to use this kind of image, we'll see the following error message:
> {noformat}
> Container id: container_1561638268473_0015_01_000002
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: /usr/bin/docker-current: Error response from daemon: oci
> runtime error: container_linux.go:235: starting container process caused
> "exec: \"bash\": executable file not found in $PATH".
> Shell output: main : command provided 4
> main : run as user is johndoe
> main : requested yarn user is johndoe
> {noformat}
>
> *#3: {{find}} command must be available on the {{$PATH}}*
> It seems obvious that we have the {{find}} command, but even very popular
> images like {{fedora}} require that we install it separately.
> If we don't have {{find}} available, then {{launch_container.sh}} fails
> with:
> {noformat}
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127.
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_000002/launch_container.sh:
> line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127.
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_000002/launch_container.sh:
> line 44: find: command not found
> Last 4096 bytes of stderr.txt :
> {noformat}
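Requirements #2 and #3 could in principle be checked before a container is ever submitted. The sketch below only builds the argument list for such a probe; it does not run anything. {{ImagePreflight}} and {{probeCommand}} are hypothetical names, the binary is assumed to be callable as {{docker}} (the logs above show {{/usr/bin/docker-current}}), and the image is assumed to ship at least {{sh}}, which is the case for busybox and alpine as noted above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight helper, not part of YARN.
class ImagePreflight {

    /**
     * Builds a "docker run" argument list that exits non-zero when any of
     * the given tools is missing from the image's $PATH. Tool names are
     * assumed to be trusted constants; they are not escaped.
     */
    static List<String> probeCommand(String image, String... tools) {
        StringBuilder script = new StringBuilder();
        for (String tool : tools) {
            if (script.length() > 0) {
                script.append(" && ");
            }
            // "command -v" is the POSIX way to ask whether a command resolves.
            script.append("command -v ").append(tool);
        }
        return Arrays.asList(
            "docker", "run", "--rm", "--entrypoint", "sh",
            image, "-c", script.toString());
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", probeCommand("alpine", "bash", "find")));
    }
}
```

The resulting list could be handed to {{ProcessBuilder}}; a non-zero exit status would then mean that at least one of the tools is missing from the image.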