[
https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634608#comment-15634608
]
Sidharta Seethana commented on YARN-5258:
-----------------------------------------
Feedback:
{code}
Docker combines an easy-to-use interface to Linux containers with
easy-to-construct image files for those containers. In short, Docker launches
very light weight virtual machines.
{code}
IMO, this is not an accurate characterization and we should drop it. This blog
post explains this in more detail:
https://blog.docker.com/2016/03/containers-are-not-vms/ . It might also be a
good idea to link to http://docs.docker.com/ .
{code}
The Linux Container Executor (LCE) allows the YARN NodeManager to launch YARN
containers into Docker containers.
{code}
I believe a brief description of ‘container runtimes’ would be warranted in
this section. LCE currently supports two: the default ‘process’-based runtime
and the Docker runtime. It is possible to choose between these on a
per-container basis. Alternatively, this information could be added in one or
more follow-up patches.
{code}
The Docker suuport in the LCE is still evolving.
{code}
minor typo. suuport -> support
{code}
To track progress, follow JIRA-3611,
{code}
It might be better to say YARN-3611 - with a link.
{code}
sudo docker pull images/hadoop-docker:latest
{code}
IMO, this should be a working example. That said, I am not aware of any popular
vendor-neutral images that would be a good candidate. This has been one of the
barriers to creating good documentation for this functionality. Should we
consider hosting ‘official’ Apache Hadoop images on Docker Hub? Thoughts?
{code}
The following properties should be set in yarn-site.xml:
{code}
Some of the properties described here have values that are not in line with
yarn-default.xml, specifically
yarn.nodemanager.runtime.linux.docker.allowed-container-networks and
yarn.nodemanager.runtime.linux.docker.privileged-containers.acl . There is also
a setting that isn’t mentioned here:
yarn.nodemanager.runtime.linux.docker.default-container-network . I think a
separate section on networking is warranted - I’ll submit a follow-up patch
with additional documentation.
{code}
feature.docker.enabled
{code}
In the context of this functionality, this is not ‘optional’ - it must be set
to 1.
{code}
In order to work with YARN, there are two requirements for Docker images.
{code}
There are additional limitations - again, these could be added in subsequent
updates to the documentation. An important limitation that comes to mind is
that because YARN always overrides the command the container is launched with,
images with an {{ENTRYPOINT}} directive will not work. Application frameworks
may impose their own additional requirements. For example, using Slider with
Docker and YARN (currently) requires that all images have Python installed in
them (in order to run the Slider agent).
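To make the {{ENTRYPOINT}} limitation easier to check for, the documentation could show how to spot affected images. Below is a minimal sketch, not part of the patch, that assumes the JSON text printed by {{docker inspect <image>}} is available as a string. The function name and the sample data (two made-up image IDs) are hypothetical.

```python
import json


def images_with_entrypoint(inspect_output: str) -> list:
    """Return the IDs of images whose config declares a non-empty ENTRYPOINT.

    `inspect_output` is the JSON text printed by `docker inspect <image>...`,
    which is a JSON array with one object per inspected image.
    """
    flagged = []
    for image in json.loads(inspect_output):
        entrypoint = (image.get("Config") or {}).get("Entrypoint")
        if entrypoint:  # None and [] both mean "no ENTRYPOINT"
            flagged.append(image.get("Id", "<unknown>"))
    return flagged


# Hypothetical, heavily trimmed inspect output for two made-up images.
sample = json.dumps([
    {"Id": "sha256:aaa", "Config": {"Entrypoint": None, "Cmd": ["/bin/bash"]}},
    {"Id": "sha256:bbb", "Config": {"Entrypoint": ["/entrypoint.sh"], "Cmd": None}},
])

print(images_with_entrypoint(sample))
```

Only the second sample image is flagged, since it is the only one with a non-empty {{Entrypoint}} list.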
{code}
First, the Docker container will be explicitly launched with the application
owner as the container user. If the application owner is not a valid user (by
UID) in the Docker image, the application will fail.
{code}
By UID? This is not clear - it might be useful to provide an example here. One
example I can think of here - the UID of ‘nobody’ is different in CentOS vs
Ubuntu - so running an Ubuntu container on CentOS as user ‘nobody’ is likely to
cause failures.
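A small illustration of the ‘nobody’ case might help readers here. The sketch below is only that: the passwd excerpts are illustrative values of the kind typically found on CentOS 6/7 and Ubuntu, and the helper names are hypothetical. It compares the UID a user has on the host with the UID the same user name has inside the image.

```python
def uid_of(passwd_text: str, user: str):
    """Return the numeric UID of `user` in passwd-format text, or None."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == user:
            return int(fields[2])
    return None


def uid_matches(host_passwd: str, image_passwd: str, user: str) -> bool:
    """True only if `user` exists on the host and has the same UID in the image."""
    host_uid = uid_of(host_passwd, user)
    return host_uid is not None and host_uid == uid_of(image_passwd, user)


# Illustrative /etc/passwd excerpts (assumed values, not read from real systems).
CENTOS_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "nobody:x:99:99:Nobody:/:/sbin/nologin\n"
)
UBUNTU_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\n"
)

for user in ("root", "nobody"):
    print(user, uid_matches(CENTOS_PASSWD, UBUNTU_PASSWD, user))
```

With these sample files, ‘root’ matches while ‘nobody’ does not, which is the situation described above for an Ubuntu image on a CentOS host.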
{code}
In order to run an application in a Docker container, set the following
environment variables in the application's environment:
{code}
It might be worth pointing out that while this is not ideal, it does allow
some existing applications that can inject environment variables to run in
Docker containers without modification, e.g. Spark and MapReduce.
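As a rough sketch of what ‘injecting environment variables’ could look like for Spark: the variable names {{YARN_CONTAINER_RUNTIME_TYPE}} and {{YARN_CONTAINER_RUNTIME_DOCKER_IMAGE}} are not named in this comment and are assumed here, as is the use of Spark's {{spark.yarn.appMasterEnv.*}} and {{spark.executorEnv.*}} settings. The helper name is hypothetical, and the image name is the placeholder from the pull command quoted earlier, not a known working image.

```python
def docker_env_confs(image: str) -> list:
    """Build spark-submit `--conf` arguments that inject the Docker runtime
    environment variables into both the application master and the executors.

    The variable names below are assumptions; check them against the patch.
    """
    env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    args = []
    for name, value in env.items():
        for prefix in ("spark.yarn.appMasterEnv.", "spark.executorEnv."):
            args += ["--conf", f"{prefix}{name}={value}"]
    return args


# Placeholder image name taken from the pull command quoted in this review.
args = docker_env_confs("images/hadoop-docker:latest")
print(" ".join(["spark-submit", "--master", "yarn"] + args))
```

The point of the sketch is that the application itself is untouched; only its launch environment changes.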
{code}
Example: Spark
{code}
As mentioned earlier: I think this should be an actual working example and we
should consider exploring what it would take to make that possible.
I think this is a great start, thanks again [~templedf] for taking this on.
> Document Use of Docker with LinuxContainerExecutor
> --------------------------------------------------
>
> Key: YARN-5258
> URL: https://issues.apache.org/jira/browse/YARN-5258
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: documentation
> Affects Versions: 2.8.0
> Reporter: Daniel Templeton
> Assignee: Daniel Templeton
> Priority: Critical
> Labels: oct16-easy
> Attachments: YARN-5258.001.patch, YARN-5258.002.patch
>
>
> There aren't currently any docs that explain how to configure Docker and all
> of its various options aside from reading all of the JIRAs. We need to
> document the configuration, use, and troubleshooting, along with helpful
> examples.