[jira] [Commented] (YARN-5258) Document Use of Docker with LinuxContainerExecutor

Sidharta Seethana (JIRA) Thu, 03 Nov 2016 16:09:21 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634608#comment-15634608
 ]


Sidharta Seethana commented on YARN-5258:
-----------------------------------------

Feedback : 

{code}
Docker combines an easy-to-use interface to Linux containers with 
easy-to-construct image files for those containers. In short, Docker launches 
very light weight virtual machines.
{code}

IMO, this is not an accurate characterization and we should drop it. This blog 
post explains this in more detail : 
https://blog.docker.com/2016/03/containers-are-not-vms/ . It might also be a 
good idea to link to http://docs.docker.com/ . 

{code}
The Linux Container Executor (LCE) allows the YARN NodeManager to launch YARN 
containers into Docker containers. 
{code}

I believe a brief description of ‘container runtimes’ would be warranted in 
this section. LCE currently supports two : the default ‘process’-based runtime 
and the docker runtime. It is possible to choose between these on a 
per-container basis. Alternatively, this information could be added in a 
follow-up patch.

{code}
The Docker suuport in the LCE is still evolving.
{code}

minor typo. suuport -> support

{code}
To track progress, follow JIRA-3611, 
{code}

It might be better to say YARN-3611 - with a link. 

{code}
sudo docker pull images/hadoop-docker:latest
{code}

IMO, this should be a working example. That said, I am not aware of any popular 
vendor-neutral images that would be a good candidate. This has been one of the 
barriers to creating good documentation for this functionality. Should we 
consider hosting ‘official’ apache hadoop images on docker hub ? Thoughts ? 

{code}
The following properties should be set in yarn-site.xml:
{code}

Some of the properties described here don’t have values that are in line with 
yarn-default.xml . Specifically, 
yarn.nodemanager.runtime.linux.docker.allowed-container-networks and 
yarn.nodemanager.runtime.linux.docker.privileged-containers.acl . There is also 
a setting that isn’t mentioned here : 
yarn.nodemanager.runtime.linux.docker.default-container-network . I think a 
separate section on networking is warranted - I’ll submit a follow up patch 
with additional documentation. 
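
To make the mismatch easy to spot while revising the doc, something along these lines could be used. This is only a rough sketch: the property names are the ones mentioned above, but every value in the sample XML is a made-up placeholder (not the real default), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.nodemanager.runtime.linux.docker."

def load_props(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props

def diff_props(documented, defaults):
    """Map name -> (documented value, default value) where the two differ."""
    return {name: (value, defaults.get(name))
            for name, value in documented.items()
            if defaults.get(name) != value}

def undocumented(documented, defaults, prefix):
    """Names under `prefix` that have a default but are missing from the doc."""
    return sorted(name for name in defaults
                  if name.startswith(prefix) and name not in documented)

# Placeholder values only: these are NOT the real documented/default values.
DOC_XML = """<configuration>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
    <value>DOC_VALUE_A</value>
  </property>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
    <value>DOC_VALUE_B</value>
  </property>
</configuration>"""

DEFAULT_XML = """<configuration>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
    <value>DEFAULT_VALUE_A</value>
  </property>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
    <value>DEFAULT_VALUE_B</value>
  </property>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
    <value>DEFAULT_VALUE_C</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    doc, dfl = load_props(DOC_XML), load_props(DEFAULT_XML)
    for name, (doc_value, default_value) in diff_props(doc, dfl).items():
        print(f"{name}: doc={doc_value!r} default={default_value!r}")
    for name in undocumented(doc, dfl, PREFIX):
        print(f"{name}: not mentioned in the doc")
```

In practice the two inputs would be the snippet from the patch and the real yarn-default.xml rather than the inline samples.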

{code}
feature.docker.enabled
{code}

In the context of this functionality, this is not ‘optional’ - it must be set 
to 1. 
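
A quick sanity check along these lines might help readers verify the setting. This is a sketch under assumptions: I am assuming the setting lives in a simple key=value file (container-executor.cfg, as far as I know), and the function name is hypothetical.

```python
def docker_feature_enabled(cfg_text):
    """Return True only if feature.docker.enabled is explicitly set to 1.

    Assumes a flat key=value file with '#' comments; if the key appears
    more than once, the first occurrence wins.
    """
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "feature.docker.enabled":
            return value.strip() == "1"
    return False  # an absent key is treated as disabled

if __name__ == "__main__":
    sample = "# sample contents, for illustration only\nfeature.docker.enabled=1\n"
    print(docker_feature_enabled(sample))
```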


{code}
In order to work with YARN, there are two requirements for Docker images.
{code}

There are additional limitations - again, these could be added in subsequent 
updates to the documentation. An important limitation that comes to mind is 
that because YARN always overrides the command the container is launched with, 
images with an {{ENTRYPOINT}} directive will not work. Application frameworks 
may impose their own additional requirements. For example, using Slider with 
Docker and YARN (currently) requires that all images have python installed in 
them (in order to run the Slider agent). 
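
If it helps, the {{ENTRYPOINT}} limitation could be illustrated with a small check like the one below. It is a minimal sketch that only scans the text of a single Dockerfile: it does not follow line continuations and cannot see an {{ENTRYPOINT}} inherited from a base image, so it is a first-pass filter rather than a definitive test. The function name is hypothetical.

```python
def declares_entrypoint(dockerfile_text):
    """Return True if any instruction line in the Dockerfile is ENTRYPOINT.

    Dockerfile instruction names are case-insensitive, so compare in
    upper case. Comment lines and blank lines are skipped.
    """
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instruction = line.split(None, 1)[0]
        if instruction.upper() == "ENTRYPOINT":
            return True
    return False

if __name__ == "__main__":
    # Hypothetical Dockerfile contents, for illustration only.
    sample = 'FROM centos:7\nENTRYPOINT ["/bin/sh"]\n'
    print(declares_entrypoint(sample))
```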

{code}
First, the Docker container will be explicitly launched with the application 
owner as the container user. If the application owner is not a valid user (by 
UID) in the Docker image, the application will fail.
{code}

By UID? This is not clear - it might be useful to provide an example here. One 
example I can think of here - the UID of ‘nobody’ is different in CentOS vs 
Ubuntu - so running an Ubuntu container on CentOS as user ‘nobody’ is likely to 
cause failures. 
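
Something like the following could serve as the example. It is only a sketch: the two passwd snippets are illustrative samples (99 and 65534 are the values I would expect for ‘nobody’ on a stock CentOS 7 host and an Ubuntu image respectively, but the real files should be checked), and the helper names are hypothetical.

```python
def uid_of(passwd_text, user):
    """Return the numeric UID of `user` from passwd-formatted text, or None."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == user:
            return int(fields[2])
    return None

def same_uid(host_passwd, image_passwd, user):
    """True only if `user` exists in both files with the same UID."""
    host_uid = uid_of(host_passwd, user)
    return host_uid is not None and host_uid == uid_of(image_passwd, user)

# Illustrative samples, not copied from real systems.
HOST_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "nobody:x:99:99:Nobody:/:/sbin/nologin\n"
)
IMAGE_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\n"
)

if __name__ == "__main__":
    for user in ("root", "nobody"):
        print(user, uid_of(HOST_PASSWD, user), uid_of(IMAGE_PASSWD, user),
              same_uid(HOST_PASSWD, IMAGE_PASSWD, user))
```

The name matches in both files, but the UID does not, which is the mismatch the documentation should call out.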


{code}
In order to run an application in a Docker container, set the following 
environment variables in the application's environment:
{code}

It might be worth pointing out that, while this is not ideal, it does allow 
some existing applications that can inject environment variables (e.g. Spark 
and MapReduce) to run in Docker containers without modification. 
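
To illustrate the injection for Spark, here is a rough sketch that builds the extra spark-submit arguments. Assumptions: the docker runtime reads YARN_CONTAINER_RUNTIME_TYPE and YARN_CONTAINER_RUNTIME_DOCKER_IMAGE (my understanding of the current implementation), Spark's spark.yarn.appMasterEnv.* and spark.executorEnv.* settings are used to pass them through, and the image name is just the placeholder from the doc, not a known working image. The function name is hypothetical.

```python
def spark_docker_conf(image):
    """Build spark-submit --conf arguments that inject the Docker runtime
    environment variables into both the application master and the executors."""
    env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    args = []
    for prefix in ("spark.yarn.appMasterEnv.", "spark.executorEnv."):
        for name, value in env.items():
            args += ["--conf", f"{prefix}{name}={value}"]
    return args

if __name__ == "__main__":
    # Placeholder image name taken from the doc under review.
    print(" ".join(spark_docker_conf("images/hadoop-docker:latest")))
```

The application itself needs no changes; only the submission command line differs.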

{code}
Example: Spark
{code}

As mentioned earlier : I think this should be an actual working example and we 
should consider exploring what it would take to make that possible. 

I think this is a great start, thanks again [~templedf] for taking this on. 



> Document Use of Docker with LinuxContainerExecutor
> --------------------------------------------------
>
>                 Key: YARN-5258
>                 URL: https://issues.apache.org/jira/browse/YARN-5258
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: documentation
>    Affects Versions: 2.8.0
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>              Labels: oct16-easy
>         Attachments: YARN-5258.001.patch, YARN-5258.002.patch
>
>
> There aren't currently any docs that explain how to configure Docker and all 
> of its various options aside from reading all of the JIRAs.  We need to 
> document the configuration, use, and troubleshooting, along with helpful 
> examples.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
