[jira] [Commented] (YARN-3854) Add localization support for docker images

Shane Kumpf (JIRA) Wed, 20 Jul 2016 12:46:37 -0700

    [ https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386501#comment-15386501 ]


Shane Kumpf commented on YARN-3854:
-----------------------------------

[~tangzhankun] thanks for the patch and doc. It echoes many of my concerns.

I've given image localization and management quite a bit of thought, and so far
I haven't come up with a great solution. Some of the goals I had in mind that,
IMO, should be carried forward are to minimize dependence on the internet, start
the container as fast as possible, ensure the same image is used for the
duration of an application, and preserve the image's metadata.

[~templedf] Today, when the container executor issues a docker run, Docker
implicitly runs a docker pull behind the scenes if the image is not already
present locally, similar to what you are suggesting. The potential for timeouts
is high on unstable networks. This also doesn't work for Docker Hub private
repositories, but that is a separate issue that needs to be filed.
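One hypothetical mitigation for the timeout problem is to make the pull an explicit step before docker run. Each attempt gets its own timeout, and a few retries are allowed. A slow registry then shows up as a clear pull failure instead of a hung launch. The container executor does not do this today. `pull_with_retries` and its parameters are invented for illustration. The sketch assumes the `docker` CLI is on the PATH. It takes `run` and `sleep` as parameters so it can be exercised without Docker.

```python
import subprocess
import time
from typing import Callable


def pull_with_retries(
    image: str,
    attempts: int = 3,
    timeout_s: float = 300.0,
    backoff_s: float = 5.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: return True once `docker pull <image>` succeeds,
    or False if every attempt fails or times out."""
    for attempt in range(1, attempts + 1):
        try:
            result = run(["docker", "pull", image], timeout=timeout_s,
                         capture_output=True, text=True)
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass  # treat a hung pull like any other failed attempt
        if attempt < attempts:
            sleep(backoff_s * attempt)  # linear backoff between attempts
    return False
```

A caller would launch the container only if this returns True, and otherwise fail fast with a pull error.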

One comment on the approach here: IIRC, docker save/load retains the image's
history and layers, whereas export/import works on a container's filesystem and
flattens it, so we should likely use save/load rather than export/import.

The approach outlined in the patch does have its merits. You are not dependent
on being able to pull from Docker Hub or a private registry, and you could ensure
that the same image is run by all of the tasks in the job. I believe it would
be possible to keep the image metadata intact as well.
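Below is a minimal sketch of the packaging and localization round trip for this approach. It assumes `docker save`/`docker load`, which operate on images and keep their layers and tags, plus the `hdfs dfs -put`/`-get` shell commands. The helper names, paths, and tarball naming scheme are hypothetical. Inside YARN, the fetch would more likely go through the regular resource localizer than through `hdfs dfs -get`. The functions only build argument lists, which could then be passed to `subprocess.run`.

```python
import posixpath
from typing import List


def image_tar_name(image: str) -> str:
    """Hypothetical naming scheme: 'library/centos:7' -> 'library_centos_7.tar'."""
    return image.replace("/", "_").replace(":", "_") + ".tar"


def package_commands(image: str, hdfs_dir: str, local_tmp: str = "/tmp") -> List[List[str]]:
    """Admin side: save a local image to a tarball, then upload it to HDFS."""
    tar_name = image_tar_name(image)
    local_tar = posixpath.join(local_tmp, tar_name)
    return [
        ["docker", "save", "-o", local_tar, image],  # keeps layers and tags
        ["hdfs", "dfs", "-put", "-f", local_tar, posixpath.join(hdfs_dir, tar_name)],
    ]


def localize_commands(hdfs_tar: str, local_dir: str) -> List[List[str]]:
    """Node side: fetch the tarball from HDFS, then load it into the local daemon."""
    local_tar = posixpath.join(local_dir, posixpath.basename(hdfs_tar))
    return [
        ["hdfs", "dfs", "-get", hdfs_tar, local_tar],
        ["docker", "load", "-i", local_tar],
    ]
```

For example, `localize_commands("/apps/docker-images/library_centos_7.tar", "/data/nm-local")` yields the `-get` step and the `docker load -i` step for one node. Both paths in that call are hypothetical.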

My concern with using Docker Hub or a private registry is: what happens during a
long-running job if someone pushes a new "latest" to the registry? Would the
docker pull result in the last part of my application running a different image
(perhaps that doesn't apply to what you have in mind)? However, I completely
agree with Daniel's concerns about the current approach; it also adds work for
administrators, who now have to package the images and get them into HDFS.
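One way to keep a long-running job on a single image, even if someone pushes a new "latest", is to resolve the tag to its content digest once at submission time. Every container then runs `name@sha256:<digest>` instead of the tag. The digest could be resolved with something like `docker inspect --format '{{index .RepoDigests 0}}' <image>`, which is only populated for images that came from a registry. This is an assumption about how it could be done, not something proposed in the patch. The hypothetical `pin_to_digest` below sketches only the rewriting step and assumes the digest is already known.

```python
import re

_DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def pin_to_digest(reference: str, digest: str) -> str:
    """Hypothetical helper: replace any tag (or existing digest) in `reference`
    with `digest`. Registry hosts with ports, such as
    'registry:5000/team/app:latest', are handled."""
    if not _DIGEST_RE.match(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    name = reference.split("@", 1)[0]  # drop an existing digest, if any
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:  # only a colon after the last '/' marks a tag
        name = name[:last_colon]
    return f"{name}@{digest}"
```

Checking the colon position against the last slash is what keeps a registry port (`registry:5000`) from being mistaken for a tag.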

Somewhat off-topic, but I started on an HDFS storage plugin for the docker
registry storage driver API. The API was changing daily, so I put this on the
back burner while waiting for a bit more stabilization. See
[docker-registry-hdfs|https://hub.docker.com/r/sakserv/docker-registry-hdfs/]
if you want to play with it. It allows a docker pull from a private registry
backed by HDFS. This would help satisfy the goals of not depending on the
internet/Docker Hub and of maintaining the image's metadata, but beyond that
it doesn't buy us much. I'm hopeful there are interesting "hacks" where this
might benefit YARN in the future.

Perhaps image management and localization should be handled outside of the
application lifecycle? Otherwise, image localization could introduce
significant lag when starting containers (which may be OK?).
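If localization moved out of the container launch path, one option is a per-node cache. It localizes each image at most once, keyed by an immutable digest, either ahead of time or on first use. Later containers then pay no localization cost. This is an assumption for discussion, not part of the patch. The `NodeImageCache` class and its `localize` callback are hypothetical. The callback stands in for whatever actually fetches and loads the image.

```python
import threading
from typing import Callable, Dict, Set


class NodeImageCache:
    """Hypothetical per-node cache: each image digest is localized at most once."""

    def __init__(self, localize: Callable[[str], None]) -> None:
        self._localize = localize  # e.g. fetch a tarball from HDFS, then docker load
        self._ready: Set[str] = set()
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()

    def _lock_for(self, digest: str) -> threading.Lock:
        # One lock per digest, so slow localizations of different images
        # do not block each other.
        with self._guard:
            return self._locks.setdefault(digest, threading.Lock())

    def ensure(self, digest: str) -> bool:
        """Make `digest` available locally; return True if this call localized it."""
        with self._lock_for(digest):  # concurrent requests for one image wait here
            if digest in self._ready:
                return False
            self._localize(digest)  # if this raises, a later call will retry
            self._ready.add(digest)
            return True
```

A node could pre-warm by calling `ensure` for a known list of digests at startup. That moves the lag out of container launch at the cost of disk space for images that may never be used.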

Interested in others' thoughts here.

> Add localization support for docker images
> ------------------------------------------
>
>                 Key: YARN-3854
>                 URL: https://issues.apache.org/jira/browse/YARN-3854
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Sidharta Seethana
>            Assignee: Zhankun Tang
>         Attachments: YARN-3854-branch-2.8.001.patch, 
> YARN-3854_Localization_support_for_Docker_image_v1.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v2.pdf
>
>
> We need the ability to localize images from HDFS and load them for use when 
> launching docker containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
