[ https://issues.apache.org/jira/browse/YARN-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347524#comment-14347524 ]
Jason Lowe commented on YARN-3289:
----------------------------------

Regarding a separate prepping task, localization already is a separate preparation task for non-public resources. See ContainerLocalizer. I don't think docker image download and localization as it is done today are fundamentally different at a high level -- in both cases we are prepping the node to be able to run the container. There is no need to complicate the process with a specialized extra step just for docker.

What we're missing here is progress reporting during localization, so that AMs can properly monitor the progress of container launch requests before their code starts running. That is useful for non-docker localization scenarios as well.

Adjusting locality based on the cost of localization is an interesting idea, and it applies to the non-docker case as well. However, the docker case can be a bit tricky. One node may take tens of minutes to localize a docker image, while another node might take only a few seconds. Docker images are often derived from other images, and docker only downloads the deltas. So it will be difficult for YARN, which is not aware of the docker contents of a node or of image deltas, to predict how long any node will take to localize a given docker image.

> Docker images should be downloaded during localization
> ------------------------------------------------------
>
>                 Key: YARN-3289
>                 URL: https://issues.apache.org/jira/browse/YARN-3289
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Ravi Prakash
>
> We currently call docker run on images while launching containers. If the
> image size is sufficiently big, the task will time out. We should download the
> image we want to run during localization (if possible) to prevent this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
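To illustrate the point in the comment above about image deltas, here is a minimal sketch of why the same image can cost very different amounts to localize on different nodes. It is not YARN or docker code: the `Layer` class, the `bytes_to_download` function, the layer digests, and the sizes are all hypothetical, and it assumes the scheduler could see which layers each node already holds, which is exactly the information the comment says YARN does not have.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One image layer; digest and size are illustrative values only."""
    digest: str
    size_bytes: int


def bytes_to_download(image_layers, cached_digests):
    """Sum the sizes of the layers that a node does not already hold."""
    return sum(
        layer.size_bytes
        for layer in image_layers
        if layer.digest not in cached_digests
    )


# Hypothetical image: a large base layer plus two smaller derived layers.
image = [
    Layer("base", 900_000_000),
    Layer("runtime", 80_000_000),
    Layer("app", 5_000_000),
]

cold_node = set()                  # nothing cached: every layer is fetched
warm_node = {"base", "runtime"}    # parent image already present on the node

cold_cost = bytes_to_download(image, cold_node)
warm_cost = bytes_to_download(image, warm_node)
print(cold_cost, warm_cost)
```

With these made-up numbers the cold node has to fetch every layer while the warm node fetches only the small top layer, so a locality cost estimate that ignores per-node layer state would be off by orders of magnitude for one of them.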