[
https://issues.apache.org/jira/browse/YARN-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15689892#comment-15689892
]
Shane Kumpf commented on YARN-5818:
-----------------------------------
{{docker wait}} will have to be removed to support Docker live restore.
Retrying the {{docker wait}} is brittle, as it requires parsing stderr and
looking for a specific string which could change without notice.
I propose we replace the {{docker wait}} approach with the following to support
live restore:
# {{docker run}} to start the container.
# {{docker inspect}} to get the pid.
# Null signal ({{kill -0 pid}}) liveness loop waiting for the container to
complete.
# {{docker inspect}} the finished container for the exit code.
# Write the exit code file to be picked up by the NM.
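As a rough illustration only (not the patch itself), the steps above could be sketched in Python as follows. The function names, the container name argument, and the exit code file path are hypothetical; the sketch assumes the {{docker}} CLI is on the PATH and that {{docker inspect --format}} is used with the {{.State.Pid}} and {{.State.ExitCode}} fields.

```python
import os
import subprocess
import time


def pid_is_alive(pid: int) -> bool:
    """Null-signal probe, the equivalent of `kill -0 pid`."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: the process exists but belongs to another user
    return True


def wait_for_exit(pid: int, poll_interval: float = 1.0) -> None:
    """Poll with the null signal until the pid no longer exists."""
    while pid_is_alive(pid):
        time.sleep(poll_interval)


def docker_inspect(container: str, template: str) -> str:
    """Run `docker inspect --format <template> <container>` and return its output."""
    result = subprocess.run(
        ["docker", "inspect", "--format", template, container],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def run_and_record(image: str, name: str, exit_code_file: str) -> int:
    """Hypothetical driver for the five proposed steps."""
    # 1. Start the container detached so this process does not block on it.
    subprocess.run(
        ["docker", "run", "-d", "--name", name, image],
        check=True, capture_output=True,
    )
    # 2. Look up the pid of the container's main process.
    pid = int(docker_inspect(name, "{{.State.Pid}}"))
    # 3. Liveness loop. Docker reports pid 0 once the container has stopped,
    #    and signalling pid 0 would target our own process group, so skip it.
    if pid > 0:
        wait_for_exit(pid)
    # 4. Read the exit code of the finished container.
    exit_code = int(docker_inspect(name, "{{.State.ExitCode}}"))
    # 5. Write the exit code file for the NM to pick up.
    with open(exit_code_file, "w") as f:
        f.write(str(exit_code))
    return exit_code
```

Because the loop depends only on the pid and not on a connection to the Docker daemon, it keeps working while the daemon restarts, which is the property live restore needs.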
The null signal loop has pitfalls, but this is the pattern we rely upon
elsewhere when wait/waitpid aren't possible (container re-acquisition on NM
restart, for example).
I'll put up a patch that does the above as a starting point. Please provide
your thoughts on the approach.
> Support the Docker Live Restore feature
> ---------------------------------------
>
> Key: YARN-5818
> URL: https://issues.apache.org/jira/browse/YARN-5818
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Shane Kumpf
>
> Docker 1.12.x introduced the docker [Live
> Restore|https://docs.docker.com/engine/admin/live-restore/] feature which
> allows docker containers to survive docker daemon restarts/upgrades. Support
> for this feature should be added to YARN to allow docker changes and upgrades
> to be less impactful to existing containers.