[
https://issues.apache.org/jira/browse/YARN-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320775#comment-16320775
]
Eric Yang commented on YARN-5366:
---------------------------------
[[email protected]] The way docker commands are handled and the way environment
variables are set up both play a critical role in ensuring smooth integration
between YARN and docker. If I am reading this correctly, the current launch
sequence is:
{code}
NodeManager
container-executor
docker run ... launch_container.sh
user application unix command
{code}
User-defined environment variables and a lot of internal wiring are set up in
{{launch_container.sh}}. Would it be possible to change the environment
variable construction so that the docker run command passes them as -e k=v
options? This would reduce the effort needed to rewrite code to support docker
ENTRYPOINT. In the ideal case, the execution pipeline would be:
{code}
NodeManager
container-executor
docker run -e k=v [launcher_command]
{code}
This reduces the reliance on mounting launch_container.sh and running it inside
the container. It would let a docker container be a standalone unit that does
not rely on a YARN-generated script to run, and it would support docker
ENTRYPOINT. Is it possible to improve the launcher bootstrap this way?
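
To make the proposal concrete, here is a rough sketch of how the -e k=v
construction could look. It is only an illustration in Python, not the
container-executor code; the function name {{build_docker_run}}, the image name,
and the sample variables are all made up for this example.

```python
import shlex


def build_docker_run(image, env, launch_command=None):
    """Assemble a `docker run` argv with one `-e KEY=VALUE` pair per variable.

    Hypothetical helper for illustration only; it is not part of YARN.
    """
    argv = ["docker", "run"]
    # Sort the keys so the generated command is deterministic.
    for key, value in sorted(env.items()):
        if not key or "=" in key:
            raise ValueError(f"invalid environment variable name: {key!r}")
        argv += ["-e", f"{key}={value}"]
    argv.append(image)
    # With no launch command, docker falls back to the image's own
    # ENTRYPOINT/CMD, so no generated script has to be mounted.
    if launch_command:
        argv += list(launch_command)
    return argv


if __name__ == "__main__":
    argv = build_docker_run(
        "example/app:latest",  # assumed image name
        {"JAVA_HOME": "/usr/lib/jvm/default", "APP_OPTS": "-Xmx1g -verbose"},
    )
    # shlex.join quotes values containing spaces for display in a shell.
    print(shlex.join(argv))
```

Passing the arguments as a list rather than a single shell string means values
with spaces or quotes need no extra escaping when the command is executed.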
> Improve handling of the Docker container life cycle
> ---------------------------------------------------
>
> Key: YARN-5366
> URL: https://issues.apache.org/jira/browse/YARN-5366
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Shane Kumpf
> Assignee: Shane Kumpf
> Labels: oct16-medium
> Attachments: YARN-5366.001.patch, YARN-5366.002.patch,
> YARN-5366.003.patch, YARN-5366.004.patch, YARN-5366.005.patch,
> YARN-5366.006.patch, YARN-5366.007.patch, YARN-5366.008.patch,
> YARN-5366.009.patch, YARN-5366.010.patch
>
>
> There are several paths that need to be improved with regard to the Docker
> container lifecycle when running Docker containers on YARN.
> 1) Provide the ability to keep a container on the NodeManager for a set
> period of time for debugging purposes.
> 2) Support sending signals to the process in the container to allow for
> triggering stack traces, heap dumps, etc.
> 3) Support for Docker's live restore, which means moving away from the use of
> {{docker wait}}. (YARN-5818)
> 4) Improve the resiliency of liveliness checks (kill -0) by adding retries.
> 5) Improve the resiliency of container removal by adding retries.
> 6) Only attempt to stop, kill, and remove containers if the current container
> state allows for it.
> 7) Better handling of short lived containers when the container is stopped
> before the PID can be retrieved. (YARN-6305)