[ https://issues.apache.org/jira/browse/YARN-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297452#comment-16297452 ]
Eric Yang commented on YARN-5366:
---------------------------------
The current implementation seems to work like this:
# Generate application data files in the local directory.
# Write a launch_container.sh script in the local directory.
# The launch_container.sh script contains instructions on how to mount all local
resources into the docker container.
# Launch docker run with the bootstrap script.
Container deletion service:
# Remove the local directory and the docker container instance.
The current implementation depends heavily on resources in the local directory.
Generating per-container resources adds delay, and the container is not usable
if launch_container.sh is removed. If container debug is enabled and the
container is in the stopped state, there is no guarantee that we can restart the
container with the docker start command to look inside it.
It would be better to pass environment variables to the docker run command than
to run the bash script after the docker instance is constructed. This ensures
that changes to launch_container.sh do not affect restarting the docker
instance, and it makes debugging more reliable by closing loopholes that could
prevent it.
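
To make the suggestion concrete, here is a minimal sketch of building a docker
run command line that carries its settings as -e flags instead of relying on a
script in the local directory. This is not the YARN implementation; the function
name build_docker_run, the image name, the container name, and the environment
values are all hypothetical and chosen only for illustration.

```python
import shlex


def build_docker_run(image, name, env, command):
    """Return an argv list for `docker run` that passes settings via -e flags.

    Hypothetical helper for illustration only; not part of YARN.
    """
    argv = ["docker", "run", "--name", name]
    # Sort the keys so the generated command line is deterministic.
    for key in sorted(env):
        argv += ["-e", f"{key}={env[key]}"]
    argv.append(image)
    argv += list(command)
    return argv


# Example values are assumptions, not taken from the issue.
argv = build_docker_run(
    image="example/app:latest",
    name="container_example_0001",
    env={"APP_HOME": "/opt/app", "JAVA_HOME": "/usr/lib/jvm/default"},
    command=["/opt/app/bin/start"],
)
# Print a shell-quoted form of the command for inspection.
print(shlex.join(argv))
```

Because Docker stores the environment in the container's configuration when the
container is created, a later docker start on the same container name reuses
those values without needing any file from the local directory.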
> Improve handling of the Docker container life cycle
> ---------------------------------------------------
>
> Key: YARN-5366
> URL: https://issues.apache.org/jira/browse/YARN-5366
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Shane Kumpf
> Assignee: Shane Kumpf
> Labels: oct16-medium
> Attachments: YARN-5366.001.patch, YARN-5366.002.patch,
> YARN-5366.003.patch, YARN-5366.004.patch, YARN-5366.005.patch,
> YARN-5366.006.patch, YARN-5366.007.patch, YARN-5366.008.patch
>
>
> There are several paths that need to be improved with regard to the Docker
> container lifecycle when running Docker containers on YARN.
> 1) Provide the ability to keep a container on the NodeManager for a set
> period of time for debugging purposes.
> 2) Support sending signals to the process in the container to allow for
> triggering stack traces, heap dumps, etc.
> 3) Support for Docker's live restore, which means moving away from the use of
> {{docker wait}}. (YARN-5818)
> 4) Improve the resiliency of liveliness checks (kill -0) by adding retries.
> 5) Improve the resiliency of container removal by adding retries.
> 6) Only attempt to stop, kill, and remove containers if the current container
> state allows for it.
> 7) Better handling of short lived containers when the container is stopped
> before the PID can be retrieved. (YARN-6305)
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)