[
https://issues.apache.org/jira/browse/YARN-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323002#comment-16323002
]
Shane Kumpf commented on YARN-5366:
-----------------------------------
Thanks for the review [~eyang]!
{quote}
Would it be possible to change the environment variable construction for docker
run command to use -e k=v instructions?
{quote}
I do see the benefit in reducing or eliminating the need for the launch
container script, but I'd like to address that as a follow-on, perhaps as part
of YARN-7654, if that would be OK. I'm hesitant to make this patch even larger,
since it doesn't change launching, only recovery and cleanup.
{quote}
I did not find retry/sleep mechanism to repeat the signal as suggested in 5)
{quote}
That is correct. I briefly mentioned it above. The patch is a bit large as is,
so I'd like to address the retries in a follow-up. I'll clean up the
description and title and open follow-up tasks now if we are good with the
current scope.
> Improve handling of the Docker container life cycle
> ---------------------------------------------------
>
> Key: YARN-5366
> URL: https://issues.apache.org/jira/browse/YARN-5366
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Shane Kumpf
> Assignee: Shane Kumpf
> Labels: oct16-medium
> Attachments: YARN-5366.001.patch, YARN-5366.002.patch,
> YARN-5366.003.patch, YARN-5366.004.patch, YARN-5366.005.patch,
> YARN-5366.006.patch, YARN-5366.007.patch, YARN-5366.008.patch,
> YARN-5366.009.patch, YARN-5366.010.patch
>
>
> There are several paths that need to be improved with regard to the Docker
> container lifecycle when running Docker containers on YARN.
> 1) Provide the ability to keep a container on the NodeManager for a set
> period of time for debugging purposes.
> 2) Support sending signals to the process in the container to allow for
> triggering stack traces, heap dumps, etc.
> 3) Support for Docker's live restore, which means moving away from the use of
> {{docker wait}}. (YARN-5818)
> 4) Improve the resiliency of liveness checks (kill -0) by adding retries.
> 5) Improve the resiliency of container removal by adding retries.
> 6) Only attempt to stop, kill, and remove containers if the current container
> state allows for it.
> 7) Better handling of short-lived containers when the container is stopped
> before the PID can be retrieved. (YARN-6305)
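
For illustration only, items 4) through 6) above could be sketched roughly as
follows. This is not code from any of the attached patches: the class
RetrySketch, the State enum, and the methods retry and canRemove are all
hypothetical names, and the "removal" is simulated rather than a real Docker
call. It only shows the general shape of a bounded retry with a sleep between
attempts, guarded by a check of the current container state.

```java
import java.util.function.BooleanSupplier;

public class RetrySketch {

    /** Hypothetical container states; not the actual YARN or Docker enum. */
    enum State { RUNNING, STOPPED, REMOVED }

    /**
     * Runs the action until it reports success or maxAttempts is reached,
     * sleeping sleepMillis between attempts. Returns true on success and
     * false if every attempt failed or the thread was interrupted.
     */
    static boolean retry(int maxAttempts, long sleepMillis, BooleanSupplier action) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (action.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    /** Item 6: only attempt removal when the current state allows it (assumed rule). */
    static boolean canRemove(State state) {
        return state == State.STOPPED;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated removal that fails twice before succeeding.
        BooleanSupplier flakyRemove = () -> ++calls[0] >= 3;
        State state = State.STOPPED;
        if (canRemove(state)) {
            boolean removed = retry(5, 10L, flakyRemove);
            System.out.println("removed=" + removed + " after " + calls[0] + " attempts");
        }
    }
}
```

The same retry helper could wrap a liveness check (item 4) or a removal
(item 5); the attempt count and sleep interval here are arbitrary and would
presumably be configurable in a real implementation.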