[ https://issues.apache.org/jira/browse/YARN-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953872#comment-15953872 ]
Shane Kumpf edited comment on YARN-5366 at 4/3/17 5:25 PM:
-----------------------------------------------------------

Thanks [~vinodkv]! Responses below.

{quote}
Signal.QUIT handling is very application specific. For example, nginx does a graceful shutdown while JVMs do a thread dump and don't shut down at all. We shouldn't stop / rm container for QUIT at all?
{quote}

I addressed this in another design document, but here is the gist of it. While it is possible to do a {{docker kill --signal SIGQUIT}}, this is of limited usefulness and may result in unexpected behavior. The signal is always sent to PID 1 in the container. Depending on the image or app type, this may not be the process we want to catch that signal. Alternatively, users can specify STOPSIGNAL in the Dockerfile, since they likely have a better understanding of the implications for their application or image type. Thoughts on how this should be handled? Should we just ignore Signal.QUIT?

{quote}
I think the best we can do is to send the intent to container-executor binary and let it do stop and rm in one shot so as to save on multiple launches.
{quote}

IMO, moving more of this logic into c-e complicates matters and doesn't follow what we've done so far. Nearly all existing DockerCommands execute via c-e as a single Docker CLI command. If the concern is the performance hit, the Stop command here is a safeguard and should not normally be called, because the container should already have completed. However, you can't rm a container that isn't stopped, so ensuring it has been stopped is necessary.

I've created and posted patches to YARN-6366 (Refactor the NodeManager DeletionService to support additional DeletionTask types) and YARN-6374 (Improve test coverage and add utility classes for common Docker operations). These are the prerequisites for having Docker containers honor the debug delay.
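To make the stop-then-rm ordering concrete, here is a rough sketch of the decision described above. It is illustrative only and is not the code in the patch: the helper name {{cleanup_commands}} and the {{keep_container}} flag are hypothetical stand-ins for the runtime logic and the job-level toggle, and the sketch only builds the Docker CLI argument lists rather than running them.

```python
def cleanup_commands(container_id, is_running, keep_container=False):
    """Return the Docker CLI invocations needed to clean up a container.

    Hypothetical helper for illustration only; it is not part of YARN.
    keep_container stands in for the job-level toggle discussed in this issue.
    """
    if keep_container:
        # The user asked to keep the container, so no commands are issued.
        return []

    commands = []
    if is_running:
        # Safeguard: docker rm refuses a running container, so stop it first.
        # Normally the container has already completed and this is skipped.
        commands.append(["docker", "stop", container_id])
    commands.append(["docker", "rm", container_id])
    return commands


if __name__ == "__main__":
    # Each inner list is one Docker CLI command, i.e. one c-e launch.
    for command in cleanup_commands("container_example", is_running=True):
        print(" ".join(command))
```

In the common case the container has already exited, so only the rm command is produced and the extra launch for stop is avoided.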
> Add support for toggling the removal of completed and failed docker containers
> ------------------------------------------------------------------------------
>
>                 Key: YARN-5366
>                 URL: https://issues.apache.org/jira/browse/YARN-5366
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>              Labels: oct16-medium
>         Attachments: YARN-5366.001.patch, YARN-5366.002.patch,
> YARN-5366.003.patch, YARN-5366.004.patch, YARN-5366.005.patch,
> YARN-5366.006.patch
>
>
> Currently, completed and failed docker containers are removed by
> container-executor. Add a job level environment variable to
> DockerLinuxContainerRuntime to allow the user to toggle whether they want the
> container deleted or not and remove the logic from container-executor.