[
https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595612#comment-16595612
]
Eric Yang commented on YARN-8706:
---------------------------------
[~csingh] We can arrange for NM_SLEEP_DELAY_BEFORE_SIGKILL_MS to be greater
than NM_DOCKER_STOP_GRACE_PERIOD. The docker stop -t flag can honor
NM_DOCKER_STOP_GRACE_PERIOD, and NM_SLEEP_DELAY_BEFORE_SIGKILL_MS will be
enforced after NM_DOCKER_STOP_GRACE_PERIOD expires, as a catch-all for any
lingering processes.
If this is set up properly, the code only needs to ensure that
NM_SLEEP_DELAY_BEFORE_SIGKILL_MS is greater than NM_DOCKER_STOP_GRACE_PERIOD to
prevent the double kill. Thoughts?
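A minimal sketch of the ordering check proposed above, not the actual NodeManager code. The class and method names are hypothetical, and it assumes the sleep delay is given in milliseconds and the docker stop grace period in seconds (the unit that docker stop -t takes):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; it is not part of Hadoop YARN.
final class KillDelayCheck {
  private KillDelayCheck() {}

  /**
   * Returns true when the delayed SIGKILL would fire strictly after the
   * docker stop grace period has expired.
   * Assumption: the delay is in milliseconds and the grace period in seconds.
   */
  static boolean sigkillFollowsGracePeriod(long sleepDelayBeforeSigKillMs,
                                           int dockerStopGracePeriodSec) {
    long gracePeriodMs = TimeUnit.SECONDS.toMillis(dockerStopGracePeriodSec);
    return sleepDelayBeforeSigKillMs > gracePeriodMs;
  }
}
```

With the values quoted in the description below (250 ms delay, 10 second grace period) this check fails, so the configuration would have to be adjusted or rejected.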
> DelayedProcessKiller is executed for Docker containers even though docker
> stop sends a KILL signal after the specified grace period
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-8706
> URL: https://issues.apache.org/jira/browse/YARN-8706
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: docker
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace time used by docker stop
> [https://docs.docker.com/engine/reference/commandline/stop/]
> The docker stop documentation says:
> {quote}the main process inside the container will receive {{SIGTERM}}, and
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes
> for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By
> default this is set to {{250 milliseconds}}, so it always gets executed
> irrespective of the container type.
>
> For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}}
> after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in
> executing DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be
> the smallest value, which is 1 second, because we force a kill
> after 250 ms anyway
>
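The two cases in the description could be sketched roughly as follows. This is an illustration under stated assumptions, not the eventual patch: the class, method, and constant names are hypothetical, the delay is assumed to be in milliseconds and the grace period in seconds, and the behavior for delays between 1 second and the grace period (keep the configured grace period and still run the killer) is an assumption the description does not spell out.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; it is not part of Hadoop YARN.
final class DockerStopPlan {
  // Smallest grace period named in the description, in seconds.
  static final int MIN_GRACE_PERIOD_SEC = 1;

  private DockerStopPlan() {}

  /**
   * Grace period to use for docker stop. When the delayed kill is enabled and
   * fires in under one second, a longer grace period has no effect, so the
   * minimum of 1 second is used. Otherwise the configured value is kept.
   */
  static int gracePeriodSec(long sleepDelayBeforeSigKillMs,
                            int configuredGracePeriodSec) {
    boolean killerEnabled = sleepDelayBeforeSigKillMs > 0;
    if (killerEnabled && sleepDelayBeforeSigKillMs < TimeUnit.SECONDS.toMillis(1)) {
      return MIN_GRACE_PERIOD_SEC;
    }
    return configuredGracePeriodSec;
  }

  /**
   * Whether DelayedProcessKiller is still worth scheduling for a docker
   * container. It is skipped when it is disabled (delay <= 0) or when it would
   * only fire after docker stop has already sent SIGKILL.
   */
  static boolean needsDelayedProcessKiller(long sleepDelayBeforeSigKillMs,
                                           int configuredGracePeriodSec) {
    if (sleepDelayBeforeSigKillMs <= 0) {
      return false;
    }
    long gracePeriodMs = TimeUnit.SECONDS.toMillis(configuredGracePeriodSec);
    return sleepDelayBeforeSigKillMs <= gracePeriodMs;
  }
}
```

With the defaults quoted above (250 ms delay, 10 second grace period), this sketch shortens the grace period to 1 second and still schedules the delayed kill.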