[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106695#comment-14106695 ]
Alexander Rukletsov edited comment on MESOS-1571 at 8/25/14 4:48 PM:
-
Currently there are two parameters that control the graceful shutdown timeout:
EXECUTOR_SHUTDOWN_GRACE_PERIOD and EXECUTOR_SIGNAL_ESCALATION_TIMEOUT. The
simplified event chain looks like this:
1) Slave sends a ShutdownExecutorMessage to ExecutorProcess
2) ExecutorProcess sends a shutdown signal to the underlying executor (e.g.
CommandExecutor)
3) CommandExecutor tries to finish by sending SIGTERM to the process
4) If the process does not terminate within
EXECUTOR_SIGNAL_ESCALATION_TIMEOUT, the executor sends SIGKILL to the process
5) If the executor (e.g. CommandExecutor) does not terminate within
EXECUTOR_SHUTDOWN_GRACE_PERIOD, ExecutorProcess kills the process group,
starting with itself
6) If the ExecutorProcess does not terminate within
EXECUTOR_SHUTDOWN_GRACE_PERIOD, the slave destroys the appropriate containerizer.
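To see why the two timeouts interact, steps 3 to 5 above can be modeled as a simple timing race. This is a minimal Python sketch, not Mesos code: `simulate_shutdown` and `Outcome` are hypothetical names, and the model assumes both timers start at the moment SIGTERM is sent, that signals and messages are delivered instantly, and it ignores step 6. The 3-second escalation timeout is the value quoted in the issue description; the other numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    terminated_by: str  # which stage of the shutdown chain ended the task
    at: float           # seconds after SIGTERM was sent


def simulate_shutdown(task_exit_after_sigterm: float,
                      escalation_timeout: float,
                      grace_period: float) -> Outcome:
    """Hypothetical timing model of steps 3-5.

    Assumes the escalation timer (executor) and the grace-period timer
    (ExecutorProcess) both start at t=0 and that delivery is instant.
    """
    deadline = min(escalation_timeout, grace_period)
    # Step 3: the task handles SIGTERM and exits before any timer fires.
    if task_exit_after_sigterm <= deadline:
        return Outcome("graceful exit after SIGTERM", task_exit_after_sigterm)
    # Step 4: the executor escalates to SIGKILL before the grace period ends.
    if escalation_timeout < grace_period:
        return Outcome("SIGKILL from executor", escalation_timeout)
    # Step 5: the grace period expires first; ExecutorProcess kills the group.
    return Outcome("process group killed by ExecutorProcess", grace_period)


# A generous grace period does not help a task that needs 10 s to shut down,
# because the fixed 3-second escalation fires first.
print(simulate_shutdown(task_exit_after_sigterm=10, escalation_timeout=3,
                        grace_period=30))
```

Conversely, if the escalation timeout were set at or above the grace period, ExecutorProcess would kill the process group before the executor ever escalated, so the two values only make sense as a pair.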
My thoughts are:
* The timeouts are closely correlated, which makes setting them separately
error-prone. Currently only EXECUTOR_SHUTDOWN_GRACE_PERIOD can be configured. I
propose setting one of them and calculating the other using some
[hard-coded?] delta.
* Since we would like to control the timeout per task or framework rather than
per slave, it looks like EXECUTOR_SIGNAL_ESCALATION_TIMEOUT should be
configurable.
* Do we want to tie the timeout to each task, or will passing it along with
ExecutorInfo or FrameworkInfo suffice?
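The first and third points could look roughly like this minimal Python sketch. Everything here is an assumption for illustration: the 2-second `ESCALATION_DELTA`, the function names, and the precedence order (executor-level value, then framework-level value, then the slave-wide default). The `executor_value` and `framework_value` arguments stand in for hypothetical ExecutorInfo and FrameworkInfo fields.

```python
from typing import Optional

# Hypothetical hard-coded margin between the two timeouts, in seconds.
ESCALATION_DELTA = 2.0


def derive_escalation_timeout(grace_period: float,
                              delta: float = ESCALATION_DELTA) -> float:
    """Derive the signal escalation timeout from the shutdown grace period.

    The SIGTERM -> SIGKILL escalation must fire strictly before the grace
    period expires, so that the executor, not ExecutorProcess, escalates.
    """
    if grace_period <= delta:
        raise ValueError(
            f"grace period {grace_period}s must exceed the delta {delta}s")
    return grace_period - delta


def resolve_grace_period(executor_value: Optional[float],
                         framework_value: Optional[float],
                         slave_default: float) -> float:
    """Pick the most specific grace period that was set.

    Assumed (hypothetical) precedence: executor, then framework, then slave.
    """
    for value in (executor_value, framework_value):
        if value is not None:
            return value
    return slave_default


grace = resolve_grace_period(executor_value=None, framework_value=30.0,
                             slave_default=5.0)
print(grace, derive_escalation_timeout(grace))
```

Deriving the escalation timeout rather than configuring it independently keeps the "escalation fires before the grace period" invariant in one place; rejecting a grace period smaller than the delta is one possible policy, and clamping would be another.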
was (Author: alex-mesos):
Currently there are two parameters that control the graceful shutdown timeout:
EXECUTOR_SHUTDOWN_GRACE_PERIOD and EXECUTOR_SIGNAL_ESCALATION_TIMEOUT. The
simplified event chain looks like this:
1) Slave sends a ShutdownExecutorMessage to the executor
2) Executor tries to finish by sending SIGTERM to the process
3) If the process did not terminate within EXECUTOR_SIGNAL_ESCALATION_TIMEOUT,
the executor sends SIGKILL to the process
4) If the executor did not terminate within EXECUTOR_SHUTDOWN_GRACE_PERIOD,
the slave destroys the appropriate containerizer.
Signal escalation timeout is not configurable
-
Key: MESOS-1571
URL: https://issues.apache.org/jira/browse/MESOS-1571
Project: Mesos
Issue Type: Bug
Reporter: Niklas Quarfot Nielsen
Assignee: Alexander Rukletsov
Even though the executor shutdown grace period is set to a larger interval,
the signal escalation timeout is still 3 seconds. It should either be
configurable or depend on EXECUTOR_SHUTDOWN_GRACE_PERIOD.
Thoughts?
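For reference, the escalation itself with the timeout passed in as a parameter, instead of fixed at 3 seconds, can be sketched in a few lines. This is a minimal, POSIX-only Python sketch in which the `subprocess` module stands in for the executor; `stop_with_escalation` and `spawn_sleeper` are hypothetical helpers, and this is not how CommandExecutor is implemented.

```python
import signal
import subprocess
import sys


def stop_with_escalation(proc: subprocess.Popen, escalation_timeout: float) -> str:
    """Send SIGTERM; if the process is still alive after escalation_timeout
    seconds, send SIGKILL. Returns the last signal that was sent."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=escalation_timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        return "SIGKILL"


def spawn_sleeper(ignore_sigterm: bool) -> subprocess.Popen:
    """Start a stand-in task: a Python child that sleeps for 60 seconds."""
    code = (
        "import signal, time\n"
        + ("signal.signal(signal.SIGTERM, signal.SIG_IGN)\n" if ignore_sigterm else "")
        + "print('ready', flush=True)\n"
        + "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # wait until the child has installed its handler
    return proc


# A task that honors SIGTERM exits well before the timeout.
polite = spawn_sleeper(ignore_sigterm=False)
polite_result = stop_with_escalation(polite, escalation_timeout=3.0)

# A task that ignores SIGTERM is killed once the timeout expires.
stubborn = spawn_sleeper(ignore_sigterm=True)
stubborn_result = stop_with_escalation(stubborn, escalation_timeout=0.5)

for p in (polite, stubborn):
    p.stdout.close()

print(polite_result, polite.returncode == -signal.SIGTERM)
print(stubborn_result, stubborn.returncode == -signal.SIGKILL)
```

The child prints a readiness line before sleeping so that SIGTERM is never sent before the handler is installed; without it, the "stubborn" case could be killed by the default SIGTERM action instead.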
--
This message was sent by Atlassian JIRA
(v6.2#6252)