[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2015-02-18 Thread James DeFelice (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325794#comment-14325794 ]

James DeFelice commented on MESOS-1571:
---

In the kubernetes-mesos framework, the executor Shutdown() implementation 
currently force-stops the containers it's managing (which, to my 
understanding, sends SIGKILL). It manages Docker containers, which are normally 
given 10s to shut down gracefully before Docker sends a SIGKILL. That 10s 
timeout is incompatible with Mesos' default value (3s) for the slave flag 
`executor_shutdown_grace_period`. However, if I raise that timeout to 20s to 
give the executor more time to kill things gracefully, the executor has no way 
to reason about it, because it has no idea how much time it actually has.

As a workaround, I've considered looking up the slave PID from the environment, 
querying its state.json for the startup flags, and deciding based on that. 
That approach seems somewhat hackish, and I'd much rather do something nicer.

It would be great to have an environment variable such as 
`MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD`, provided by the slave containerizer, so 
that the executor can decide whether to send (via Docker) a TERM signal (and 
wait 10s) or a KILL signal.
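A minimal sketch of the executor-side decision described above, assuming the slave exported a variable named `MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD` (the name is only the proposal in this comment) holding a Mesos-style duration string such as `20secs`. The `parse_duration()` helper, its unit table, and the fallback to KILL when the variable is absent are hypothetical; only the 10s Docker stop timeout comes from the comment.

```python
import os
from typing import Mapping

# Hypothetical unit table for Mesos-style duration strings such as "500ms", "3secs", "2mins".
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "secs": 1.0, "mins": 60.0, "hrs": 3600.0}

DOCKER_STOP_TIMEOUT = 10.0  # seconds Docker waits between SIGTERM and SIGKILL (per the comment)


def parse_duration(text: str) -> float:
    """Parse a duration string like '20secs' into seconds (format assumed)."""
    text = text.strip()
    # Check longer suffixes first, so that "2mins" is not read as "2mi" + "ns".
    for unit in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * _UNITS[unit]
    raise ValueError(f"unrecognized duration: {text!r}")


def choose_stop_signal(env: Mapping[str, str] = os.environ) -> str:
    """Return "TERM" if the grace period leaves room for a graceful Docker stop, else "KILL"."""
    raw = env.get("MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD")
    if raw is None:
        return "KILL"  # no information: assume there is no time for a graceful stop
    grace = parse_duration(raw)
    # Strictly greater: leave the executor some slack beyond Docker's own 10s wait.
    return "TERM" if grace > DOCKER_STOP_TIMEOUT else "KILL"
```

With a 20s grace period the executor would ask Docker for a graceful TERM; with the 3s default it would fall back to KILL.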

 Signal escalation timeout is not configurable
 -

 Key: MESOS-1571
 URL: https://issues.apache.org/jira/browse/MESOS-1571
 Project: Mesos
  Issue Type: Bug
Reporter: Niklas Quarfot Nielsen
Assignee: Alexander Rukletsov

 Even though the executor shutdown grace period is set to a larger interval, 
 the signal escalation timeout will still be 3 seconds. It should either be 
 configurable or dependent on EXECUTOR_SHUTDOWN_GRACE_PERIOD.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2015-01-08 Thread Alexander Rukletsov (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269345#comment-14269345 ]

Alexander Rukletsov commented on MESOS-1571:


Commit: c8a7aff24fbd2c6ee2e6daadf4ad78f79a5e9cf6 [c8a7aff]
Author: Alexander Rukletsov a...@mesosphere.io
Committer: Niklas Q. Nielsen nik...@mesosphere.io
Commit Date: 8 Jan 2015 14:29:31 GMT+1

Commit: aae5bfd07c0c9407453a7c38f27785e648b2724d [aae5bfd]
Author: Alexander Rukletsov a...@mesosphere.io
Committer: Niklas Q. Nielsen nik...@mesosphere.io
Commit Date: 8 Jan 2015 14:33:12 GMT+1

Commit: f2cf562900195455e4e7fb8a6163b33a6b8aa12d [f2cf562]
Author: Alexander Rukletsov a...@mesosphere.io
Committer: Niklas Q. Nielsen nik...@mesosphere.io
Commit Date: 8 Jan 2015 14:35:52 GMT+1



[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2014-11-19 Thread Alexander Rukletsov (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218198#comment-14218198 ]

Alexander Rukletsov commented on MESOS-1571:


https://reviews.apache.org/r/28063/
https://reviews.apache.org/r/28065/
https://reviews.apache.org/r/28069/



[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2014-10-13 Thread Alexander Rukletsov (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169982#comment-14169982 ]

Alexander Rukletsov commented on MESOS-1571:


In the current review request we pass the timeout value via containerizers. 
However, in order to implement 
[https://issues.apache.org/jira/browse/MESOS-1773], a field in the 
{{CommandInfo}} protobuf is needed. I suggest using this field for the 
default value as well, thereby avoiding changes to the containerizers' code. 
This can work as follows: in {{Slave::runTask()}}, if the task doesn't have the 
field {{grace_period}} set, the slave sets it to the default; in the 
{{executorEnvironment()}} preparation function, we extract the {{grace_period}} 
and set the corresponding environment variable. The review request will follow.
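The flow proposed above can be modeled in a few lines. This is a Python sketch, not Mesos code: `Slave::runTask()` and `executorEnvironment()` are represented by hypothetical functions `run_task()` and `executor_environment()`, `grace_period` is modeled as an optional float of seconds, and both the 5s default and the variable name `MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD` are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DEFAULT_GRACE_PERIOD = 5.0  # hypothetical slave-wide default, in seconds


@dataclass
class CommandInfo:
    value: str
    grace_period: Optional[float] = None  # seconds; None means "use the slave default"


@dataclass
class Task:
    name: str
    command: CommandInfo


def run_task(task: Task, default: float = DEFAULT_GRACE_PERIOD) -> Task:
    """Model of Slave::runTask(): fill in grace_period if the framework left it unset."""
    if task.command.grace_period is None:
        task.command.grace_period = default
    return task


def executor_environment(task: Task) -> Dict[str, str]:
    """Model of executorEnvironment(): export the grace period for the executor."""
    # Mesos-style duration string such as "5secs" (format assumed).
    return {"MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD": f"{task.command.grace_period:g}secs"}
```

Because the default is written into the task itself, the containerizers never need to know about it; only the environment preparation step reads the field.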



[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2014-09-30 Thread Alexander Rukletsov (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153209#comment-14153209 ]

Alexander Rukletsov commented on MESOS-1571:


https://reviews.apache.org/r/25434/



[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2014-09-05 Thread Till Toenshoff (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122858#comment-14122858 ]

Till Toenshoff commented on MESOS-1571:
---

Passing that info via the environment seems the best fit, given the things we 
already pass that way (e.g. {{MESOS_RECOVERY_TIMEOUT}}), whereas the 
{{SlaveInfo}} protobuf carries few additional execution-specific parameters.

However, this still raises the question of why we prefer the environment over 
protobufs for such information. One obvious reason is that we may need to 
supply information immediately before or after starting the {{ExecutorProcess}}, 
but definitely before it has successfully registered, which is when 
{{SlaveInfo}} finally becomes available to it. Apart from my argument above that 
we already do it this way, are there better arguments for passing the 
additional parameters via the environment rather than adding them to the proto?

[~benjaminhindman], [~idownes], [~tnachen] any input for this discussion?




[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2014-08-28 Thread Till Toenshoff (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114119#comment-14114119 ]

Till Toenshoff commented on MESOS-1571:
---

[~nnielsen] Aye!



[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2014-08-26 Thread Niklas Quarfot Nielsen (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111816#comment-14111816 ]

Niklas Quarfot Nielsen commented on MESOS-1571:
---

[~tillt] Would you be up for shepherding this change?

How about having EXECUTOR_SHUTDOWN_TIMEOUT as an upper limit for the per-task 
configurable timeout?
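The upper-limit idea can be sketched as a simple clamp. The function name, the use of seconds, and the policy of falling back to the limit when no per-task value (or a negative one) is given are assumptions for illustration.

```python
from typing import Optional


def effective_timeout(per_task: Optional[float], upper_limit: float) -> float:
    """Clamp a per-task shutdown timeout (seconds) to the slave-wide upper limit.

    A missing or negative per-task value falls back to the upper limit (assumed policy).
    """
    if per_task is None or per_task < 0:
        return upper_limit
    return min(per_task, upper_limit)
```

With a 10s limit, a task asking for 30s would get 10s, while a task asking for 2s keeps 2s.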

I think we need to differentiate between two scenarios:
1) killTask() is called. In the command executor, this just calls its own 
shutdown() and _only_ the escalation in src/launcher/executor.cpp takes effect.

{code}
         Slave               Exec           CommandExecutor
           +                   +                   +
killTask() |                   |                   |
---------->|     killTask()    |                   |
           +------------------>|     killTask()    |
           |                   +------------------>|
           |                   |                   +---+
           |                   |                   |   |
           |                   |                   |<--+ shutdown()
           |                   |                   | ^
           |                   |                   | |
           |                   |                   | | EXECUTOR_SIGNAL_ESCALATION_TIMEOUT
           |                   |                   | |
           |                   |                   | v
           |                   |                   | escalated()
           v                   v                   v
{code}
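Scenario 1 boils down to the classic SIGTERM-then-SIGKILL escalation. The sketch below reproduces that pattern for a local child process with Python's `subprocess` module on a POSIX system; it is not the code in src/launcher/executor.cpp, and the `escalation_timeout` parameter merely stands in for EXECUTOR_SIGNAL_ESCALATION_TIMEOUT.

```python
import signal
import subprocess


def shutdown_with_escalation(proc: subprocess.Popen, escalation_timeout: float) -> int:
    """Send SIGTERM, wait up to escalation_timeout seconds, then escalate to SIGKILL.

    Returns the child's return code (a negative signal number if a signal killed it).
    """
    proc.send_signal(signal.SIGTERM)  # shutdown(): ask the task to exit gracefully
    try:
        return proc.wait(timeout=escalation_timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                   # escalated(): Popen.kill() sends SIGKILL on POSIX
        return proc.wait()
```

A task that honors SIGTERM ends with return code `-SIGTERM`; one that ignores it is killed once the timeout elapses and ends with `-SIGKILL`.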

2) The executor is shut down due to frameworkShutdown. shutdown() is called in 
src/exec/exec.cpp, which in turn calls shutdown() on the underlying executor 
implementation. This is where the timeouts nest, including an escalation 
within the slave (executor_shutdown_grace_period) that calls 
containerizer->destroy().

{code}
        Slave                         Exec                 CommandExecutor
          +                             +                         +
          |                             |                         |
          |         shutdown()          |                         |
          +-^-------------------------->|        shutdown()       |
          | |                           +-^---------------------->| shutdown()
          | |                           | |                       | ^
          | |                           | |                       | |
          | | flags.                    | | SHUTDOWN_             | | EXECUTOR_SIGNAL_ESCALATION_TIMEOUT
          | | shutdown_                 | | GRACE_PERIOD          | |
          | | grace_period              | |                       | v
          | |                           | |                       | escalated()
          | |                           | v                       |
          | |                           | ShutdownProcess         |
          | |                           | kill()                  |
          | v                           |                         |
          | shutdownExecutorTimeout()   |                         |
          |                             |                         |
          v                             v                         v
{code}


[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2014-08-25 Thread Alexander Rukletsov (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109392#comment-14109392 ]

Alexander Rukletsov commented on MESOS-1571:


So we have shutdown timeouts on three levels: the slave, the basic executor (via 
ExecutorProcess) and, optionally, the concrete executor (e.g. CommandExecutor). 
I would suggest we keep one configurable parameter, EXECUTOR_SHUTDOWN_TIMEOUT, 
on the basic executor level and calculate the other two using fixed deltas. 
This parameter can be set via slave command-line parameters and overridden via 
a protobuf message (TaskInfo?). Any other thoughts?
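The single-parameter proposal could look roughly like this. The 1-second delta, the `ShutdownTimeouts` type, and the rule that each outer level waits strictly longer than the level inside it are illustrative assumptions, not values from Mesos.

```python
from dataclasses import dataclass

DELTA = 1.0  # hypothetical fixed gap between nested levels, in seconds


@dataclass(frozen=True)
class ShutdownTimeouts:
    command_executor: float  # SIGTERM -> SIGKILL escalation inside e.g. CommandExecutor
    executor: float          # EXECUTOR_SHUTDOWN_TIMEOUT in the basic ExecutorProcess
    slave: float             # slave-side grace period before destroying the container


def derive_timeouts(executor_shutdown_timeout: float, delta: float = DELTA) -> ShutdownTimeouts:
    """Derive all three levels from the single configurable executor-level value."""
    if executor_shutdown_timeout <= delta:
        raise ValueError("timeout must exceed delta so the inner level stays positive")
    return ShutdownTimeouts(
        command_executor=executor_shutdown_timeout - delta,  # innermost level fires first
        executor=executor_shutdown_timeout,
        slave=executor_shutdown_timeout + delta,             # outermost level fires last
    )
```

Overriding only the executor-level value (per task, for instance) then keeps the three levels consistently ordered without any further configuration.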



[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2014-08-22 Thread Alexander Rukletsov (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106695#comment-14106695 ]

Alexander Rukletsov commented on MESOS-1571:


Currently there are two parameters that control the graceful shutdown timeout: 
EXECUTOR_SHUTDOWN_GRACE_PERIOD and EXECUTOR_SIGNAL_ESCALATION_TIMEOUT. The 
simplified event chain looks like this:
  1) The slave sends a ShutdownExecutorMessage to the executor.
  2) The executor tries to finish by sending SIGTERM to the process.
  3) If the process has not terminated after EXECUTOR_SIGNAL_ESCALATION_TIMEOUT, 
the executor sends SIGKILL to the process.
  4) If the executor has not terminated after EXECUTOR_SHUTDOWN_GRACE_PERIOD, 
the slave destroys the executor's container via the containerizer.
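The four-step chain can be modeled as a small timeline function that makes the interaction between the two timeouts visible. All names are hypothetical, every timer is assumed to start when the ShutdownExecutorMessage is sent, and the model ignores message latency and the time the executor needs to exit after its task dies.

```python
def shutdown_outcome(task_exit_after: float, escalation_timeout: float,
                     grace_period: float) -> str:
    """Return which step of the shutdown chain ends the task.

    task_exit_after:    seconds the task needs to exit after SIGTERM (step 2)
    escalation_timeout: EXECUTOR_SIGNAL_ESCALATION_TIMEOUT (step 3)
    grace_period:       EXECUTOR_SHUTDOWN_GRACE_PERIOD, measured by the slave (step 4)
    """
    if task_exit_after < min(escalation_timeout, grace_period):
        return "SIGTERM"             # graceful exit before any timer fires
    if escalation_timeout < grace_period:
        return "SIGKILL"             # the executor escalates before the slave gives up
    return "containerizer destroy"   # the slave timer fires first; escalation never happens
```

This shows the problem reported in the issue: with the escalation fixed at 3s, raising the grace period to 20s still ends a task that needs 10s with SIGKILL, because `shutdown_outcome(10.0, 3.0, 20.0)` returns `"SIGKILL"`.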

My thoughts are:
  * The timeouts are closely correlated, so setting them separately is 
error-prone. Currently only EXECUTOR_SHUTDOWN_GRACE_PERIOD can be configured. I 
would propose setting one of them and calculating the other using some 
[hard-coded?] delta.
  * Since we would like to control the timeout not per slave but per task or 
framework, it looks like EXECUTOR_SIGNAL_ESCALATION_TIMEOUT should be 
configurable.
  * Do we want to tie the timeout to each task, or will passing it along with 
ExecutorInfo or FrameworkInfo suffice?



[jira] [Commented] (MESOS-1571) Signal escalation timeout is not configurable

2014-08-22 Thread Niklas Quarfot Nielsen (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107013#comment-14107013 ]

Niklas Quarfot Nielsen commented on MESOS-1571:
---

I can help you out. I think [~tstclair] would also be a great shepherd for this.
