> On May 5, 2016, 8:25 p.m., Ben Mahler wrote:
> > src/launcher/http_command_executor.cpp, lines 622-625
> > <https://reviews.apache.org/r/46325/diff/5/?file=1373555#file1373555line622>
> >
> >     The second bullet isn't really a limitation here right?
> >     
> >     Since we track when the grace period started we're now just adjusting 
> > the end of the grace period to reflect the newly desired duration, so we 
> > can easily support shrinking and growing.

Re-phrased.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46325/#review131890
-----------------------------------------------------------


On May 5, 2016, 3:39 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46325/
> -----------------------------------------------------------
> 
> (Updated May 5, 2016, 3:39 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Qian Zhang.
> 
> 
> Bugs: MESOS-4908
>     https://issues.apache.org/jira/browse/MESOS-4908
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Kill policy can be provided in a kill event. In this case it should
> take precedence over kill policy specified when the task was launched.
> When kill event is issued multiple times during the task termination,
> the signal escalation timeout (the time a task has between SIGTERM
> and SIGKILL) may be reduced.
> 
> Since updating the delay timer (we use it for signal escalation delay)
> is currently not possible, we cancel the existing signal timer and set
> up a new one. `Clock::cancel()` guarantees that, if existed, the timer
> is removed before the function returns; hence we do not set up more
> than 1 timer for signal escalation delay.
> 
> 
> Diffs
> -----
> 
>   src/launcher/http_command_executor.cpp 
> d2f15b0447d91f3a4cd92f07114cb366647cc7d3 
> 
> Diff: https://reviews.apache.org/r/46325/diff/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Tested manually using modified `mesos-execute` in a way that **two** extra 
> kill task requests are sent, 2s and 3s after receiving `TASK_RUNNING`. Each 
> kill task request specifies `KillPolicy` with 1s grace period. Together with 
> a kill *without* a kill policy scheduled 1s after the task is being launched, 
> the task receives **three** kill requests in total.
> 
> Master: `./bin/mesos-master.sh --work_dir=/tmp/w/m --ip=127.0.0.1`
> Agent: `./bin/mesos-slave.sh --work_dir=/tmp/w/s --master=127.0.0.1:5050 
> --http_command_executor`
> Mesos-execute: `./src/mesos-execute --master=127.0.0.1:5050 --name=test 
> --command="/Users/alex/bin/unresponsive_process" --env='{"GLOG_v": "2"}' 
> --kill_after=1secs`
> 
> HTTP command executor log:
> ```
> Received SUBSCRIBED event
> Subscribed executor on alexr.fritz.box
> Received LAUNCH event
> Starting task test
> sh -c '/Users/alex/bin/unresponsive_process'
> Forked command at 17475
> 14455919081943275466
> Received ACKNOWLEDGED event
> 17172602460659762152
> Received KILL event
> Received kill for task test
> Sending SIGTERM to process tree at pid 17475
> Sent SIGTERM to the following process trees:
> [ 
> --- 17475 /Users/alex/bin/unresponsive_process
> ]
> 4381544758593790168
> Scheduling escalation to SIGKILL in 3secs from now
> Received ACKNOWLEDGED event
> Received KILL event
> Received kill for task test
> Rescheduling escalation to SIGKILL in 1secs from now
> 10370891801885978953
> Process 17475 did not terminate after 1secs, sending SIGKILL to process tree 
> at 17475
> Killed the following process trees:
> [ 
> --- 17475 /Users/alex/bin/unresponsive_process
> ]
> Received KILL event
> Received kill for task test
> Command terminated with signal Killed: 9 (pid: 17475)
> ```
> 
> Excerpt from the agent log that shows all 3 kill task requests and that the 
> segnal escalation timeout was reduced from 3s to 1s:
> ```
> I0418 14:27:17.825070 244285440 slave.cpp:3605] Forwarding the update 
> TASK_RUNNING (UUID: 925e2d89-f6eb-464d-9a50-a74a8e07bc88) for task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000 to master@127.0.0.1:5050
> I0418 14:27:17.831233 242139136 status_update_manager.cpp:392] Received 
> status update acknowledgement (UUID: 925e2d89-f6eb-464d-9a50-a74a8e07bc88) 
> for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.834309 244285440 slave.cpp:2046] Asked to kill task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.842150 244285440 http.cpp:178] HTTP POST for 
> /slave(1)/api/v1/executor from 192.168.178.24:54206
> I0418 14:27:18.842331 244285440 slave.cpp:3207] Handling status update 
> TASK_KILLING (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.843214 242139136 status_update_manager.cpp:320] Received 
> status update TASK_KILLING (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for 
> task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.843387 243748864 slave.cpp:3605] Forwarding the update 
> TASK_KILLING (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000 to master@127.0.0.1:5050
> I0418 14:27:18.846459 242675712 status_update_manager.cpp:392] Received 
> status update acknowledgement (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) 
> for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:19.836699 240529408 slave.cpp:2046] Asked to kill task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.850658 240529408 slave.cpp:2046] Asked to kill task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.927338 241602560 http.cpp:178] HTTP POST for 
> /slave(1)/api/v1/executor from 192.168.178.24:54206
> I0418 14:27:20.927465 241602560 slave.cpp:3207] Handling status update 
> TASK_KILLED (UUID: fdcb43ec-271b-4976-aaf3-59a031cdc924) for task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.928771 240529408 status_update_manager.cpp:320] Received 
> status update TASK_KILLED (UUID: fdcb43ec-271b-4976-aaf3-59a031cdc924) for 
> task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.928933 243748864 slave.cpp:3605] Forwarding the update 
> TASK_KILLED (UUID: fdcb43ec-271b-4976-aaf3-59a031cdc924) for task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000 to master@127.0.0.1:5050
> 
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Reply via email to