> On April 21, 2016, 9:56 p.m., Ben Mahler wrote:
> > src/launcher/http_command_executor.cpp, lines 172-176
> > <https://reviews.apache.org/r/46325/diff/3/?file=1355102#file1355102line172>
> >
> >     Wrap at the paren:
> >     
> >     ```
> >               kill(event.kill().task_id(),
> >                    event.kill().has_kill_policy()
> >                      ? Option<KillPolicy>(event.kill().kill_policy())
> >                      : None());
> >     ```
> >     
> >     Any way to avoid the explicit Option construction? Does this compile?
> >     
> >     ```
> >               kill(event.kill().task_id(),
> >                    event.kill().has_kill_policy()
> >                     ? Some(event.kill().kill_policy())
> >                     : None());
> >     ```
> >     
> >     If not, consider pulling out the kill policy as an option variable.

Unfortunately, `Some(event.kill().kill_policy())` does not compile. Passing 
`Event::Kill` seems unfortunate as well: if we want to call `kill()` from 
another place in the code, we have to prepare the proto message. I'll go with a 
local variable.


> On April 21, 2016, 9:56 p.m., Ben Mahler wrote:
> > src/launcher/http_command_executor.cpp, line 665
> > <https://reviews.apache.org/r/46325/diff/3/?file=1355102#file1355102line665>
> >
> >     Leaving this unimplemented as a TODO seems to have implications on the 
> > documentation in your previous patch:
> >     
> >     https://reviews.apache.org/r/46322/
> >     
> >     It seems like all you would need to do to implement the TODO is to 
> > remove this if condition and make the logging more clear? Otherwise, if you 
> > leave this out we really need to document it within the Kill Call. My hunch 
> > is that in writing that documentation we'll realize we should just do an 
> > override always.

Yeah, we can remove the check.


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46325/#review129937
-----------------------------------------------------------


On April 21, 2016, 2:28 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46325/
> -----------------------------------------------------------
> 
> (Updated April 21, 2016, 2:28 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Qian Zhang.
> 
> 
> Bugs: MESOS-4908
>     https://issues.apache.org/jira/browse/MESOS-4908
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Kill policy can be provided in a kill event. In this case it should
> take precedence over kill policy specified when the task was launched.
> When kill event is issued multiple times during the task termination,
> the signal escalation timeout (the time a task has between SIGTERM
> and SIGKILL) may be reduced.
> 
> Since updating the delay timer (we use it for signal escalation delay)
> is currently not possible, we cancel the existing signal timer and set
> up a new one. `Clock::cancel()` guarantees that, if existed, the timer
> is removed before the function returns; hence we do not set up more
> than 1 timer for signal escalation delay.
> 
> 
> Diffs
> -----
> 
>   src/launcher/http_command_executor.cpp 
> 9dfe48108cababeb2d4a6af74434880d79245c21 
> 
> Diff: https://reviews.apache.org/r/46325/diff/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Tested manually using modified `mesos-execute` in a way that **two** extra 
> kill task requests are sent, 2s and 3s after receiving `TASK_RUNNING`. Each 
> kill task request specifies `KillPolicy` with 1s grace period. Together with 
> a kill *without* a kill policy scheduled 1s after the task is being launched, 
> the task receives **three** kill requests in total.
> 
> Master: `./bin/mesos-master.sh --work_dir=/tmp/w/m --ip=127.0.0.1`
> Agent: `./bin/mesos-slave.sh --work_dir=/tmp/w/s --master=127.0.0.1:5050 
> --http_command_executor`
> Mesos-execute: `./src/mesos-execute --master=127.0.0.1:5050 --name=test 
> --command="/Users/alex/bin/unresponsive_process" --env='{"GLOG_v": "2"}' 
> --kill_after=1secs`
> 
> HTTP command executor log:
> ```
> Received SUBSCRIBED event
> Subscribed executor on alexr.fritz.box
> Received LAUNCH event
> Starting task test
> sh -c '/Users/alex/bin/unresponsive_process'
> Forked command at 17475
> 14455919081943275466
> Received ACKNOWLEDGED event
> 17172602460659762152
> Received KILL event
> Received kill for task test
> Sending SIGTERM to process tree at pid 17475
> Sent SIGTERM to the following process trees:
> [ 
> --- 17475 /Users/alex/bin/unresponsive_process
> ]
> 4381544758593790168
> Scheduling escalation to SIGKILL in 3secs from now
> Received ACKNOWLEDGED event
> Received KILL event
> Received kill for task test
> Rescheduling escalation to SIGKILL in 1secs from now
> 10370891801885978953
> Process 17475 did not terminate after 1secs, sending SIGKILL to process tree 
> at 17475
> Killed the following process trees:
> [ 
> --- 17475 /Users/alex/bin/unresponsive_process
> ]
> Received KILL event
> Received kill for task test
> Command terminated with signal Killed: 9 (pid: 17475)
> ```
> 
> Excerpt from the agent log that shows all 3 kill task requests and that the 
> segnal escalation timeout was reduced from 3s to 1s:
> ```
> I0418 14:27:17.825070 244285440 slave.cpp:3605] Forwarding the update 
> TASK_RUNNING (UUID: 925e2d89-f6eb-464d-9a50-a74a8e07bc88) for task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000 to master@127.0.0.1:5050
> I0418 14:27:17.831233 242139136 status_update_manager.cpp:392] Received 
> status update acknowledgement (UUID: 925e2d89-f6eb-464d-9a50-a74a8e07bc88) 
> for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.834309 244285440 slave.cpp:2046] Asked to kill task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.842150 244285440 http.cpp:178] HTTP POST for 
> /slave(1)/api/v1/executor from 192.168.178.24:54206
> I0418 14:27:18.842331 244285440 slave.cpp:3207] Handling status update 
> TASK_KILLING (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.843214 242139136 status_update_manager.cpp:320] Received 
> status update TASK_KILLING (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for 
> task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:18.843387 243748864 slave.cpp:3605] Forwarding the update 
> TASK_KILLING (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) for task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000 to master@127.0.0.1:5050
> I0418 14:27:18.846459 242675712 status_update_manager.cpp:392] Received 
> status update acknowledgement (UUID: a2f6eca7-b3e5-4e45-adcb-356f75355563) 
> for task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:19.836699 240529408 slave.cpp:2046] Asked to kill task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.850658 240529408 slave.cpp:2046] Asked to kill task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.927338 241602560 http.cpp:178] HTTP POST for 
> /slave(1)/api/v1/executor from 192.168.178.24:54206
> I0418 14:27:20.927465 241602560 slave.cpp:3207] Handling status update 
> TASK_KILLED (UUID: fdcb43ec-271b-4976-aaf3-59a031cdc924) for task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.928771 240529408 status_update_manager.cpp:320] Received 
> status update TASK_KILLED (UUID: fdcb43ec-271b-4976-aaf3-59a031cdc924) for 
> task test of framework ab374773-a018-4531-923b-899cf1e4f573-0000
> I0418 14:27:20.928933 243748864 slave.cpp:3605] Forwarding the update 
> TASK_KILLED (UUID: fdcb43ec-271b-4976-aaf3-59a031cdc924) for task test of 
> framework ab374773-a018-4531-923b-899cf1e4f573-0000 to master@127.0.0.1:5050
> 
> ```
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>

Reply via email to