-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50540/#review145876
-----------------------------------------------------------



High level question: is this useful? I imagine watchdog logic for agent is to 
call sd_notify from `Slave` actor (similar to Master) so that we know `Slave` 
is still functional. I think this is the right way to detect hanging.

The systemd library code should provide primitive to the Agent code so that it 
does not call raw `sd_notify` directly.


configure.ac (lines 615 - 616)
<https://reviews.apache.org/r/50540/#comment212242>

    Looks like you haven't checked headers. You need dev headers, right?



src/linux/systemd.hpp (line 32)
<https://reviews.apache.org/r/50540/#comment212243>

    We now does not use `using namespace` any more. Please include them one by 
one using `using xxx;` clauses.



src/linux/systemd.cpp (line 203)
<https://reviews.apache.org/r/50540/#comment212295>

    Hum, this also could be the case where WATCHDOG_USEC is not set, right? The 
error here sounds unintuitive.
    
    I'd suggest we check env exists or not, and then parse it.
    
    ```
    if (watchdogUsec.isNone()) {
    } else if (watchDog.isError()) {
    }
    ```



src/linux/systemd.cpp (line 207)
<https://reviews.apache.org/r/50540/#comment212297>

    Who will delete the Watchdog? I think you should use a `managed` spawn. See 
`spawn` definition.



src/linux/systemd.cpp (line 208)
<https://reviews.apache.org/r/50540/#comment212298>

    Why create the watch dog actor if WATCHDOG_USEC is not set?


- Jie Yu


On Aug. 15, 2016, 6:40 p.m., Lawrence Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50540/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2016, 6:40 p.m.)
> 
> 
> Review request for mesos, David Robinson, Ian Downes, and Jie Yu.
> 
> 
> Bugs: MESOS-5376
>     https://issues.apache.org/jira/browse/MESOS-5376
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Add systemd watchdog support.
> 
> 
> Diffs
> -----
> 
>   configure.ac d2136909b7305498ae901a5ea00133142b77f9e6 
>   src/Makefile.am 599ebbef6d164fb2a530b55427ddabb5cd607634 
>   src/linux/systemd.hpp 91134f1d4b100759e45931bd09ca4e1e1aeaaf8a 
>   src/linux/systemd.cpp 619aa2778da5f99d3a078a8e1208bdaa9dc77581 
>   src/slave/main.cpp 4624392d30cf391015dcd63f447fe2414a47a16a 
>   src/tests/linux/systemd_test_helper.hpp PRE-CREATION 
>   src/tests/linux/systemd_test_helper.cpp PRE-CREATION 
>   src/tests/linux/systemd_test_helper_main.cpp PRE-CREATION 
>   src/tests/linux/systemd_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50540/diff/
> 
> 
> Testing
> -------
> 
> Tested by sending SIGSTOP to running mesos and verifying via journalctl that 
> it was killed by the watchdog.
> 
> The test I wrote for this does the following:
> - build up a unit file as a string and create a unit file in 
> /etc/systemd/system/systemd-test-helper.service
> - reload the systemd daemon and start the newly discovered helper service
> - wait a bit (30s) to make sure the watchdog has had a chance to kill the 
> service
> - use systemctl status systemd-test-helper to check that the service is still 
> running
> - clean up the unit file.
> 
> TODO: create a similar test, but send a SIGSTOP to the service and ensure 
> that it has been killed by watchdog.
> 
> 
> Thanks,
> 
> Lawrence Wu
> 
>

Reply via email to