----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50540/#review145876 -----------------------------------------------------------
High level question: is this useful? I imagine watchdog logic for agent is to call sd_notify from `Slave` actor (similar to Master) so that we know `Slave` is still functional. I think this is the right way to detect hanging. The systemd library code should provide primitive to the Agent code so that it does not call raw `sd_notify` directly. configure.ac (lines 615 - 616) <https://reviews.apache.org/r/50540/#comment212242> Looks like you haven't checked headers. You need dev headers, right? src/linux/systemd.hpp (line 32) <https://reviews.apache.org/r/50540/#comment212243> We now does not use `using namespace` any more. Please include them one by one using `using xxx;` clauses. src/linux/systemd.cpp (line 203) <https://reviews.apache.org/r/50540/#comment212295> Hum, this also could be the case where WATCHDOG_USEC is not set, right? The error here sounds unintuitive. I'd suggest we check env exists or not, and then parse it. ``` if (watchdogUsec.isNone()) { } else if (watchDog.isError()) { } ``` src/linux/systemd.cpp (line 207) <https://reviews.apache.org/r/50540/#comment212297> Who will delete the Watchdog? I think you should use a `managed` spawn. See `spawn` definition. src/linux/systemd.cpp (line 208) <https://reviews.apache.org/r/50540/#comment212298> Why create the watch dog actor if WATCHDOG_USEC is not set? - Jie Yu On Aug. 15, 2016, 6:40 p.m., Lawrence Wu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50540/ > ----------------------------------------------------------- > > (Updated Aug. 15, 2016, 6:40 p.m.) > > > Review request for mesos, David Robinson, Ian Downes, and Jie Yu. > > > Bugs: MESOS-5376 > https://issues.apache.org/jira/browse/MESOS-5376 > > > Repository: mesos > > > Description > ------- > > Add systemd watchdog support. > > > Diffs > ----- > > configure.ac d2136909b7305498ae901a5ea00133142b77f9e6 > src/Makefile.am 599ebbef6d164fb2a530b55427ddabb5cd607634 > src/linux/systemd.hpp 91134f1d4b100759e45931bd09ca4e1e1aeaaf8a > src/linux/systemd.cpp 619aa2778da5f99d3a078a8e1208bdaa9dc77581 > src/slave/main.cpp 4624392d30cf391015dcd63f447fe2414a47a16a > src/tests/linux/systemd_test_helper.hpp PRE-CREATION > src/tests/linux/systemd_test_helper.cpp PRE-CREATION > src/tests/linux/systemd_test_helper_main.cpp PRE-CREATION > src/tests/linux/systemd_tests.cpp PRE-CREATION > > Diff: https://reviews.apache.org/r/50540/diff/ > > > Testing > ------- > > Tested by sending SIGSTOP to running mesos and verifying via journalctl that > it was killed by the watchdog. > > The test I wrote for this does the following: > - build up a unit file as a string and create a unit file in > /etc/systemd/system/systemd-test-helper.service > - reload the systemd daemon and start the newly discovered helper service > - wait a bit (30s) to make sure the watchdog has had a chance to kill the > service > - use systemctl status systemd-test-helper to check that the service is still > running > - clean up the unit file. > > TODO: create a similar test, but send a SIGSTOP to the service and ensure > that it has been killed by watchdog. > > > Thanks, > > Lawrence Wu > >
