Hi,

I've been playing with some ideas on how to add watchdog support to systemd. I don't like talking about vaporware, so here are some patches with a prototype implementation. It should give you an idea of how this could be done.
A few words on the ideas behind this:

When working with a watchdog in Linux, the typical scenario is one hardware watchdog but multiple processes that should be monitored. Beyond that, the hardware watchdog should be the last line of defence. A more graceful recovery should be tried first.

How to implement this in systemd:

systemd already has the concept of a state for each service and a very simple method (sd_notify) for the service to provide status information to systemd. The first patch builds on this: a service can send keep-alive messages with sd_notify, and the timestamp of the latest message is exposed as a service property.

The second patch implements service restart / reboot when no keep-alive message was received for a certain amount of time.
Note: This only triggers if at least one keep-alive was received. I don't think anything can be done inside systemd if a service fails to start. This should be handled outside of systemd.

I think the watchdog hardware should be handled in a separate service, for several reasons:
- It's not useful on systems without watchdog hardware. This gives us a clean way to disable it.
- This is a rather critical part to implement. The code is much simpler this way.
- There are many different requirements and options on how to handle the watchdog hardware. It's a lot easier to replace a separate daemon with a custom implementation, should it be necessary.

The third patch is helper code. It provides a single timestamp for when systemd will reboot if no more keep-alives are sent. This way the watchdog service only needs to make one D-Bus call to get the necessary data.

The last patch adds a simple daemon that handles the watchdog device.

What do you think?
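
To make the timeout semantics of the second and third patch a bit more concrete, here is a minimal sketch in C. This is not the code from the patches. The struct, the field names and the function names are made up for illustration, and I'm assuming timestamps in microseconds where 0 means "not set" or "disabled".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t usec_t;

enum watchdog_action { WATCHDOG_NONE, WATCHDOG_RESTART, WATCHDOG_REBOOT };

/* Hypothetical per-service watchdog state (names are illustrative only). */
struct watchdog_state {
        usec_t last_keepalive;  /* 0: no keep-alive received so far */
        usec_t restart_timeout; /* 0: restart on timeout disabled */
        usec_t reboot_timeout;  /* 0: reboot on timeout disabled */
};

/* Decide what to do for one service at time 'now'. Nothing triggers
 * before the first keep-alive has been received. */
static enum watchdog_action watchdog_check(const struct watchdog_state *s, usec_t now) {
        usec_t age;

        assert(s != NULL);

        if (s->last_keepalive == 0 || now <= s->last_keepalive)
                return WATCHDOG_NONE;

        age = now - s->last_keepalive;

        /* Reboot takes precedence over restart if both have expired. */
        if (s->reboot_timeout > 0 && age >= s->reboot_timeout)
                return WATCHDOG_REBOOT;
        if (s->restart_timeout > 0 && age >= s->restart_timeout)
                return WATCHDOG_RESTART;

        return WATCHDOG_NONE;
}

/* Earliest point in time at which any service would cause a reboot if no
 * further keep-alives arrive. Returns 0 if no service has a reboot
 * deadline. */
static usec_t watchdog_reboot_deadline(const struct watchdog_state *s, size_t n) {
        usec_t earliest = 0;
        size_t i;

        assert(s != NULL || n == 0);

        for (i = 0; i < n; i++) {
                usec_t deadline;

                if (s[i].last_keepalive == 0 || s[i].reboot_timeout == 0)
                        continue;

                deadline = s[i].last_keepalive + s[i].reboot_timeout;
                if (earliest == 0 || deadline < earliest)
                        earliest = deadline;
        }

        return earliest;
}
```

The second function is the reason a single D-Bus call is enough for the watchdog service: it only needs the earliest deadline over all services, not the state of each one.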
Regards,
Michael

Michael Olbrich (4):
  WIP: service: add watchdog timestamp
  WIP: service: add watchdog restart/reboot timeouts
  WIP: manager: add a global watchdog reboot timestamp
  WIP: add basic watchdog daemon

 Makefile.am                        |   21 ++++++-
 src/99-systemd.rules.in            |    2 +
 src/dbus-manager.c                 |    4 +
 src/dbus-service.c                 |    8 +++
 src/load-fragment-gperf.gperf.m4   |    2 +
 src/manager.c                      |   20 ++++++
 src/manager.h                      |    3 +
 src/service.c                      |   49 +++++++++++++++
 src/service.h                      |    6 ++
 src/watchdogd.c                    |  119 ++++++++++++++++++++++++++++++++++++
 units/systemd-watchdogd.service.in |   16 +++++
 11 files changed, 248 insertions(+), 2 deletions(-)
 create mode 100644 src/watchdogd.c
 create mode 100644 units/systemd-watchdogd.service.in

-- 
1.7.5.4

_______________________________________________
systemd-devel mailing list
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
