Hello all, I've got a new small matter for general discussion:
Some of my systems have hardware watchdogs, either as motherboard features or in IPMI add-ons.

A small intro for newcomers: a watchdog includes a timer that can be started by the BIOS or by a watchdog driver in the OS, and the driver should regularly restart this timer (a trivial driver is just a loop with long sleeps and a write of a byte to a certain port address). If the computer freezes and the driver no longer functions, the hardware watchdog issues a reset on the motherboard, or a similar administrative action (if configurable).

Now, in OpenSolaris (and portable to OI) there is a bmc-watchdog package for some proprietary hardware implementations, and a newly ported open-source driver is in the works. There is also an SMF service to wrap the watchdog. From what I see, the HW timer is started upon service startup, and timer resets are issued regularly for as long as the service is up.

Upon service shutdown there are two possible approaches (as I tweaked the method script a lot, I am not sure what was there originally): either the daemon that regularly restarts the timer is just killed (and the timer keeps ticking), or the timer is also stopped. On my system it happened to be the former, and during a shutdown which took longer than usual to proceed (for valid reasons), the box got reset by the timer.

I then looked at the SMF manifest and saw that the service only depends on filesystem/usr. In practice this meant that upon OS shutdown the bmc-watchdog daemon was killed quickly (as nothing depends on this service) and the timer ticked down to zero - boom!

Question is: what is a valid way to avoid the watchdog killing the system upon lengthy shutdowns?
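To make the failure mode concrete, here is a toy Python model of the timer and the daemon. It is only a simulation under assumed numbers, not the actual bmc-watchdog code: the class and function names, the 60-second timeout, the 10-second restart interval and the 300-second shutdown are all hypothetical.

```python
class HardwareWatchdog:
    """Toy model of a hardware watchdog timer (hypothetical, simulation only)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.running = False
        self.deadline = None

    def start(self, now):
        # Started by the BIOS or by the watchdog driver.
        self.running = True
        self.deadline = now + self.timeout_s

    def pet(self, now):
        # The driver restarts the countdown; no effect on a stopped timer.
        if self.running:
            self.deadline = now + self.timeout_s

    def stop(self):
        self.running = False
        self.deadline = None

    def expired(self, now):
        return self.running and now >= self.deadline


def simulate_shutdown(timeout_s, pet_interval_s, daemon_killed_at_s,
                      shutdown_done_at_s, stop_timer_on_kill=False):
    """Return the second at which the watchdog resets the box,
    or None if the shutdown completes first (1-second time steps)."""
    wd = HardwareWatchdog(timeout_s)
    wd.start(now=0)
    for now in range(shutdown_done_at_s + 1):
        if wd.expired(now):
            return now
        if now < daemon_killed_at_s:
            # Daemon is still alive and restarts the timer periodically.
            if now % pet_interval_s == 0:
                wd.pet(now)
        elif now == daemon_killed_at_s and stop_timer_on_kill:
            # Stop method that also disarms the hardware timer.
            wd.stop()
    return None


# Daemon killed early, timer keeps ticking: reset in the middle of shutdown.
print(simulate_shutdown(60, 10, daemon_killed_at_s=5, shutdown_done_at_s=300))    # 60
# Daemon killed early, timer stopped too: no reset (and no protection either).
print(simulate_shutdown(60, 10, 5, 300, stop_timer_on_kill=True))                 # None
# Daemon kept alive until late in the shutdown: no reset.
print(simulate_shutdown(60, 10, daemon_killed_at_s=295, shutdown_done_at_s=300))  # None
```

The first call is what I observed: once the daemon is gone, the last restart of the timer decides when the box goes down, regardless of how far the shutdown has progressed.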
I came up with a few ideas:

1) Redefine the stop method to not kill the daemon - not good, for pedantic reasons at least ;)

2) Redefine the stop method to kill the daemon and stop the timer - not good, because the box can potentially also freeze during shutdown, and in that case we would want it automagically reset;

3) Make milestone/single-user a dependency of bmc-watchdog (I also tried to redefine the bmc-watchdog instance to have a dependent, but this did not get picked up properly) - in this case the daemon works until all heavy services in milestone/multi-user get shut down properly, and only gets killed then (and the HW timer ticks for a few more seconds, until the system is rebooted).

So far I like idea #3 (alone) best. Are there any reasons not to do so, or to do something different?

Ultimately, I hope, the best method should end up in the illumos-gate ;)

Thanks,
//Jim Klimov

_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss