On 2013-03-28 20:12, Garrett D'Amore wrote:

On Mar 28, 2013, at 12:05 PM, Jim Klimov <[email protected]> wrote:

So, for the case of dedicated-hardware watchdogs, this is the part of
your post whose relevance I can't see: "The usual thing is to hook
this up to a system timer, which will catch hard hangs."

What I mean is that most systems do not expose an API to userland; they
just have something run off the timer that tickles the hardware watchdog
register.  This guards against a hard hang of the entire system/scheduler,
but it does nothing to ensure that upper-layer services are still being
handled.

I see... well, whatever way a daemon is implemented (as is relevant for
Linux watchdogs, and at least legacy OpenSolaris and maybe illumos BMC
port), it does rely on a timer interrupt and software not having hung
in order to tickle the hardware watchdog's separate timer.

In Linux, I would guess, the API is exposed as the /dev/watchdog node,
into which you can echo a character. The daemon is usually trivial:
reference code is part of some README and is a dozen lines long.
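
For illustration only, here is a minimal Python sketch of such a daemon
loop. It is not the README reference code mentioned above. It assumes the
usual Linux behaviour that any write to the watchdog node restarts the
countdown, and that drivers supporting "magic close" disarm the timer when
the last byte written before close is 'V'. The function name and its
parameters are made up for this example.

```python
import time


def pet_watchdog(device_path, interval_s, kicks, sleep=time.sleep):
    """Write one byte to the watchdog node `kicks` times, then close it.

    device_path, interval_s, kicks and sleep are illustrative parameters,
    not part of any real watchdog API.
    """
    # Unbuffered, so every write() reaches the driver immediately.
    with open(device_path, "wb", buffering=0) as dev:
        for _ in range(kicks):
            dev.write(b"\0")   # any write restarts the countdown
            sleep(interval_s)
        # Assumed "magic close": a final 'V' asks the driver to disarm the
        # timer on close. Without it the timer stays armed after we exit.
        dev.write(b"V")
```

A real daemon would loop forever with an interval well below the hardware
timeout, e.g. pet_watchdog("/dev/watchdog", 10, kicks) with a very large
kicks. Note that merely opening the node arms the timer on most drivers,
so do not try this on a machine you cannot afford to have reset.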

In OpenSolaris, I believe, there was a single (closed-source) driver
for all supported watchdog models and the bmc-watchdog program could
talk to it. In a way it was the user-accessible API to set and query
the watchdog. In case it helps, here are a few output examples:

# bmc-watchdog -g
Timer Use:                   SMS/OS
Timer:                       Running
Logging:                     Enabled
Timeout Action:              Hard Reset
Pre-Timeout Interrupt:       None
Pre-Timeout Interval:        0 seconds
Timer Use BIOS FRB2 Flag:    Clear
Timer Use BIOS POST Flag:    Clear
Timer Use BIOS OS Load Flag: Clear
Timer Use BIOS SMS/OS Flag:  Clear
Timer Use BIOS OEM Flag:     Clear
Initial Countdown:           900 seconds
Current Countdown:           842 seconds


# bmc-watchdog -h
Usage: bmc-watchdog <COMMAND> [OPTIONS]... [COMMAND_OPTIONS]...

COMMANDS:
  -s         --set                            Set BMC Watchdog Config.
  -g         --get                            Get BMC Watchdog Config.
  -r         --reset                          Reset BMC Watchdog Timer.
  -t         --start                          Start BMC Watchdog Timer.
  -y         --stop                           Stop BMC Watchdog Timer.
  -c         --clear                          Clear BMC Watchdog Config.
  -d         --daemon                         Run in Daemon Mode.

OPTIONS:
  -D STRING  --driver-type=IPMIDRIVER             Specify IPMI driver type.
             --disable-auto-probe                 Do not probe driver for default settings.
             --driver-address=DRIVER-ADDRESS      Specify driver address.
             --driver-device=DEVICE               Specify driver device path.
             --register-spacing=REGISTER-SPACING  Specify driver register spacing.
  -f STRING  --logfile=FILE                       Specify an alternate logfile
             --config-file=FILE                   Specify an alternate config file
  -n         --no-logging                         Turn off all logging
  -?         --help                               Output help menu.
  -V         --version                            Output version.
             --debug                              Turn on debugging.



Now I've not looked at Linux and how it uses watchdogs… but I've experience 
with a few different embedded systems, and the above handling is almost 
precisely what I've seen done.  NetBSD was nice because it instead offered a 
watchdog facility that extended into userland, allowing the service check to be 
done by a userland daemon, which is far more interesting than just that the 
clock interrupt handler is still working properly. :-)
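
To make the difference concrete, here is a rough Python sketch of the
userland side of such a scheme: the watchdog is tickled only while a
service check keeps passing, so a wedged service lets the hardware timer
expire even though the clock interrupt is still running. This is not
NetBSD's actual interface; service_ok and kick are hypothetical callables
supplied by the caller.

```python
import time


def guarded_kick_loop(service_ok, kick, interval_s, rounds, sleep=time.sleep):
    """Kick the watchdog only while service_ok() keeps returning True.

    service_ok: hypothetical health check, returns True if the service is up.
    kick:       hypothetical callable that tickles the watchdog once.
    Returns the number of kicks delivered.
    """
    kicks = 0
    for _ in range(rounds):
        if not service_ok():
            # Stop tickling: the hardware timer is left to run out and
            # take its configured timeout action (e.g. a hard reset).
            break
        kick()
        kicks += 1
        sleep(interval_s)
    return kicks
```

In practice service_ok might connect to a local port or stat a heartbeat
file, and kick would be the write to the watchdog device.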

_______________________________________________
oi-dev mailing list
[email protected]
http://openindiana.org/mailman/listinfo/oi-dev
