[Soekris] Test the watchdog

T Sharpe Sun, 01 Feb 2009 12:02:45 -0800

I've used the 4501 watchdog as follows:

My system runs a 2.4.19 kernel compiled with the watchdog driver (wdtsc), 
after applying a hard-to-find patch to fix a hardware bug (it's old, but I can 
forward it if you wish).

I created a node at major 10, minor 130 that appears as /dev/watchdog.
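
For what it's worth, the node can also be created from Python instead of with the mknod command. This is only a minimal sketch, assuming a Linux system and root privileges; the function name make_watchdog_node and the 0600 permissions are my own choices for illustration, not anything the driver requires.

```python
import os
import stat

WATCHDOG_MAJOR = 10    # major number given above (Linux "misc" devices)
WATCHDOG_MINOR = 130   # minor number given above


def make_watchdog_node(path="/dev/watchdog"):
    """Create the watchdog character device node (hypothetical helper).

    Must be run as root; raises OSError if the node already exists.
    """
    dev = os.makedev(WATCHDOG_MAJOR, WATCHDOG_MINOR)
    # S_IFCHR marks the node as a character device; 0o600 is an assumed,
    # conservative permission setting (owner read/write only).
    os.mknod(path, stat.S_IFCHR | 0o600, dev)
```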

I wrote a daemon in Python that performs a data gathering function and deals 
with the watchdog.  Here's what was required:

  -To start the watchdog, I opened /dev/watchdog as a file with write access.  
It's important to only open it once - bad things happen the second time it's 
opened.
  -To keep the watchdog from rebooting the system, you "ping" the watchdog by 
writing a character (any character except the "stop" character below) to it 
every second or so.  As I recall it takes 2-3 seconds for it to decide to 
reboot the system, but I'm not sure.
  -To stop the watchdog, write the "stop" character (I think it was "V"), then 
flush and close the file.
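
The three steps above can be sketched in Python roughly as follows. This is not my original daemon, just a minimal illustration. It assumes the stop character really is "V" (I am going from memory on that), and the names Watchdog, run, work and iterations are made up for the example.

```python
import time


class Watchdog:
    """Thin wrapper around the watchdog device node (illustrative only)."""

    def __init__(self, path="/dev/watchdog"):
        # Opening the device starts the watchdog. Open it exactly once
        # and keep the handle for the life of the daemon.
        self._dev = open(path, "wb", buffering=0)

    def ping(self):
        # Any byte except the stop character resets the timer.
        self._dev.write(b"\0")

    def stop(self):
        # Assumption: "V" is the stop character for this driver.
        self._dev.write(b"V")
        self._dev.flush()
        self._dev.close()


def run(path="/dev/watchdog", interval=1.0, iterations=None,
        work=lambda: None):
    """Do one unit of work, ping, sleep; repeat.

    iterations=None loops forever; a number is handy for testing.
    """
    wd = Watchdog(path)
    count = 0
    while iterations is None or count < iterations:
        work()            # the data-gathering step would go here
        wd.ping()
        time.sleep(interval)
        count += 1
    wd.stop()             # reached only on a clean exit from the loop
```

Note that stop() is deliberately not in a finally block: if work() raises, the stop character is never written, so the pings simply cease. Whether the driver then reboots the box after the file is closed without the stop character depends on the driver, and I can't say for certain how wdtsc behaves there.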

I included the watchdog because I'm not a good programmer, but so far it's only 
been triggered by hardware problems.  I syslog my 4501 to another machine and 
I've seen a few cases where the drive controller errors out, causing problems 
until it affects my daemon; the watchdog then rebooted the system and corrected 
the problem.

Hope it helps!

Tim Sharpe

-----------------------
Thorsten Mühlfelder wrote:

Hmm, I guess you're talking about a software based solution. But the 
Soekris boards have a hardware watchdog, which perhaps works the other way 
round:
Kernel talks with its driver to the hardware. If the kernel freezes, the 
hardware chip resets the machine because it doesn't get "pings" from the kernel 
anymore.

This would be more logical to me, but I really don't know ;-)

On Wed, 14 Jan 2009 18:08:14 -0500
"Michael Proto" <mike at jellydonut.org> wrote:

> I'm no authority on the subject (I've learned most of what I know simply by
> lurking this list and reading PHK's posts), but there are two pieces to
> watchdog-- the kernel-side piece and a userland daemon that acts as a dead
> man's switch. The userland watchdogd (or whatever Linux's equivalent is)
> enables the kernel bits and sends a "I'm still running" message to the
> kernel every few seconds. If the kernel doesn't get that message, it assumes
> userland is broken and institutes a reboot. This is why you can test with a
> kill -9 to the watchdogd program-- it doesn't have time to deactivate the
> in-kernel watchdog component before it dies.
> 
> 
> -Proto

_______________________________________________
Soekris-tech mailing list
[email protected]
http://lists.soekris.com/mailman/listinfo/soekris-tech
