Re: [Soekris] "Lockup" update

Bill Maas Sun, 01 Apr 2007 09:56:52 -0700

Hi,

On Fri, 2007-03-30 at 18:53 +0200, Iustin Pop wrote:
> On Fri, Mar 30, 2007 at 11:07:54AM +0200, Bill Maas wrote:
> > If I understood the OpenBSD manual well, all the watchdog does is
> > determine (by nature) that job scheduling fails. Which indicates that
> > the kernel is in some erratic state, which in turn suggests that its
> > operation in general should be regarded as pretty UNDEFINED.
> 
> Well, that is the case when you use a software watchdog only (a 'fake'
> one). The soekris boxes have hardware watchdogs, which, if they don't
> receive a signal from the kernel, should reboot the machine in the configured
> time. That is, if the hardware is in a sane state.
>


I should have paid attention to the board specs. It does shed some light
on the OpenBSD watchdogd(8) manual, which describes the
kern.watchdog.auto sysctl. If set, the kernel itself maintains the
counter. That wouldn't make much sense if the counter itself were kept
in the kernel rather than in hardware.

If the sysctl is cleared, an external program is supposed to maintain
the watchdog counter. As the manual describes it: "In situations where
the machine provides vital services which are not handled completely in
kernel space, e.g. mail exchange, it may be desirable to reboot the
machine if process scheduling fails."

I quite frankly have no idea under which circumstances process
scheduling would fail under a kernel which is otherwise still running.


There still is the other issue, that of the watchdog reset during boot.
It happens on an OpenBSD box, maybe on Linux boxes too, if the timeout
value is set too low. The appropriate lines from /etc/rc (4.0 unpatched)
are:

# /etc/rc, line 243
sysctl_conf

# /etc/rc, lines 721-723
if [ X"${watchdogd_flags}" != X"NO" -a -x /usr/sbin/watchdogd ]; then
    echo -n ' watchdogd';   /usr/sbin/watchdogd ${watchdogd_flags}
fi

/etc/sysctl.conf would contain something like:
kern.watchdog.auto=0
kern.watchdog.period=30         # default value

Apparently, when the vars from sysctl.conf are set, the watchdog timer
is enabled, i.e. started, immediately. The time between setting the
sysctls and the watchdogd daemon being up & running varies with the
number of services started at boot time and of course the hardware used.
On my slow net4521 test box, only logger, named, pf, ntpd, sendmail,
httpd, inetd, and ssh are brought up, which can count as a bare minimum,
and after those of course watchdogd itself.
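
To measure that gap on another box, something like the following could
be dropped into /etc/rc. This is only a sketch: elapsed_since and
wd_armed are names made up for illustration, nothing in the stock rc
script provides them, and the placement comments are my assumption of
where the lines would go.

```shell
#!/bin/sh
# Hypothetical helper, not part of the stock /etc/rc.
# Prints the number of seconds elapsed since the epoch timestamp in $1.
elapsed_since() {
    echo $(( $(date +%s) - $1 ))
}

# In /etc/rc this line would sit right after the sysctl_conf call,
# i.e. at the moment the watchdog timer is presumably armed:
wd_armed=$(date +%s)

# ... the remaining services start here ...

# And this one right before watchdogd is started:
echo "watchdog gap: $(elapsed_since "$wd_armed")s"
```

The resolution is one second, which is coarse but good enough to compare
against a kern.watchdog.period given in seconds.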

The average time between setting the sysctls and having watchdogd up
and running is about 28 seconds. That's dangerously close to the OpenBSD
default of 30s! I've tried it with a timeout of 10s and, sure enough, the
system spontaneously reboots.
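
The arithmetic behind this, as a minimal sketch. watchdog_margin and
watchdog_period_ok are hypothetical helper names, and the 5s safety
margin is merely an assumption of mine, not a documented figure.

```shell
#!/bin/sh
# Hypothetical helper: seconds of headroom left when watchdogd first runs.
#   $1 = measured seconds between sysctl_conf and watchdogd startup
#   $2 = kern.watchdog.period in seconds
watchdog_margin() {
    echo $(( $2 - $1 ))
}

# Succeeds only if the period exceeds the boot gap by at least $3 seconds.
watchdog_period_ok() {
    [ "$(watchdog_margin "$1" "$2")" -ge "$3" ]
}

# Default period on my net4521: prints 2 (seconds of headroom).
watchdog_margin 28 30

# A 10s period against a 28s gap cannot work.
watchdog_period_ok 28 10 5 || echo "period of 10s is too short for a 28s gap"
```

With a 28s gap the default period leaves 2s of headroom, and any period
below 28s means a reset before watchdogd ever gets to run.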

Is this a bug? Does this require a solution/workaround or would it just
be nice towards Joe and Mary User to mention this in the manual? It
does limit watchdogd's flexibility. Any ideas?

Bill




_______________________________________________
Soekris-tech mailing list
[email protected]
http://lists.soekris.com/mailman/listinfo/soekris-tech
