Re: What is correct way to enable watchdog?

2009-02-24 Thread Peter Steele
 No, meaning, if a system is unresponsive for 300 seconds, action will be
 taken. watchdogd will not prevent proper reboots, panics or power failures.

Bad wording on my part. What you said is what I meant, and I assume the default 
action is to reboot the system? 

 Panic, or overheating. Check the dumpdev/dumpdir variables in rc.conf(5).

We don't have dumpdev/dumpdir configured in rc.conf. I'll do that. What makes 
us suspicious is that we have been running this stress test on systems for 
months without any reboots. We then enable the 300 second watchdog and two 
systems spontaneously reboot. We've turned it off again and have restarted the 
stress test and so far no reboots. What we want to know is are these reboots 
occurring as a result of a watchdog reboot? Is any kind of system log created 
when the watchdog reboots a system? 
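
For reference, the crash dump settings mentioned above are ordinary rc.conf(5) variables. A minimal sketch, assuming the swap device is large enough to hold a dump and that /var/crash has room for it; the values shown are common choices, not settings confirmed anywhere in this thread:

```shell
# /etc/rc.conf fragment (assumed values, adjust for the actual system)
dumpdev="AUTO"        # let the boot scripts pick a dump device, normally swap
dumpdir="/var/crash"  # where savecore(8) stores a recovered kernel dump
```

A kernel panic should then leave a dump in dumpdir after the next boot. A reset that bypasses the kernel would leave nothing there.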

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: What is correct way to enable watchdog?

2009-02-24 Thread Peter Steele

 Panic, or overheating. Check the dumpdev/dumpdir variables in rc.conf(5).

BTW, what's the difference between setting kern.corefile in /etc/sysctl and 
these dumpdev/dumpdir variables? 



Re: What is correct way to enable watchdog?

2009-02-24 Thread Mel
On Tuesday 24 February 2009 08:12:11 Peter Steele wrote:
 Panic, or overheating. Check the dumpdev/dumpdir variables in rc.conf(5).

 BTW, what's the difference between setting kern.corefile in /etc/sysctl and
 these dumpdev/dumpdir variables?

They are two different things.
kern.corefile is used for userland processes. See core(5) for details. Handy 
if you want to collect all coredumps for review or when large parts of the 
system are read-only.

-- 
Mel

Problem with today's modular software: they start with the modules
and never get to the software part.


Re: What is correct way to enable watchdog?

2009-02-24 Thread Mel
On Tuesday 24 February 2009 05:25:36 Peter Steele wrote:
 No, meaning, if a system is unresponsive for 300 seconds, action will be
 taken. watchdogd will not prevent proper reboots, panics or power
  failures.

 Bad wording on my part. What you said is what I meant, and I assume the
 default action is to reboot the system?

If -e cmd is not specified, the daemon will
 perform a trivial file system check instead.

 Panic, or overheating. Check the dumpdev/dumpdir variables in rc.conf(5).

 We don't have dumpdev/dumpdir configured in rc.conf. I'll do that. What
 makes us suspicious is that we have been running this stress test on
 systems for months without any reboots. We then enable the 300 second
 watchdog and two systems spontaneously reboot. We've turned it off again
 and have restarted the stress test and so far no reboots. What we want to
 know is are these reboots occurring as a result of a watchdog reboot? Is
 any kind of system log created when the watchdog reboots a system?

This smells more like a bug in watchdog. If that's the case, the crash dumps 
should point right at it, at which point I'd take it to freebsd-stable 
or -current, whichever applies to the OS version.

-- 
Mel



Re: What is correct way to enable watchdog?

2009-02-24 Thread Peter Steele
 If -e cmd is not specified, the daemon will 
 perform a trivial file system check instead. 

So -e has to be provided for the system to reboot? That doesn't seem to jibe 
with our experience. When we first enabled the watchdog, we just went with the 
defaults, with no -e command. The default timeout is 16 seconds. We started 
getting reboots regularly until we increased this value. We decided we didn't 
need anything as aggressive as 16 seconds and went with 300 seconds instead. We 
still see the reboots, but nowhere near as frequently. 

 This smells more like a bug in watchdog. If that's the case, the crash dumps 
 should point right at it, at which point I'd take it to freebsd-stable 
 or -current, whichever applies to the OS version. 

Okay, we'll enable dumpdev/dumpdir and see what we get. 

With 300 seconds though, a system would have to be truly dead before a reboot 
should occur. But our own application logs show that only four minutes elapsed 
from the last log we recorded to the first log we recorded after the reboot. 
Considering it takes 2-3 minutes for a system to boot and our application to 
start running after the boot, I would think we should see a span of at least 7 
minutes in our logs where nothing is recorded. However, the span is only about 
4 minutes, which is more or less the same as we'd get if someone went by the 
box and hit the reset button. So it doesn't look like the watchdog is behaving 
properly. 
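
The arithmetic behind that argument can be sketched as follows. The figures come from the paragraph above; the variable names are illustrative, and the reasoning assumes the application stops logging at the moment the system becomes unresponsive:

```shell
#!/bin/sh
# Figures taken from the message above; variable names are illustrative.
watchdog_timeout=300   # seconds, the value passed with -t
boot_time_min=120      # seconds, lower bound of the 2-3 minute boot-to-application time
observed_gap=240       # seconds, roughly four minutes between log entries

# Smallest log gap consistent with a reboot caused by the watchdog expiring.
expected_min_gap=$((watchdog_timeout + boot_time_min))

if [ "$observed_gap" -lt "$expected_min_gap" ]; then
    verdict="gap shorter than a full watchdog timeout plus boot"
else
    verdict="gap consistent with a watchdog timeout"
fi
echo "expected at least ${expected_min_gap}s, observed ${observed_gap}s: ${verdict}"
```

With these numbers the expected minimum is 420 seconds (7 minutes), against an observed gap of about 240 seconds.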




Re: What is correct way to enable watchdog?

2009-02-24 Thread Patrick Lamaizière
On Tue, 24 Feb 2009 13:01:13 -0800 (PST),
Peter Steele pste...@maxiscale.com wrote:

  If -e cmd is not specified, the daemon will 
  perform a trivial file system check instead. 
 
 So -e has to be provided for the system to reboot?

No. If -e is provided, watchdogd executes the command 'cmd'; if the
command succeeds, it resets and restarts the watchdog timer.

Without -e, watchdogd checks that a stat(/etc,xxx) syscall succeeds.

See http://ezine.daemonnews.org/200406/watchdog.html
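
To illustrate the -e behaviour described above, here is a minimal sketch of a check that could be wrapped in a script and passed with -e. The function name check_health is hypothetical, and the test it performs only imitates the default /etc check; a real check would probe whatever the application needs:

```shell
#!/bin/sh
# Hypothetical health check for use with watchdogd -e (illustrative only).
# Exit status 0 means "healthy", so watchdogd resets the timer.
# Any non-zero status means the timer is not reset.
check_health() {
    # Imitate the default check: confirm that /etc can still be examined.
    ls -d /etc >/dev/null 2>&1
}

check_health
status=$?
echo "check exit status: $status"
```

A script used with -e would end with `exit $status` so that watchdogd sees the result.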

 This smells more like a bug in watchdog. If that's the case, the
 crash dumps should point right at it, at which point I'd take it to
 freebsd-stable or -current, whichever applies to the OS version. 
 
 Okay, we'll enable dumpdev/dumpdir and see what we get. 

If the watchdog is a hardware watchdog, you will not get any log or
crash dump, just a hard reset.

Which watchdog are you using?

Regards.


Re: What is correct way to enable watchdog?

2009-02-24 Thread Peter Steele
 Which watchdog are you using? 

We are using the default FreeBSD 7.0 watchdog. We've added the line 

watchdogd_enable=yes 

to rc.conf to enable it and have modified /etc/rc.d/watchdogd to pass -t 300 
to the daemon instead of the default 16. 
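
The same timeout can probably be set without editing /etc/rc.d/watchdogd. A minimal sketch, assuming the stock rc script on this release honours the watchdogd_flags variable from rc.conf(5):

```shell
# /etc/rc.conf fragment (assumes watchdogd_flags is read by the rc script)
watchdogd_enable="YES"
watchdogd_flags="-t 300"   # 300 second timeout instead of the default 16
```

Keeping the change in rc.conf means it is not lost when the rc script is replaced by an upgrade.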



Re: What is correct way to enable watchdog?

2009-02-23 Thread Mel
On Monday 23 February 2009 12:11:38 Peter Steele wrote:

 We assumed this would give us a watchdog timeout of 300 seconds (5
 minutes), meaning a system would not reboot unless it is non-responsive for
 five minutes.

No, meaning, if a system is unresponsive for 300 seconds, action will be 
taken. watchdogd will not prevent proper reboots, panics or power failures.

 However, in a recent stress test we had unexplained 
 spontaneous reboots on two systems, with no logs of any kind to indicate
 why the systems rebooted.

Panic, or overheating. Check the dumpdev/dumpdir variables in rc.conf(5).
-- 
Mel
