Re: What is correct way to enable watchdog?
> No, meaning, if a system is unresponsive for 300 seconds, action will be taken. watchdogd will not prevent proper reboots, panics or power failures.

Bad wording on my part. What you said is what I meant, and I assume the default action is to reboot the system?

> Panic, or overheating. Check the dumpdev/dumpdir variables in rc.conf(5).

We don't have dumpdev/dumpdir configured in rc.conf. I'll do that. What makes us suspicious is that we have been running this stress test on systems for months without any reboots. We then enable the 300 second watchdog and two systems spontaneously reboot. We've turned it off again and have restarted the stress test and so far no reboots. What we want to know is are these reboots occurring as a result of a watchdog reboot? Is any kind of system log created when the watchdog reboots a system?

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
Re: What is correct way to enable watchdog?
> Panic, or overheating. Check the dumpdev/dumpdir variables in rc.conf(5).

BTW, what's the difference between setting kern.corefile in /etc/sysctl.conf and these dumpdev/dumpdir variables?
Re: What is correct way to enable watchdog?
On Tuesday 24 February 2009 08:12:11 Peter Steele wrote:
> > Panic, or overheating. Check the dumpdev/dumpdir variables in rc.conf(5).
>
> BTW, what's the difference between setting kern.corefile in /etc/sysctl and these dumpdev/dumpdir variables?

They are two different things. kern.corefile is used for userland processes. See core(5) for details. Handy if you want to collect all coredumps for review or when large parts of the system are read-only.

--
Mel
Problem with today's modular software: they start with the modules and never get to the software part.
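For anyone following along, here is a rough sketch of how the two mechanisms might be configured side by side. It assumes the stock rc.conf(5) variables for kernel crash dumps and the format specifiers from core(5); the /var/coredumps directory is a made-up example, so adjust to taste and check the man pages for your release.

```sh
# /etc/rc.conf: kernel crash dumps (what you get after a panic)
dumpdev="AUTO"          # let the rc scripts pick a swap device to dump to
dumpdir="/var/crash"    # where savecore(8) places the dump on the next boot

# /etc/sysctl.conf: core files from userland processes, a separate mechanism.
# Shown as a comment because sysctl.conf is not sh syntax, and the
# directory is hypothetical:
#   kern.corefile=/var/coredumps/%N.%P.core
# (%N is the process name, %P the process ID; see core(5))
```

The first pair only matters when the kernel itself panics; the sysctl only affects where a crashing program leaves its core file.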
Re: What is correct way to enable watchdog?
On Tuesday 24 February 2009 05:25:36 Peter Steele wrote:
> > No, meaning, if a system is unresponsive for 300 seconds, action will be taken. watchdogd will not prevent proper reboots, panics or power failures.
>
> Bad wording on my part. What you said is what I meant, and I assume the default action is to reboot the system?

If -e cmd is not specified, the daemon will perform a trivial file system check instead.

> > Panic, or overheating. Check the dumpdev/dumpdir variables in rc.conf(5).
>
> We don't have dumpdev/dumpdir configured in rc.conf. I'll do that. What makes us suspicious is that we have been running this stress test on systems for months without any reboots. We then enable the 300 second watchdog and two systems spontaneously reboot. We've turned it off again and have restarted the stress test and so far no reboots. What we want to know is are these reboots occurring as a result of a watchdog reboot? Is any kind of system log created when the watchdog reboots a system?

This smells more like a bug in watchdog. If that's the case, the crash dumps should point right at it, at which point I'd take it to freebsd-stable or -current, whichever applies to the OS version.

--
Mel
Re: What is correct way to enable watchdog?
> If -e cmd is not specified, the daemon will perform a trivial file system check instead.

So -e has to be provided for the system to reboot? That doesn't seem to jibe with our experience. When we first enabled the watchdog, we just went with the defaults, with no -e command. The default for the timeout is 16 seconds. We started getting reboots regularly until we increased this value. We decided we didn't need anything as aggressive as 16 seconds and went instead with 300 seconds. We still see the reboots, but nowhere near as frequently.

> This smells more like a bug in watchdog. If that's the case, the crash dumps should point right at it, at which point I'd take it to freebsd-stable or -current, whichever applies to the OS version.

Okay, we'll enable dumpdev/dumpdir and see what we get. With 300 seconds though, a system would have to be truly dead before a reboot should occur. But our own application logs show that only four minutes elapsed from the last log we recorded to the first log we recorded after the reboot. Considering it takes 2-3 minutes for a system to boot and our application to start running after the boot, I would think we should see a span of at least 7 minutes in our logs where nothing is recorded. However, the span is only about 4 minutes, which is more or less the same as we'd get if someone went by the box and hit the reset button. So it doesn't look like the watchdog is behaving properly.
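The arithmetic behind that expectation can be spelled out as a small sketch. It uses only the figures quoted above (300 second timeout, 2-3 minutes to boot, roughly 4 minutes observed) and assumes that application logging stops at the moment the system becomes unresponsive; the variable names are just for illustration.

```sh
#!/bin/sh
# Figures taken from the message above, all in seconds.
timeout=300     # watchdog timeout passed with -t
boot_min=120    # fastest boot plus application start (2 minutes)
boot_max=180    # slowest boot plus application start (3 minutes)
observed=240    # gap actually seen in the application logs (about 4 minutes)

# If the watchdog fired, the log gap should cover the full timeout
# and then the boot.
expected_min=$((timeout + boot_min))
expected_max=$((timeout + boot_max))

if [ "$observed" -lt "$expected_min" ]; then
    verdict="shorter than a watchdog timeout plus boot"
else
    verdict="consistent with a watchdog timeout plus boot"
fi

echo "expected gap: ${expected_min}-${expected_max}s, observed: ${observed}s, ${verdict}"
```

With these numbers the expected gap is 420 to 480 seconds, so a 240 second gap looks more like an immediate reset followed by a normal boot than a full 300 second hang.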
Re: What is correct way to enable watchdog?
On Tue, 24 Feb 2009 13:01:13 -0800 (PST), Peter Steele pste...@maxiscale.com wrote:
> > If -e cmd is not specified, the daemon will perform a trivial file system check instead.
>
> So -e has to be provided for the system to reboot?

No. If -e is provided, watchdogd executes the command 'cmd'; if the command succeeds, it resets and restarts the watchdog. Without -e, watchdogd checks that a stat() call on /etc succeeds. See http://ezine.daemonnews.org/200406/watchdog.html

> > This smells more like a bug in watchdog. If that's the case, the crash dumps should point right at it, at which point I'd take it to freebsd-stable or -current, whichever applies to the OS version.
>
> Okay, we'll enable dumpdev/dumpdir and see what we get.

If the watchdog is a hardware watchdog, you will not get any log or crash dump, just a hard reset. Which watchdog are you using?

Regards.
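To make the difference concrete, here is a minimal sh sketch of the check logic as described in this message. It is not watchdogd's source: the function names are invented for illustration, and `[ -e /etc ]` merely stands in for the daemon's stat() of /etc.

```sh
#!/bin/sh
# Sketch only: models the check described above, not watchdogd itself.

# Run one health check. With an argument, run it as the check command
# (like -e cmd); without one, fall back to testing that /etc exists,
# which stands in for the default stat() of /etc.
watchdog_check() {
    if [ -n "$1" ]; then
        sh -c "$1" >/dev/null 2>&1
    else
        [ -e /etc ]
    fi
}

# One pass of the loop: report whether the timer would be reset.
watchdog_step() {
    if watchdog_check "$1"; then
        echo "check ok: reset watchdog timer"
    else
        echo "check failed: timer keeps counting down"
    fi
}

watchdog_step ""        # default check
watchdog_step "false"   # a check command that always fails
```

The point is that -e only changes what counts as "the system is healthy"; in both cases it is the unreset timer expiring, not the check itself, that leads to the reboot.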
Re: What is correct way to enable watchdog?
> Which watchdog are you using?

We are using the default FreeBSD 7.0 watchdog. We've added the line watchdogd_enable=yes to rc.conf to enable it and have modified /etc/rc.d/watchdogd to pass -t 300 to the daemon instead of the default 16.
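As an aside, the same effect can probably be had without editing the rc.d script. The fragment below is a sketch that assumes the rc.d script picks up a watchdogd_flags variable from rc.conf, as rc.subr normally does for name_flags variables; it is worth confirming against /etc/defaults/rc.conf on 7.0 before relying on it.

```sh
# /etc/rc.conf
watchdogd_enable="YES"      # start watchdogd at boot
watchdogd_flags="-t 300"    # assumed to be passed through to the daemon
```

Keeping the change in rc.conf means it survives an upgrade that replaces /etc/rc.d/watchdogd.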
Re: What is correct way to enable watchdog?
On Monday 23 February 2009 12:11:38 Peter Steele wrote:
> We assumed this would give us a watchdog timeout of 300 seconds (5 minutes), meaning a system would not reboot unless it is non-responsive for five minutes.

No, meaning, if a system is unresponsive for 300 seconds, action will be taken. watchdogd will not prevent proper reboots, panics or power failures.

> However, in a recent stress test we had unexplained spontaneous reboots on two systems, with no logs of any kind to indicate why the systems rebooted.

Panic, or overheating. Check the dumpdev/dumpdir variables in rc.conf(5).

--
Mel