Re: Stability Issues on 5.4-RELEASE Box
On Wed, Feb 28, 2007 at 05:39:48PM -0500, [EMAIL PROTECTED] wrote:

> > I'm not running your application mix, but I've never seen random reboots unless there were hardware issues. With 5.X these included having hyperthreading turned on, which I know caused problems with my dual XEON system.
>
> Hmmm...two answers so far, two people saying hyperthreading can be an issue. I'll definitely have that turned off ASAP.

HTT isn't expected to cause problems, at least on modern versions of FreeBSD (I don't remember if old versions like 5.4 have a bug), but it could be if you are running older hardware with broken BIOS support.

> > > * Issues with files that are not found on startup sometimes, but are other times. Prime example: the Zope CMS system that's been installed failed to find libmysqlclient.so after a planned soft reboot, but found it with no trouble on a subsequent boot a few minutes later, with no config changes in between.
> >
> > Haven't seen that; are there any messages indicating you're having filesystem problems?
>
> Thanks for asking; I see some new nasties in /var/log/messages:
>
> Feb 27 09:05:37 www fsck: /dev/ad4s1f: PARTIALLY TRUNCATED INODE I=9397392
> Feb 27 09:05:37 www fsck: /dev/ad4s1f: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.

You definitely need to drop to single-user mode and run fsck -f: filesystem corruption can cause many problems.

> > > * Given my dmesg below, do you see any specific problems?
> >
> > The interrupt storm on uhci1+ is not a good thing.
>
> Any thoughts on how to fix it?

You can disable USB support if you do not need it; otherwise a fix will probably involve an upgrade to a newer version as a first step.

Kris
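Since the box is remote and the fsck warnings were only noticed after someone asked, it may help to scan the log for them routinely. The following is a minimal sketch, not a definitive tool: it assumes the syslog layout seen in the two lines quoted above, and the function names `fsck_warnings` and `needs_manual_fsck` are hypothetical helpers invented for this illustration.

```python
import re
from collections import defaultdict

# Assumed line layout, modelled on the two log lines quoted in the thread:
#   "<Mon> <day> <hh:mm:ss> <host> fsck: <device>: <message>"
FSCK_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+fsck:\s+"
    r"(?P<device>/dev/[^\s:]+):\s+(?P<message>.+)$"
)


def fsck_warnings(lines):
    """Group fsck messages by device; lines that do not match are ignored."""
    by_device = defaultdict(list)
    for line in lines:
        match = FSCK_LINE.match(line.strip())
        if match:
            by_device[match.group("device")].append(match.group("message"))
    return dict(by_device)


def needs_manual_fsck(messages):
    """True if any message asks for a manual fsck run."""
    return any("RUN fsck MANUALLY" in message for message in messages)


if __name__ == "__main__":
    # Sample input copied from the thread; in practice the lines would be
    # read from /var/log/messages.
    sample = [
        "Feb 27 09:05:37 www fsck: /dev/ad4s1f: PARTIALLY TRUNCATED INODE I=9397392",
        "Feb 27 09:05:37 www fsck: /dev/ad4s1f: UNEXPECTED SOFT UPDATE "
        "INCONSISTENCY; RUN fsck MANUALLY.",
    ]
    for device, messages in fsck_warnings(sample).items():
        flag = "manual fsck requested" if needs_manual_fsck(messages) else "warnings only"
        print(device, len(messages), flag)
```

Grouping by device makes it easier to see which partition needs attention before scheduling the single-user fsck run.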
Stability Issues on 5.4-RELEASE Box
Hello All,

I've recently fallen into the task of administering a FreeBSD 5.4-RELEASE box that acts as the web server for a small non-profit that I volunteer for. Unfortunately, the system has been having some extremely vexing stability issues over the last month or so, which even my 6+ years of experience as an OpenBSD admin have not helped me track down.

First things first, let me say explicitly that I'm not trying to say FreeBSD sucks, it's not stable or anything like that. It's a fine OS, and I'm sure that it's either faulty hardware or a misconfiguration of some sort causing these problems. :-)

That said, here are some of the symptoms the box has been experiencing:

* Occasional random reboots. I've only personally witnessed one, and they don't happen often, but any time a *NIX box just reboots for no apparent reason (there was no indication of a problem in any of the logs, at least that I could see), something really bad is going on.

* Random extreme slowness when logging in via SSH, with the time to get a shell ranging from a second or two all the way up to 80 seconds. The box isn't busy enough that it's just slow due to load (especially since, once you're in, things fly), and it's not just a reverse DNS issue like I've seen on OpenBSD (this occurs even when logging in from locations listed in /etc/hosts that resolve properly out of that file). Until I upgraded to the current version of OpenSSL/OpenSSH, the box would occasionally just become unresponsive altogether over SSH, not allowing logins for 15+ minutes at a time.

* Issues with files that are not found on startup sometimes, but are other times. Prime example: the Zope CMS system that's been installed failed to find libmysqlclient.so after a planned soft reboot, but found it with no trouble on a subsequent boot a few minutes later, with no config changes in between.
* A warning in /var/log/messages that the root filesystem was full, when it was at 60% capacity (and something like 2% inode capacity); the problem has yet to repeat, though no files have been cleared off of that filesystem.

* Random crashes of the Zope/Plone system that's running the main part of the web site. While I realize that, in and of itself, this means nothing about the stability of the underlying OS, in the context of all of the other things going on (as well as the fact that the Zope list has been unable to help figure out why it's crashing), it seems like it might be further evidence of a larger problem.

Thus far, besides simply scanning log files, constantly watching top and ps, etc., I've not been able to do much with the box. As I said, I upgraded OpenSSL/OpenSSH to current versions, and I installed pf as the firewall (there was none before I arrived...don't even get me started on that). This weekend the guy who was the previous admin will be running a Memtest for me and disabling hyperthreading (which there's no performance justification for, and which has caused me stability issues at least on Linux in the past), since the server is in Oregon and I'm in the DC area. That's about the extent of what I've been able to do to date, since this is a production box.

What I'd like to know from you guys is:

* Am I justified in suspecting hyperthreading as a potential cause of instability?

* Does 5.4-RELEASE have any known bugs that might cause stability issues like the ones I've described here? More importantly, would an upgrade to 6.2-RELEASE be worthwhile (as is my instinct), in terms of being generally more stable and/or having better hardware support? Would such an upgrade be possible/relatively painless to perform without being physically at a console, as has been the case with OpenBSD over the years?

* Given my dmesg below, do you see any specific problems?

* Do you have any other suggestions for debugging this problem?
Thanks in advance for any help you can provide. :-)

Alex Kirk

dmesg:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE #0: Sun May 8 10:21:06 UTC 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table: INTEL D945GTP
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.20GHz (3200.01-MHz 686-class CPU)
Origin = GenuineIntel Id = 0xf43 Stepping = 3
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Hyperthreading: 2 logical CPUs
real memory = 2137509888 (2038 MB)
avail memory = 2086207488 (1989 MB)
ioapic0: Changing APIC ID to 2
ioapic0 Version 2.0 irqs 0-23 on motherboard
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: INTEL D945GTP on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at
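One way to narrow down the slow SSH logins described above is to measure where the time goes before authentication even starts. The sketch below is only an illustration under stated assumptions: it times the TCP connect separately from the wait for the server's identification line (the line beginning with `SSH-` that an SSH server sends first). The function name `time_ssh_banner` and the 120-second default timeout are choices made for this example, not anything from the thread.

```python
import socket
import time


def time_ssh_banner(host, port=22, timeout=120.0):
    """Return (connect_seconds, banner_seconds, banner_text).

    connect_seconds: time to establish the TCP connection.
    banner_seconds:  time from connect until the "SSH-" identification
                     line arrives (empty banner_text if it never does).
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        banner = b""
        with sock.makefile("rb") as reader:
            # A server may send other lines before its identification
            # string, so skip anything that does not start with "SSH-".
            for line in reader:
                if line.startswith(b"SSH-"):
                    banner = line
                    break
        got_banner = time.monotonic()
    return (
        connected - start,
        got_banner - connected,
        banner.decode("ascii", "replace").strip(),
    )
```

Calling `time_ssh_banner("www.example.org")` (a placeholder host name) a few times during a slow spell would show whether the delay occurs before the banner appears. If the banner arrives promptly, the slowness lies later, in key exchange, authentication, or session setup, which this sketch does not measure.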
Re: Stability Issues on 5.4-RELEASE Box
> Thanks in advance for any help you can provide. :-)
>
> Alex Kirk

I would certainly think hardware is the place to look. Just so you know, we still run a server on FBSD 4.8, and it runs very well. We have 4.8, 4.11, 5.2.1, 5.4, 6.1, and 6.2. Oh, and a couple Linux, NetBSD, and Solaris boxen too.

I prefer not to chase versions on high load production equipment, certainly not as a problem resolution strategy. For the record, I have never had a blind upgrade fix an unidentified problem, and if it did I would be very worried.

I would guess memory; at least that is where I would look first. I would also wonder what environment the server runs in: heat is a killer, and so is vibration. Loose racks and humming floors can and will cause connections to slip. I have fixed servers that ran for months and suddenly showed odd behavior simply by powering down and removing all cards/ram/cables, then reattaching everything.

Mysterious failures, 3000 miles to the console, I don't envy you ;^)

DAve

--
Three years now I've asked Google why they don't have a
Re: Stability Issues on 5.4-RELEASE Box
> I'm not running your application mix, but I've never seen random reboots unless there were hardware issues. With 5.X these included having hyperthreading turned on, which I know caused problems with my dual XEON system.

Hmmm...two answers so far, two people saying hyperthreading can be an issue. I'll definitely have that turned off ASAP.

> > * Issues with files that are not found on startup sometimes, but are other times. Prime example: the Zope CMS system that's been installed failed to find libmysqlclient.so after a planned soft reboot, but found it with no trouble on a subsequent boot a few minutes later, with no config changes in between.
>
> Haven't seen that; are there any messages indicating you're having filesystem problems?

Thanks for asking; I see some new nasties in /var/log/messages:

Feb 27 09:05:37 www fsck: /dev/ad4s1f: PARTIALLY TRUNCATED INODE I=9397392
Feb 27 09:05:37 www fsck: /dev/ad4s1f: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.

> > Thus far, besides simply scanning log files, constantly watching top and ps, etc., I've not been able to do much with the box. As I said, I upgraded OpenSSL/OpenSSH to current versions, and I installed pf as the firewall (there was none before I arrived...don't even get me started on that). This weekend the guy who was the previous admin will be running a Memtest for me and disabling hyperthreading (which there's no performance justification for, and which has caused me stability issues at least on Linux in the past), since the server is in Oregon and I'm in the DC area. That's about the extent of what I've been able to do to date, since this is a production box.
>
> How did you upgrade OpenSSL/SSH - cvsup + buildworld, etc., from the ports, or some other way? I've always done this using cvsup since OpenSSL/SSH is now built into the OS

From source on openssl.org and openssh.org. I'm not yet familiar with all of the cool helper things that FreeBSD has such as cvsup, and I was concerned about getting those two specifically fixed fast when I first started this task (as the version on there was old enough that there were known remote exploits).

> > * Does 5.4-RELEASE have any known bugs that might cause stability issues like the ones I've described here? More importantly, would an upgrade to 6.2-RELEASE be worthwhile (as is my instinct), in terms of being generally more stable and/or having better hardware support? Would such an upgrade be possible/relatively painless to perform without being physically at a console, as has been the case with OpenBSD over the years?
>
> I am not running your mix, but I have a 4.11 box, a 5.5-STABLE box, and a 6.2-STABLE box, and haven't seen anything like this. The 4.11 box looks a lot like yours - Supermicro P4DC6 with dual Xeons, RAID1 with the onboard Adaptec controller + the add-in RAID board, RAID5 using vinum with an Adaptec (39160?) board. 5.X plus hyperthreading was definitely a problem with that system. I have another similar box which I'm testing with 6.2-STABLE and haven't seen any problems yet.

Again, since the two answers I have so far say essentially the same thing here, it looks like my concerns about hardware are valid.

> > * Given my dmesg below, do you see any specific problems?
>
> The interrupt storm on uhci1+ is not a good thing.

Any thoughts on how to fix it?

Thanks,
Alex Kirk

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
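To judge how often the interrupt storm on uhci1+ mentioned in this thread occurs, the kernel messages can be counted per interrupt source. The following is a rough sketch under a loud assumption: the dmesg excerpt in the thread is truncated before any storm message, so the wording matched here (`interrupt storm detected on "<source>"`) is assumed rather than taken from the source, and the sample lines in the usage note are hypothetical. `storm_sources` is a helper name invented for this example.

```python
import re
from collections import Counter

# ASSUMPTION: kernel wording along the lines of
#   Interrupt storm detected on "irq20: uhci1+"; throttling interrupt source
# The thread itself does not show this line, so adjust the pattern to
# whatever the box actually logs.
STORM = re.compile(r'interrupt storm detected on "(?P<source>[^"]+)"', re.IGNORECASE)


def storm_sources(lines):
    """Count interrupt-storm reports per interrupt source name."""
    counts = Counter()
    for line in lines:
        match = STORM.search(line)
        if match:
            counts[match.group("source")] += 1
    return counts
```

Feeding it the lines of /var/log/messages (or saved dmesg output) gives a per-source tally, which would show whether the storm is confined to the USB controller before anyone decides to disable USB support.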