Re: low-level format before install?
On Tue, Apr 07, 2009 at 05:41:27PM -0400, John Almberg wrote:
> Thanks for all the tips. At least I have something to start with.
>
> The guys in the data center reinstalled FreeBSD (the filesystem was
> totally corrupted again), and then ran what they called "SMART test",
> which might be smartctl, and said the hard drives look good.
>
> I am now able to get back in.
>
> So the system ran fine until I put a load on it with the database
> (many transactions a second). This corrupted the file system again.
>
> So I guess I need to load it enough to produce error messages
> (hopefully) but not enough to destroy the file system again.

I've had issues with a few hosted servers, and more often than not, it
was a bad PSU on the server and/or rack.

Assuming that you can't get these folks to run a good hardware diag for
you, there are a few things you can do. You can beat up the RAM/CPU with
various burn-in programs (I like benchmarks/stream for its simplicity;
you'll need to "make extract", customize it for your own memory size,
then "make install"). You can thrash the disks pretty well with either
dd or "badblocks" from sysutils/e2fsprogs; both can be run
non-destructively.

-- 
Geoff
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
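Geoff's dd pass can be done read-only: dd copies every block of the device to /dev/null, so nothing on the disk changes, and a read error makes dd exit non-zero. Below is a minimal sketch, assuming a POSIX sh. The helper name read_surface and the device name /dev/ad4 used in the example are hypothetical placeholders, not anything from the thread.

```sh
#!/bin/sh
# Sketch: read-only surface scan of a disk (or any readable file) with dd.
# read_surface is a hypothetical helper; the device is given as $1.

read_surface() {
    target=$1
    if [ ! -r "$target" ]; then
        echo "read_surface: cannot read $target" >&2
        return 2
    fi
    # if=target, of=/dev/null: data is only read and thrown away, so nothing
    # on the disk is modified. bs is given in plain bytes (1 MiB) because
    # FreeBSD dd spells the suffix "1m" while GNU dd wants "1M".
    if dd if="$target" of=/dev/null bs=1048576; then
        echo "read_surface: $target read without errors"
    else
        echo "read_surface: read error on $target (see dd message above)" >&2
        return 1
    fi
}
```

For example, run `read_surface /dev/ad4` as root. With badblocks from sysutils/e2fsprogs, `badblocks -sv /dev/ad4` does a read-only pass by default, and `-n` selects the slower non-destructive read-write mode.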
Re: low-level format before install?
John Almberg wrote:
> Thanks for all the tips. At least I have something to start with.
>
> The guys in the data center reinstalled FreeBSD (the filesystem was
> totally corrupted again), and then ran what they called "SMART test",
> which might be smartctl, and said the hard drives look good.
>
> I am now able to get back in.
>
> So the system ran fine until I put a load on it with the database
> (many transactions a second). This corrupted the file system again.
>
> So I guess I need to load it enough to produce error messages
> (hopefully) but not enough to destroy the file system again.
>
> Sounds like fun :-(
>
> This is an Intel server, not a crummy white box, so hopefully it is
> smart enough to monitor its own hardware at least a bit. We'll see.

Just a tidbit or two. If it has an ICHxR-type south bridge with what Intel
calls "Matrix RAID", there have been reports of problems when trying to
use the RAID functionality. If you are not using the RAID, make sure the
data center guys turn it off in the BIOS.

Whenever I see these kinds of reports of data corruption alongside SMART
saying the drives are "good", I think disk controller. It does seem
strange if the problem was not present before the "power fluctuations",
but where hardware damage occurs can be funky. With the box I once had
that took a direct lightning strike, it was interesting to see where the
lightning bounced around inside.

If this is a 1U pizza box with only one power supply, I would suspect the
power supply of being damaged by the power problem. If it is a relatively
low-wattage unit, the damage it sustained has left it without enough
headroom to provide clean, regulated DC under full load. I remember a
software company I worked for a few years back that stuck one of the old
WORM drives in an HP Vectra desktop that only had a 135 watt power
supply. You could see the power go all wonky on an oscilloscope as soon
as that WORM drive started up, but the box worked well up until that
point.

At any rate, this all sounds like hardware to me. If it wasn't doing any
of this before the so-called "power event", then I believe there has been
hardware damage. Unless you are co-locating your own hardware, it is the
responsibility of the data center to provide you with functional
hardware. After the first go-around, with the same problem resurfacing,
they should have yanked the box and just replaced it: put a good one in
service and troubleshoot the bad one offline. If they can't hold up their
end of the deal, you need to be looking somewhere else.

-Mike
Re: low-level format before install?
Thanks for all the tips. At least I have something to start with.

The guys in the data center reinstalled FreeBSD (the filesystem was
totally corrupted again), and then ran what they called "SMART test",
which might be smartctl, and said the hard drives look good.

I am now able to get back in.

So the system ran fine until I put a load on it with the database (many
transactions a second). This corrupted the file system again.

So I guess I need to load it enough to produce error messages
(hopefully) but not enough to destroy the file system again.

Sounds like fun :-(

This is an Intel server, not a crummy white box, so hopefully it is
smart enough to monitor its own hardware at least a bit. We'll see.

-- 
John
Re: low-level format before install?
On Tue, Apr 07, 2009 at 03:44:20PM -0400, John Almberg wrote:
> Apparently, power was fluctuating drastically before they decided to
> cut power, so a hardware problem is a definite possibility. A PSU
> failure would not surprise me in the circumstances.
>
> Assuming I can ever ssh in again, what log would hardware failures be
> reported to?

Often hardware problems can lock up or reboot the machine without any
warning in the logs. :-( It is next to impossible for PC-class hardware
to catch hardware failures.

But sysutils/healthd or sysutils/mbmon might help, in that they monitor
vital motherboard parameters, which can then be logged.

Some systems log thermal events through the ACPI system or via the
coretemp driver, in which case devd(8) should get them. See devd.conf(5)
in a recent 7-STABLE; this manpage was recently enhanced by yours truly.

Big programs like compilers randomly dying with a signal 11 (SIGSEGV,
segmentation violation) can be a sign of memory problems.

If someone has access to the machine, have them make sure there are no
loose connectors and that any expansion cards are properly seated.

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
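Roland's signal 11 hint can be checked in the logs. On FreeBSD the kernel logs each such death as a line of the form "pid N (name), uid N: exited on signal 11", which normally lands in /var/log/messages. Below is a minimal sketch, assuming that wording and a POSIX sh. The names count_sigsegv and sigsegv_by_program are hypothetical helpers. Many different programs dying this way points more towards memory than one program crashing repeatedly, which is more suggestive of a software bug.

```sh
#!/bin/sh
# Sketch: find processes that died with signal 11 (SIGSEGV) in a syslog file.
# Assumes the FreeBSD kernel wording "pid N (name), uid N: exited on signal 11".
# count_sigsegv and sigsegv_by_program are hypothetical helpers.

SIGSEGV_RE='exited on signal 11( |$)'   # "( |$)" keeps e.g. "signal 110" out

count_sigsegv() {
    log=${1:-/var/log/messages}
    # -c prints the number of matching lines; grep exits 1 when that number
    # is 0, which is not an error here.
    grep -c -E "$SIGSEGV_RE" "$log" || true
}

sigsegv_by_program() {
    log=${1:-/var/log/messages}
    # Pull the program name out of "pid N (name)" and count it, most
    # frequent first.
    grep -E "$SIGSEGV_RE" "$log" |
        sed -n 's/.*pid [0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn
}
```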
Re: low-level format before install?
On Apr 7, 2009, at 12:44 PM, John Almberg wrote:
>> That sounds like either a hardware problem (i.e., CPU overheating or
>> marginal PSU failing under production load), or less likely, some kind
>> of software misconfiguration. System logs would be useful to see
>> whether any signs of trouble are being mentioned.
>
> Apparently, power was fluctuating drastically before they decided to
> cut power, so a hardware problem is a definite possibility. A PSU
> failure would not surprise me in the circumstances.
>
> Assuming I can ever ssh in again, what log would hardware failures be
> reported to?

Start with /var/log/messages and the output of the "dmesg" command.
Doing an "ls -ltr /var/log" and looking at any other logs which have
changed recently would also be advisable...

Regards,
-- 
-Chuck
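Chuck's first look (ls -ltr /var/log, dmesg, /var/log/messages) can be wrapped up roughly as below. The keyword list is an assumption. It is meant to catch disk-driver complaints such as FreeBSD ata(4) "TIMEOUT - WRITE_DMA" or "FAILURE - READ_DMA" lines and GEOM "g_vfs_done() ... error" lines, but it is neither complete nor authoritative, and "error" will also match harmless messages. The names scan_for_trouble and first_look are hypothetical helpers.

```sh
#!/bin/sh
# Sketch: first look at the logs after a crash.
# TROUBLE is an assumed, incomplete keyword list; adjust it to taste.
# scan_for_trouble and first_look are hypothetical helpers.

TROUBLE='FAILURE|TIMEOUT|error|panic|exited on signal'

scan_for_trouble() {
    # Case-insensitive search with line numbers; reads stdin when no file
    # is given. grep exits 1 on "no match", which is fine here.
    grep -n -i -E "$TROUBLE" "$@" || true
}

first_look() {
    echo "== log files, most recently modified last =="
    ls -ltr /var/log | tail -n 10
    echo "== kernel message buffer (dmesg) =="
    dmesg | scan_for_trouble
    echo "== /var/log/messages =="
    scan_for_trouble /var/log/messages
}
```

Run `first_look` as root once ssh works again, then read the full logs around any lines it flags.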
Re: low-level format before install?
On Apr 7, 2009, at 12:15 PM, John Almberg wrote:
> Well, I've got real problems with that database server that lost power
> over the weekend. We reloaded FreeBSD from scratch and then reinstalled
> mysql, and pf. I loaded up my database and switched over all my
> customers' websites. The database server ran fine for about 2 minutes,
> and then died. At the moment, I can't even ssh into the machine,
> although they can get into it using a keyboard/monitor at the data
> center. In other words, sshd is not working.

That sounds like either a hardware problem (i.e., CPU overheating or
marginal PSU failing under production load), or less likely, some kind
of software misconfiguration. System logs would be useful to see whether
any signs of trouble are being mentioned.

> I am now wondering what kind of format the FreeBSD install process does
> by default, and if it is possible to do a low level format, first, to
> block out any bad sectors (not sure if this is the right terminology).
>
> I'm starting to get real depressed about this machine... You would
> think a top-tier data center could keep the power on...

SCSI drives support a standard mechanism called "format unit" to do a
low-level format; ATA and SATA drives do not have a standard mechanism,
but you might be able to find a utility from the manufacturer which can
do such a thing.

Doing so would not be expected to help, though, as any modern drive has
automatic mechanisms to replace bad sectors with spares transparently,
at least until the drive runs out of spare sectors (in which case the
entire drive is likely to be toast soon anyway, and should be replaced
ASAP). However, if you do suspect drive problems, try installing and
running smartctl from /usr/ports/sysutils/smartmontools, and do a
self-test or two.

Regards,
-- 
-Chuck
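Chuck's "self-test or two" with smartmontools looks roughly like this. `smartctl -t short` (or `-t long`) starts a test that runs inside the drive itself. `smartctl -l selftest` shows the log of completed tests, and `smartctl -H` prints the overall health verdict. The device name /dev/ad4 and the helper names are hypothetical placeholders. The "PASSED" wording matched by health_passed is assumed to be what smartctl prints for ATA drives; SCSI drives report health differently.

```sh
#!/bin/sh
# Sketch: SMART self-test with smartmontools (sysutils/smartmontools).
# The device is supplied as $1 (e.g. /dev/ad4, a placeholder).
# start_selftest, show_results and health_passed are hypothetical helpers.

start_selftest() {
    # "short" usually finishes in minutes; "long" reads the whole surface.
    smartctl -t "${2:-short}" "$1"
}

show_results() {
    smartctl -H "$1"            # overall health verdict
    smartctl -l selftest "$1"   # log of completed self-tests
}

# Reads "smartctl -H" output on stdin; succeeds only if an ATA drive
# reported PASSED (assumed wording; SCSI drives print "SMART Health
# Status: OK" instead, so this check is ATA-only).
health_passed() {
    grep -q 'self-assessment test result: PASSED'
}
```

For example, run `start_selftest /dev/ad4 long`, wait for the time smartctl prints, then run `show_results /dev/ad4` or `smartctl -H /dev/ad4 | health_passed || echo "drive reports a problem"`.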
Re: low-level format before install?
On Tue, Apr 07, 2009 at 03:15:59PM -0400, John Almberg wrote:
> Well, I've got real problems with that database server that lost
> power over the weekend. We reloaded FreeBSD from scratch and then
> reinstalled mysql, and pf. I loaded up my database and switched over
> all my customers' websites. The database server ran fine for about 2
> minutes, and then died. At the moment, I can't even ssh into the
> machine, although they can get into it using a keyboard/monitor at
> the data center. In other words, sshd is not working.
>
> I am now wondering what kind of format the FreeBSD install process
> does by default, and if it is possible to do a low level format,
> first, to block out any bad sectors (not sure if this is the right
> terminology).

What you could do is run a shell from the install CD, then fill the disk
with zeros using 'dd if=/dev/zero of=/dev/ bs=2m'.

As I understand it, modern hard disks cannot be low-level formatted by
the user; that is done at the factory. Bad blocks are reallocated by the
built-in controller without user intervention, and the running count
shows up in the smartctl -a output as Reallocated_Sector_Ct. The OS only
starts to see bad blocks once the drive has exhausted its spare sectors,
in which case you'd better replace it, because it is failing.

> I'm starting to get real depressed about this machine... You would
> think a top-tier data center could keep the power on...

Are you sure that the hardware isn't crapping out on you? At least run
smartctl -a on your disks to see if they failed any self-test, and a
monitoring program like mbmon to check on temperatures and voltage
levels.

Roland
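To keep an eye on the Reallocated_Sector_Ct attribute Roland mentions, the raw value can be pulled out of the `smartctl -a` (or `smartctl -A`) attribute table. Below is a minimal sketch, assuming the usual ten-column ATA attribute layout (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value) with the raw value last. The names reallocated_count and report_reallocations are hypothetical helpers. A count that keeps growing between runs is the warning sign.

```sh
#!/bin/sh
# Sketch: extract the raw Reallocated_Sector_Ct value from smartctl output.
# Assumes the ten-column ATA attribute table; helpers are hypothetical.

reallocated_count() {
    # Reads smartctl output on stdin; prints column 10 (raw value) of the
    # Reallocated_Sector_Ct row, and exits 1 if that row is missing.
    awk '$2 == "Reallocated_Sector_Ct" { print $10; found = 1 }
         END { if (!found) exit 1 }'
}

report_reallocations() {
    if ! n=$(reallocated_count); then
        echo "no Reallocated_Sector_Ct attribute found" >&2
        return 2
    fi
    if [ "$n" -eq 0 ]; then
        echo "no reallocated sectors"
    else
        echo "$n sectors reallocated; watch whether this number keeps growing"
    fi
}
```

For example: `smartctl -A /dev/ad4 | report_reallocations`, with the device name as a placeholder.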
Re: low-level format before install?
On Apr 7, 2009, at 3:37 PM, Chuck Swiger wrote:
> On Apr 7, 2009, at 12:15 PM, John Almberg wrote:
>> Well, I've got real problems with that database server that lost
>> power over the weekend. We reloaded FreeBSD from scratch and then
>> reinstalled mysql, and pf. I loaded up my database and switched over
>> all my customers' websites. The database server ran fine for about 2
>> minutes, and then died. At the moment, I can't even ssh into the
>> machine, although they can get into it using a keyboard/monitor at
>> the data center. In other words, sshd is not working.
>
> That sounds like either a hardware problem (i.e., CPU overheating or
> marginal PSU failing under production load), or less likely, some kind
> of software misconfiguration. System logs would be useful to see
> whether any signs of trouble are being mentioned.

Apparently, power was fluctuating drastically before they decided to
cut power, so a hardware problem is a definite possibility. A PSU
failure would not surprise me in the circumstances.

Assuming I can ever ssh in again, what log would hardware failures be
reported to?

-- 
John
low-level format before install?
Well, I've got real problems with that database server that lost power
over the weekend. We reloaded FreeBSD from scratch and then reinstalled
mysql, and pf. I loaded up my database and switched over all my
customers' websites. The database server ran fine for about 2 minutes,
and then died. At the moment, I can't even ssh into the machine,
although they can get into it using a keyboard/monitor at the data
center. In other words, sshd is not working.

I am now wondering what kind of format the FreeBSD install process does
by default, and if it is possible to do a low level format, first, to
block out any bad sectors (not sure if this is the right terminology).

I'm starting to get real depressed about this machine... You would think
a top-tier data center could keep the power on...

-- 
John