Re: Harddisk failure causes system crash, please help
On 09/11/2007, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> Okay, so it's probably that area of the disk which has some problem...

It may be, but I can confirm that FreeBSD is not handling it properly (see below).

> There's a free utility called HDTune which has a sector scanner which explicitly looks for bad sectors (Error Scan). I would *uncheck* the

I got it and it works well. Thank you. The first time I used it, there was a corrupt sector in approximately the area that ad0e occupies (where the reads that crashed the system were). I then ran

# dd if=/dev/urandom of=/dev/ad0e bs=64k

and the error disappeared (from HDTune). Something strange did appear, though: a bad sector near the beginning of the drive. However, I have not seen it since. I have run multiple tests since then and all were green.

> You might also be able to use that utility to get SMART stats for the drive, although smartctl -a /dev/ad0 should suffice too. The disk

Sorry, but I have not used smartctl since, as the dd wiped out large portions of FreeBSD and I have not reinstalled it.

In summary: I have been having problems with FreeBSD when reading from a certain area of the hard disk (a system crash occurs). I have been able to determine that the hard drive is not getting any more corrupt (or the corruption has stabilized...?). More importantly, I have determined that other operating systems (Windows and Linux (openSUSE 10.2 Rescue)) have not been having problems, so I conclude that FreeBSD is in fact not handling the hardware properly (a malfunctioning or buggy driver? or, more likely, a missing quirk that the hardware requires).

When I first installed FreeBSD I did a dd if=/dev/zero of=/dev/ad0 bs=1M. I think this may have contributed to the problem. I read somewhere that optical drives do not handle long continuous runs of 0s or 1s well; this may also apply to hard drives (comments?), and FreeBSD may not be handling it correctly.

Should I submit a PR, or is this better handled on the mailing list (if it should be handled at all)?

Thank you for your help.

David

P.S. After further testing, the crashes seem to occur less consistently. I will continue testing to see if I can determine a consistent pattern.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Harddisk failure causes system crash, please help
Hi,

On 8 Nov 2007, at 20:40, David Naylor wrote:
[possible disk problem]
> I have no idea what is wrong (if the disk has corrupted should the kernel not display error messages?). Can you please help/advise?

A flaky disk drive (rather than a corrupt filesystem on a good disk) will not necessarily talk enough sense for drivers to behave as we'd all like.

If you suspect the disk hardware, your first recourse should be the manufacturer's diagnostic tool. They all have one, usually a bootable floppy (or CD these days); look on the disk manufacturer's website.

SMART is fine if the disk is working, and may even help with an incipient failure that hasn't done any serious damage yet. Otherwise the manufacturer's diagnostic is your best bet.

--
Bob Bishop +44 (0)118 940 1243
[EMAIL PROTECTED] fax +44 (0)118 940 1295
Re: Harddisk failure causes system crash, please help
On Thu, Nov 08, 2007 at 10:40:49PM +0200, David Naylor wrote:
> I have been using this laptop for a few months now with FreeBSD without any problems with the hard disk; however, today as I installed editors/vim the system crashed (without a core dump or any message). Whenever the system boots (and proceeds to do a fsck on ad0e (/usr)) it also crashes without any message. I have tried the following commands:
>
> # dd if=/dev/ad0 of=/dev/null bs=1M    (System crashes)
> # smartctl -C -t short                 (Succeeds)
> # smartctl -C -t long                  (Fails with a message: ad0: FAILED - SMART timed out)

Sounds like something mechanical inside of the disk is failing, or possibly the drive firmware is somewhat buggy when it comes to handling bad blocks.

What brand/model of hard disk is this? atacontrol output would suffice. I'm just curious (personal interest).

> I have no idea what is wrong (if the disk has corrupted should the kernel not display error messages?). Can you please help/advise?

Not necessarily, although I would expect to see a bus timeout of some kind, but it doesn't surprise me that you don't see one. If a long SMART test results in the drive timing out and falling off the bus, there's a much bigger problem at hand.

There is a possibility that the system is simply going bad in some way (RAM issues or a mainboard that's broken somehow), but all your problems seem to indicate issues with the disk.

If I were in your shoes, I would try to get all the data off that disk, purchase a replacement, install FreeBSD on it, and restore your data. I'd then take the old/possibly-bad disk and download one of the drive fitness test utilities from the manufacturer's website. Run that and see if anything comes up / if anything bad happens.

Laptop hard disks are sometimes a pain to deal with (some laptop manufacturers have BIOS tweakery where they refuse to recognise any hard disk other than ones of a specific brand/model. I haven't seen this in recent years, but it's something I've seen in the past), so I wish you luck. Laptops -- such a pain.

--
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                          http://www.parodius.com/ |
| UNIX Systems Administrator                     Mountain View, CA, USA |
| Making life hard for others since 1977.                 PGP: 4BD6C0CB |
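The "get all the data off that disk" step can be sketched with dd's conv=noerror,sync, which carries on past read errors and pads unreadable blocks with zeros so that later data keeps its offset in the image. This is only a rough sketch: the function name rescue_image and the example paths are made up for illustration, and it assumes the read errors are reported to dd rather than taking the whole machine down, as happened in this thread.

```shell
#!/bin/sh
# rescue_image SRC DEST [BLOCKSIZE_BYTES]
# Copy SRC to DEST, continuing past read errors (conv=noerror).
# Short or unreadable blocks are padded with zeros (conv=sync), so the
# data that follows keeps its original offset in the image.
# The function name and its arguments are illustrative assumptions.
rescue_image() {
    src=$1
    dest=$2
    bs=${3:-65536}
    dd if="$src" of="$dest" bs="$bs" conv=noerror,sync
}

# Hypothetical usage on the slice discussed in this thread:
#   rescue_image /dev/ad0e /mnt/backup/ad0e.img
```

A smaller block size loses less data around each bad sector but makes the copy slower. Because of the padding, the image can end up slightly larger than the source when the source size is not a multiple of the block size.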
Re: Harddisk failure causes system crash, please help
> I have been using this laptop for a few months now with FreeBSD without any problems with the hard disk; however, today as I installed editors/vim the system crashed (without a core dump or any message). Whenever the system boots (and proceeds to do a fsck on ad0e (/usr)) it also crashes without any message. I have tried the following commands:
>
> # dd if=/dev/ad0 of=/dev/null bs=1M    (System crashes)
> # smartctl -C -t short                 (Succeeds)
> # smartctl -C -t long                  (Fails with a message: ad0: FAILED - SMART timed out)

Corrupt disk?

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
Harddisk failure causes system crash, please help
Hi,

I have been using this laptop for a few months now with FreeBSD without any problems with the hard disk; however, today as I installed editors/vim the system crashed (without a core dump or any message). Whenever the system boots (and proceeds to do a fsck on ad0e (/usr)) it also crashes without any message. I have tried the following commands:

# dd if=/dev/ad0 of=/dev/null bs=1M    (System crashes)
# smartctl -C -t short                 (Succeeds)
# smartctl -C -t long                  (Fails with a message: ad0: FAILED - SMART timed out)

If I force the mounting of /usr without fsck then the system does boot (but I am not sure how stable it is). I have updated the kernel to cvs RELENG_7 from a few hours ago (and it compiled without a problem after force mounting (and using tmpfs as the obj directory; I did not try any other way)).

I have no idea what is wrong (if the disk has corrupted should the kernel not display error messages?). Can you please help/advise?

Thank you in advance.

David
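When a single dd over the whole disk takes the machine down, one way to narrow down where it happens is to read the device one chunk at a time and record the chunk index before each read. The sketch below assumes the log file lives on a different device (a USB stick, say) so that it survives the crash; the function name scan_chunks and its arguments are invented for this example.

```shell
#!/bin/sh
# scan_chunks DEVICE CHUNK_BYTES CHUNK_COUNT LOGFILE
# Read DEVICE one chunk at a time. Before each read, the chunk index is
# appended to LOGFILE and flushed with sync, so if the machine goes down
# mid-read the last logged index is the chunk that was being read.
# Chunks whose read fails without a crash are printed as "bad <index>".
# LOGFILE is assumed to be on a different device than DEVICE.
scan_chunks() {
    dev=$1
    chunk=$2
    count=$3
    log=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$i" >> "$log"
        sync
        if ! dd if="$dev" of=/dev/null bs="$chunk" skip="$i" count=1 2>/dev/null; then
            echo "bad $i"
        fi
        i=$((i + 1))
    done
}

# Hypothetical usage: scan the first 1024 chunks of 1 MiB each.
#   scan_chunks /dev/ad0 1048576 1024 /mnt/usb/scan.log
```

After a crash, the last number in the log multiplied by the chunk size gives the approximate byte offset of the region that triggered it.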
Re: Harddisk failure causes system crash, please help
On 08/11/2007, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> Sounds like something mechanical inside of the disk is failing, or possibly the drive firmware is somewhat buggy when it comes to handling bad blocks.
>
> What brand/model of hard disk is this? atacontrol output would suffice. I'm just curious (personal interest).

# atacontrol list
ATA channel 0:
    Master: ad0 TOSHIBA MK4025GAS/KA100A ATA/ATAPI revision 6

> Not necessarily, although I would expect to see a bus timeout of some kind, but it doesn't surprise me that you don't see one. If a long

I remember seeing a timeout of sorts once; it was while doing a dd. I have done further dd tests and only one slice causes this problem: ad0e

> SMART test results in the drive timing out and falling off the bus, there's a much bigger problem at hand.
>
> There is a possibility that the system is simply going bad in some way (RAM issues or mainboard that's

I doubt it is RAM since I have been able to compile the kernel in RAM, but it is a good suggestion. I had a RAM problem with another computer, and the failure was much more erratic...

> broken somehow), but all your problems seem to indicate issues with the disk.

Do you know of any test I can run using Windows (BartPE) that could possibly diagnose the problem (or at least confirm that it is just a hardware error and not FreeBSD's fault for rebooting)?

By the way, the laptop is an Acer TravelMate 2700 (with very buggy USB controllers).

Thank you for your help.

David
Re: Harddisk failure causes system crash, please help
On Fri, Nov 09, 2007 at 08:29:52AM +0200, David Naylor wrote:
> I remember seeing a timeout of sorts once, it was while doing a dd. I have done further dd tests and only the one slice causes this problem: ad0e

Okay, so it's probably that area of the disk which has some problem...

> > broken somehow), but all your problems seem to indicate issues with the disk.
>
> Do you know of any test I can run using Windows (BartPE) that could possibly diagnose the problem (or at least confirm it is not FreeBSD's fault for rebooting and just hardware error)?

There's a free utility called HDTune which has a sector scanner which explicitly looks for bad sectors (Error Scan). I would *uncheck* the Quick Scan box. If nothing shows up there, I'd check your Event Log to see if there are any reports of disk/controller issues.

You might also be able to use that utility to get SMART stats for the drive, although smartctl -a /dev/ad0 should suffice too. The disk itself may have been relocating data onto working sectors all this time; usually SMART will show that (but not always -- depends on how the disk manufacturer did their firmware).

But keep in mind Windows is one of the most silent OSes I've ever seen when it comes to disk errors. A disk can be failing miserably and it'll never bother to report ATA timeouts or anything else in the event log. The easiest ones to detect are mechanical failures, since all disk I/O will stop ("why is my machine hanging?!?"), and if you're lucky, you'll hear the drive making scary noises.

--
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                          http://www.parodius.com/ |
| UNIX Systems Administrator                     Mountain View, CA, USA |
| Making life hard for others since 1977.                 PGP: 4BD6C0CB |
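To check whether the disk has been relocating sectors, the relevant rows can be picked out of the SMART attribute table. The sketch below is an assumption-laden helper, not part of smartmontools: it expects the usual ATA attribute table as printed by smartctl -A, with the attribute ID in field 1, the name in field 2 and the raw value in field 10. The function name smart_realloc is made up, and as noted above, not every firmware reports these attributes.

```shell
#!/bin/sh
# smart_realloc: read a SMART attribute table on stdin and print
# "<name> <raw value>" for the attributes that usually track remapped or
# pending sectors:
#   5   Reallocated_Sector_Ct
#   196 Reallocated_Event_Count
#   197 Current_Pending_Sector
#   198 Offline_Uncorrectable
# Assumes field 1 is the attribute ID, field 2 the name and field 10 the
# raw value; drives that do not report an attribute simply print no row.
smart_realloc() {
    awk '$1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 { print $2, $10 }'
}

# Hypothetical usage:
#   smartctl -A /dev/ad0 | smart_realloc
```

Non-zero raw values that keep growing between runs suggest the drive is still finding new bad sectors, while stable values fit the "corruption has stabilized" theory better.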