Re: Harddisk failure causes system crash, please help

2007-11-10 Thread David Naylor
On 09/11/2007, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> Okay, so it's probably that area of the disk which has some problem...
It may be, but I can confirm that FreeBSD is not handling it properly (see below).

> There's a free utility called HDTune which has a sector scanner which
> explicitly looks for bad sectors (Error Scan).  I would *uncheck* the
I got it and it works well.  Thank you.  The first time I ran it, it
reported a corrupt sector in approximately the area that ad0e occupies
(the area where reads were crashing the system).  I then ran dd
if=/dev/urandom of=/dev/ad0e bs=64k and HDTune no longer reported the
error.  One strange thing did show up: a bad sector near the
beginning of the drive, but I have not seen it since.  I have run
multiple scans since then and all came back green.

> You might also be able to use that utility to get SMART stats for the
> drive, although smartctl -a /dev/ad0 should suffice too.  The disk
Sorry, but I have not been able to run smartctl since then: the dd wiped
out large portions of the FreeBSD install and I have not reinstalled it.

In summary: I have been having problems with FreeBSD when reading from
a certain area of the hard disk (a system crash occurs).  I have been
able to determine that the hard drive is not getting any more corrupt
(or the corruption has stabilized...?).  More importantly, I have
determined that other operating systems (Windows and Linux, openSUSE
10.2 Rescue) have not been having problems, so I conclude that FreeBSD
is in fact not handling the hardware properly (a malfunctioning or
buggy driver? or, more likely, a missing quirk for this drive).

When I first installed FreeBSD I did a dd if=/dev/zero of=/dev/ad0
bs=1M.  I think this may have contributed to the problem.  I read
somewhere that optical drives do not handle long continuous runs of
0s or 1s well; this may also apply to hard drives (comments?), and
FreeBSD may not be handling it correctly.

Should I submit a PR, or is this better handled on the mailing list
(if it should be handled at all)?

Thank you for your help.

David

P.S. After further testing the occurrence of the crashes seems to be
less consistent.  I will continue some testing to see if I can
determine a consistent pattern.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Harddisk failure causes system crash, please help

2007-11-09 Thread Bob Bishop

Hi,

On 8 Nov 2007, at 20:40, David Naylor wrote:

[possible disk problem]

> I have no idea what is wrong (if the disk has corrupted should the kernel
> not display error messages?).  Can you please help/advise?

A flaky disk drive (rather than a corrupt filesystem on a good disk)
will not necessarily talk enough sense for drivers to behave as we'd
all like.

If you suspect the disk hardware, your first recourse should be to
the manufacturer's diagnostic tool - they all have one, usually a
bootable floppy (or CD these days), look on the disk manufacturer's
website.

SMART is fine if the disk is working and may even help with an
incipient failure that hasn't done any serious damage yet. Otherwise
the manufacturer's diagnostic is your best bet.


--
Bob Bishop  +44 (0)118 940 1243
[EMAIL PROTECTED] fax +44 (0)118 940 1295






Re: Harddisk failure causes system crash, please help

2007-11-08 Thread Jeremy Chadwick
On Thu, Nov 08, 2007 at 10:40:49PM +0200, David Naylor wrote:
> I have been using this laptop for a few months now with FreeBSD without any
> problems with the hard disk; however, today as I installed editors/vim the
> system crashed (without a core dump or any message).
>
> Whenever the system boots (and proceeds to do a fsck on ad0e (/usr)) it
> also crashes without any message.  I have tried the following commands:
>
> # dd if=/dev/ad0 of=/dev/null bs=1M ( System crashes )
>
> # smartctl -C -t short ( Succeeds )
> # smartctl -C -t long ( Fails with a message: ad0: FAILED - SMART timed out )

Sounds like something mechanical inside of the disk is failing, or
possibly the drive firmware is somewhat buggy when it comes to handling
bad blocks.  What brand/model of hard disk is this?  atacontrol output
would suffice.  I'm just curious (personal interest).

> I have no idea what is wrong (if the disk has corrupted should the kernel
> not display error messages?).  Can you please help/advise?

Not necessarily, although I would expect to see a bus timeout of some
kind, but it doesn't surprise me that you don't see one.  If a long
SMART test results in the drive timing out and falling off the bus,
there's a much bigger problem at hand.  There is a possibility that the
system is simply going bad in some way (RAM issues or mainboard that's
broken somehow), but all your problems seem to indicate issues with the
disk.

If I were in your shoes, I would try to get all the data off that disk,
purchase a replacement, install FreeBSD on it, and restore your data.

I'd then take the old/possibly-bad disk and download one of the drive
fitness test utilities from the manufacturer's website.  Run that and
see if anything comes up / if anything bad happens.

Laptop hard disks are sometimes a pain to deal with, so I wish you
luck.  Some laptop manufacturers have BIOS tweakery where they refuse to
recognise any hard disk other than ones of a specific brand/model.  I
haven't seen this in recent years, but it's something I've seen in the
past.  Laptops -- such a pain.

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |



Re: Harddisk failure causes system crash, please help

2007-11-08 Thread Claus Guttesen
> I have been using this laptop for a few months now with FreeBSD without any
> problems with the hard disk; however, today as I installed editors/vim the
> system crashed (without a core dump or any message).
>
> Whenever the system boots (and proceeds to do a fsck on ad0e (/usr)) it
> also crashes without any message.  I have tried the following commands:
>
> # dd if=/dev/ad0 of=/dev/null bs=1M ( System crashes )
>
> # smartctl -C -t short ( Succeeds )
> # smartctl -C -t long ( Fails with a message: ad0: FAILED - SMART timed out )


Corrupt disk?

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare


Re: Harddisk failure causes system crash, please help

2007-11-08 Thread David Naylor
On 08/11/2007, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> Sounds like something mechanical inside of the disk is failing, or
> possibly the drive firmware is somewhat buggy when it comes to handling
> bad blocks.  What brand/model of hard disk is this?  atacontrol output
> would suffice.  I'm just curious (personal interest).

# atacontrol list
ATA channel 0:
    Master:  ad0 <TOSHIBA MK4025GAS/KA100A> ATA/ATAPI revision 6

> Not necessarily, although I would expect to see a bus timeout of some
> kind, but it doesn't surprise me that you don't see one.  If a long

I remember seeing a timeout of sorts once, it was while doing a dd.  I
have done further dd tests and only the one slice causes this problem:
ad0e
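
The per-slice dd tests described above can be generalised into a block-by-block read scan that records where reads fail. What follows is only a minimal sketch in Python, not something anyone in this thread ran: the function name find_unreadable_blocks and its parameters are made up for illustration, and it assumes the kernel returns an I/O error for a bad block. On the machine discussed here the system crashed outright instead, so a scan like this would only be useful from an OS that survives the read.

```python
import os


def find_unreadable_blocks(path, block_size=64 * 1024, max_errors=100):
    """Read `path` block by block and return the byte offsets of blocks
    whose read raised an I/O error.

    Hypothetical helper for illustration only. Scanning stops at end of
    file/device, or after `max_errors` failures so that a drive which has
    dropped off the bus does not keep the loop running forever.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while len(bad_offsets) < max_errors:
            try:
                chunk = os.pread(fd, block_size, offset)
            except OSError:
                # The read failed: remember where, then skip this block.
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not chunk:
                break  # reached the end of the file or device
            offset += len(chunk)
    finally:
        os.close(fd)
    return bad_offsets
```

Called as find_unreadable_blocks("/dev/ad0e") (as root), an empty list means every block was readable; any offsets returned narrow down which part of the slice is at fault.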

> SMART test results in the drive timing out and falling off the bus,
> there's a much bigger problem at hand.  There is a possibility that the
> system is simply going bad in some way (RAM issues or mainboard that's

I doubt it is RAM since I have been able to compile the kernel in RAM,
but a good suggestion.  Had a RAM problem with another computer,
failure was much more erratic...

> broken somehow), but all your problems seem to indicate issues with the
> disk.

Do you know of any test I can run using Windows (BartPE) that could
possibly diagnose the problem (or at least confirm it is not FreeBSD's
fault for rebooting and just hardware error)?

By the way, the laptop is an Acer TravelMate 2700 (with very buggy USB
controllers).

Thank you for your help.

David


Re: Harddisk failure causes system crash, please help

2007-11-08 Thread Jeremy Chadwick
On Fri, Nov 09, 2007 at 08:29:52AM +0200, David Naylor wrote:
> I remember seeing a timeout of sorts once, it was while doing a dd.  I
> have done further dd tests and only the one slice causes this problem:
> ad0e

Okay, so it's probably that area of the disk which has some problem...

> > broken somehow), but all your problems seem to indicate issues with the
> > disk.
>
> Do you know of any test I can run using Windows (BartPE) that could
> possibly diagnose the problem (or at least confirm it is not FreeBSD's
> fault for rebooting and just hardware error)?

There's a free utility called HDTune which has a sector scanner which
explicitly looks for bad sectors (Error Scan).  I would *uncheck* the
Quick Scan box.  If nothing shows up there, I'd check your Event Log to
see if there are any reports of disk/controller issues.

You might also be able to use that utility to get SMART stats for the
drive, although smartctl -a /dev/ad0 should suffice too.  The disk
itself may have been relocating data onto working sectors all this time;
usually SMART will show that (but not always -- depends on how the disk
manufacturer did their firmware).
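
To check whether the disk has been remapping sectors, the attribute table printed by smartctl -A /dev/ad0 can be filtered for the relevant counters. The Python below is a rough sketch under stated assumptions, not part of smartmontools: the function names are made up, it assumes the usual ten-column attribute table layout, and it assumes the drive uses the common attribute names listed in WATCHED (vendors differ, and some firmware does not report these at all).

```python
# Attribute names commonly used for remapped or unreadable sectors.
# Assumption: the drive's firmware reports them under these names.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_attributes(text):
    """Map attribute name -> first token of RAW_VALUE from `smartctl -A` text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9]
    return attrs


def remapping_counters(text):
    """Return the watched counters that are present and purely numeric."""
    attrs = parse_smart_attributes(text)
    return {
        name: int(attrs[name])
        for name in WATCHED
        if name in attrs and attrs[name].isdigit()
    }
```

Feeding the saved output of smartctl -A to remapping_counters() gives a small dictionary; non-zero values there would suggest the drive has been relocating or failing to read sectors.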

But keep in mind Windows is one of the most silent OSes I've ever seen
when it comes to disk errors.  A disk can be failing miserably and it'll
never bother to report ATA timeouts or anything else in the event log.
The easiest ones to detect are mechanical failures, since all disk I/O
will stop (why is my machine hanging?!?), and if you're lucky,
you'll hear the drive making scary noises.

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |
