RAID failure with READ_DMA status=51 - how to avoid again?

2007-02-28 Thread Oliver Iberien
I would like to RAID my system but am wondering if I am asking for trouble, 
given that I got some kind of read failure error followed by file system 
corruption the first time I did it. Would it be reasonable for me to try 
RAIDing again, and if so, under what conditions? Details are as follows:

I moved my home FreeBSD 6.0 system, which had previously been on a single IDE 
drive, onto two SATA drives (set to 3.0G) in a RAID-1 array, with hardware
RAID (Nvidia) on the motherboard (ASUS A8N-E). I used dump as instructed in
the FreeBSD FAQ. This went okay. 

I then installed a third, large (400GB) SATA drive and backed up the system on 
the RAID (minus /proc, /tmp, and so on) to it using rdiff-backup. This seemed 
to go OK.

Then, when I shut down immediately afterwards, I saw this:
Feb 27 08:43:19 bsd kernel: ad8: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=146193935
Feb 27 08:43:19 bsd kernel: ar0: WARNING - mirror protection lost. RAID1 array in DEGRADED mode
Feb 27 08:43:19 bsd kernel: ar0: writing of nVidia MediaShield metadata is NOT supported yet
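The status and error fields in the first log line are hexadecimal bytes from the drive's ATA status and error registers, which the kernel expands into bit names. The minimal sketch below uses hypothetical `decode_bits`, `decode_status` and `decode_error` helpers to show how 0x51 splits into READY (0x40), DSC (0x10) and ERROR (0x01), and how error 0x40 is the uncorrectable-data bit, meaning the drive itself could not read that sector. The names for the bits that appear in this log match the kernel output. The labels for the other bits are the usual ATA register names, and it is an assumption that FreeBSD would print them the same way.

```sh
# Hypothetical helpers: expand an ATA status or error byte into bit names.
# Names for status 0x40/0x10/0x01 and error 0x40 match the kernel log above;
# the remaining labels are generic ATA names and may differ from FreeBSD's.

# decode_bits VALUE NAME7 NAME6 ... NAME0  (names listed from bit 7 down to bit 0)
decode_bits() {
    value=$(( $1 ))          # accepts hex such as 0x51
    shift
    bit=7
    out=""
    for name in "$@"; do
        # Append the name if this bit is set, comma-separated like the kernel
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}$name"
        fi
        bit=$(( bit - 1 ))
    done
    printf '%s\n' "$out"
}

decode_status() {
    decode_bits "$1" BUSY READY DF DSC DRQ CORR IDX ERROR
}

decode_error() {
    decode_bits "$1" ICRC UNCORRECTABLE MC IDNF MCR ABRT NM AMNF
}
```

For example, `decode_status 0x51` prints `READY,DSC,ERROR` and `decode_error 0x40` prints `UNCORRECTABLE`.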

I rebooted and the BIOS reported that the RAID was healthy, but FreeBSD said
the file system was not healthy, and I had to run fsck about five times for it
to come up clean. The system booted to the desktop, crashed after about ten
seconds, rebooted, and turned up with a dirty filesystem again.

I have since dismantled RAID, removed one of the SATA drives, fsck'ed 
repeatedly, and then reinstalled KDE, figuring that, as it only crashed once
it had finished loading the desktop, something might be amiss there. The
system is running again.

All the drives are brand new, as is the cabling. The drives show up in 
messages as SATA150 (is 3.0G not supported in FreeBSD?), although the board 
supports 3.0G transfer rates. There is an errata sheet in the motherboard 
manual with a matrix indicating on which drive, given multiple SATA drives, 
the OS should be installed. It's silent on why this is advised and on the 
subject of the proper order if RAID is involved. An extended offline SMART
test of the current drive with smartctl completed without error, and the
overall-health self-assessment test result is PASSED. Thanks in advance for
any advice.

Oliver


___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


RE: RAID failure with READ_DMA status=51 - how to avoid again?

2007-02-28 Thread Wood, Russell

 -----Original Message-----
 From: [EMAIL PROTECTED] [mailto:owner-freebsd-
 [EMAIL PROTECTED] On Behalf Of Oliver Iberien
 Sent: Thursday, 1 March 2007 10:02 AM
 To: freebsd-questions@freebsd.org
 Subject: RAID failure with READ_DMA status=51 - how to avoid again?
 
I would suggest downloading FreeSBIE, booting from it and running a dd
on your drives to see if it picks up any bad sectors:

dd if=/dev/adN of=/dev/null bs=1m conv=noerror
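The same check can be wrapped in a few shell functions. This is a minimal sketch: the function names `lba_to_bytes`, `scan_device` and `read_lba` are hypothetical, it assumes 512-byte sectors, and it assumes the suspect disk is still ad8 as in the log. Re-reading only the sector from the log is a quick first test before committing to a full pass over the disk.

```sh
# Hypothetical helpers around dd; assumes 512-byte sectors and FreeBSD
# device names such as /dev/ad8. Run from a live CD (e.g. FreeSBIE) so
# the disk being tested is not mounted.

SECTOR=512

# Byte offset of a given LBA, e.g. the one reported in the kernel log.
lba_to_bytes() {
    echo $(( $1 * SECTOR ))
}

# Read every sector of a device; conv=noerror makes dd carry on past read
# errors, which the kernel logs with their LBA as it goes.
# bs=1048576 equals FreeBSD's bs=1m and is also accepted by GNU dd.
scan_device() {
    dd if="$1" of=/dev/null bs=1048576 conv=noerror
}

# Re-read a single sector; a non-zero exit status means the read failed.
read_lba() {
    dd if="$1" of=/dev/null bs="$SECTOR" skip="$2" count=1 2>/dev/null
}
```

For example, `lba_to_bytes 146193935` prints 74851294720, roughly 69.7 GiB into the disk, and `read_lba /dev/ad8 146193935 || echo unreadable` retries just that sector. A sector that still fails on a direct read suggests a media problem on that drive.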

Regards,
Russell Wood

