Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-27 Thread Matthew Lear
On Sun, 2010-06-27 at 09:36 +0100, Matthew Seaman wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > On 27/06/2010 24:04:48, Matthew Lear wrote: > > Incidentally, is there a way to easily migrate from an atacontrol-created > > array to a gmirror-created array? I'm running FreeBSD 8.0 on a

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-27 Thread Matthew Seaman
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 27/06/2010 24:04:48, Matthew Lear wrote: > Incidentally, is there a way to easily migrate from an atacontrol-created > array to a gmirror-created array? I'm running FreeBSD 8.0 on another > machine with a gmirror-created RAID1 array with no problem w

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-26 Thread Matthew Lear
On Sat, 2010-06-26 at 10:12 -0700, Jeremy Chadwick wrote: > On Sat, Jun 26, 2010 at 04:57:48PM +0100, Matthew Lear wrote: > > On Fri, 2010-06-25 at 00:16 -0700, Jeremy Chadwick wrote: > > > > > > All in all, replacing a drive is a completely reasonable action when > > > there's evidence confirming

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-26 Thread Jeremy Chadwick
On Sat, Jun 26, 2010 at 04:57:48PM +0100, Matthew Lear wrote: > On Fri, 2010-06-25 at 00:16 -0700, Jeremy Chadwick wrote: > > > > All in all, replacing a drive is a completely reasonable action when > > there's evidence confirming the need for its replacement. I don't like > > replacing hardware

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-26 Thread Matthew Lear
On Fri, 2010-06-25 at 00:16 -0700, Jeremy Chadwick wrote: > > All in all, replacing a drive is a completely reasonable action when > there's evidence confirming the need for its replacement. I don't like > replacing hardware when there's no indication replacing it will > necessarily fix the probl

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-25 Thread Jeremy Chadwick
On Thu, Jun 24, 2010 at 05:22:41PM -0500, Adam Vande More wrote: > Haven't followed the entire thread, but wanted to point out something > important to remember. SMART is not a reliable indicator of failure. > It's certainly better than listening to it but it picks up less than > 1/2 of drive failu

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-24 Thread Adam Vande More
Haven't followed the entire thread, but wanted to point out something important to remember. SMART is not a reliable indicator of failure. It's certainly better than listening to it but it picks up less than 1/2 of drive failures. Google released a study of their disks in data centers a few years a

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-24 Thread Matthew Lear
On Thu, 2010-06-24 at 11:15 -0700, Jeremy Chadwick wrote: > On Thu, Jun 24, 2010 at 06:52:14PM +0100, Matthew Lear wrote: > > On Tue, 2010-06-22 at 20:04 +0100, Bob Bishop wrote: > > > Hi, > > > > > > On 22 Jun 2010, at 08:45, Jeremy Chadwick wrote: > > > > > > > On Mon, Jun 21, 2010 at 10:33:12P

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-24 Thread Jeremy Chadwick
On Thu, Jun 24, 2010 at 06:52:14PM +0100, Matthew Lear wrote: > On Tue, 2010-06-22 at 20:04 +0100, Bob Bishop wrote: > > Hi, > > > > On 22 Jun 2010, at 08:45, Jeremy Chadwick wrote: > > > > > On Mon, Jun 21, 2010 at 10:33:12PM +0100, Matthew Lear wrote: > > >> [tale of woe elided] > > > > > > I

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-24 Thread Matthew Lear
On Tue, 2010-06-22 at 20:04 +0100, Bob Bishop wrote: > Hi, > > On 22 Jun 2010, at 08:45, Jeremy Chadwick wrote: > > > On Mon, Jun 21, 2010 at 10:33:12PM +0100, Matthew Lear wrote: > >> [tale of woe elided] > > > > I don't really have any other thoughts on the matter, sadly. > > [helpful suggesti

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-22 Thread Bob Bishop
Hi, On 22 Jun 2010, at 08:45, Jeremy Chadwick wrote: > On Mon, Jun 21, 2010 at 10:33:12PM +0100, Matthew Lear wrote: >> [tale of woe elided] > > I don't really have any other thoughts on the matter, sadly. > [helpful suggestions elided] > > Anyone else have ideas/recommendations? The disks sur

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-22 Thread Jeremy Chadwick
On Mon, Jun 21, 2010 at 10:33:12PM +0100, Matthew Lear wrote: > Hello Jeremy. I just wondered if you had any further thoughts on the > info below. Two new disks arrived over the weekend and I'm still unsure > if I'm best to replace ad0 or not... > Much appreciated indeed. > -- Matt > > On Fri, 20

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-21 Thread Matthew Lear
Hello Jeremy. I just wondered if you had any further thoughts on the info below. Two new disks arrived over the weekend and I'm still unsure if I'm best to replace ad0 or not... Much appreciated indeed. -- Matt On Fri, 2010-06-18 at 20:28 +0100, Matthew Lear wrote: > On Fri, 2010-06-18 at 10:42 -

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-19 Thread Andriy Gapon
on 18/06/2010 20:42 Jeremy Chadwick said the following: > http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting > > I've always read IDNF to mean "OS requested access (read or write) to an > LBA which is out of bounds", where "out of bounds" means "not between 0 > and ". How exact

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-18 Thread Matthew Lear
On Fri, 2010-06-18 at 10:42 -0700, Jeremy Chadwick wrote: > On Fri, Jun 18, 2010 at 04:47:11PM +0100, Matthew Lear wrote: > > Hello Jeremy, > > Thanks very much for the feedback. > > > > [snip] > > > Could you please provide the full output from "smartctl -a /dev/ad0" > > > here? Your drive may b

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-18 Thread Jeremy Chadwick
On Fri, Jun 18, 2010 at 04:47:11PM +0100, Matthew Lear wrote: > Hello Jeremy, > Thanks very much for the feedback. > > [snip] > > Could you please provide the full output from "smartctl -a /dev/ad0" > > here? Your drive may be completely fine and you may not have to swap it > > at all; hard to sa

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-18 Thread Matthew Lear
Hello Jeremy, Thanks very much for the feedback. [snip] > Could you please provide the full output from "smartctl -a /dev/ad0" > here? Your drive may be completely fine and you may not have to swap it > at all; hard to say. Sure. See below: smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 7.2-RELEASE-

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-18 Thread Alexander Motin
Jeremy Chadwick wrote: > On Fri, Jun 18, 2010 at 01:36:53PM +0200, Miroslav Lachman wrote: >> Jeremy Chadwick wrote: >>> On Fri, Jun 18, 2010 at 08:08:24AM +0100, Matthew Lear wrote: >> [...] >> The drives in the RAID exist on two separate ATA channels: [r...@meshuga /home/matt]# atacontr

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-18 Thread Jeremy Chadwick
On Fri, Jun 18, 2010 at 01:36:53PM +0200, Miroslav Lachman wrote: > Jeremy Chadwick wrote: > >On Fri, Jun 18, 2010 at 08:08:24AM +0100, Matthew Lear wrote: > > [...] > > >>The drives in the RAID exist on two separate ATA channels: > >>[r...@meshuga /home/matt]# atacontrol list > >>ATA channel 0:

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-18 Thread Miroslav Lachman
Jeremy Chadwick wrote: On Fri, Jun 18, 2010 at 08:08:24AM +0100, Matthew Lear wrote: [...] The drives in the RAID exist on two separate ATA channels: [r...@meshuga /home/matt]# atacontrol list ATA channel 0: Master: ad0 SATA revision 2.x Slave: ad1 SATA revision 1.x ATA channel

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-18 Thread Jeremy Chadwick
On Fri, Jun 18, 2010 at 08:08:24AM +0100, Matthew Lear wrote: > Hi there, > > I'm running 7.2-RELEASE-p4 on an i386 HP server (ML G5) in RAID1 > configuration. Very recently, I've seen IO errors such as: > > ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=20472527 > > reported and the RAID m

Re: 7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-18 Thread Pieter de Boer
Hi Matthew, I'm running 7.2-RELEASE-p4 on an i386 HP server (ML G5) in RAID1 configuration. Very recently, I've seen IO errors such as: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=20472527 reported and the RAID mirror is now offline. ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left)

7.2-RELEASE-p4, IO errors & RAID1 failure

2010-06-18 Thread Matthew Lear
Hi there, I'm running 7.2-RELEASE-p4 on an i386 HP server (ML G5) in RAID1 configuration. Very recently, I've seen IO errors such as: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=20472527 reported and the RAID mirror is now offline. ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=