Re: ZFS zpool replace problems

2010-01-26 Thread Jeremy Chadwick
I'm removing the In-Reply-To mail headers for this thread, as you've now
hijacked it for a different purpose.  Please don't do this; start a new
thread altogether.  :-)

On Tue, Jan 26, 2010 at 02:57:20PM +0100, Gerrit Kühn wrote:
 I am still busy replacing RE2-disks with updated drives. I came across a
 very strange thing with zfs. Actually I had the following pool layout:
 
 mclane# zpool status
   pool: tank
  state: ONLINE
  scrub: none requested
 config:
 
 NAME        STATE     READ WRITE CKSUM
 tank        ONLINE       0     0     0
   raidz1    ONLINE       0     0     0
     ad8     ONLINE       0     0     0
     ad10    ONLINE       0     0     0
     ad12    ONLINE       0     0     0
 spares
   ad14      AVAIL
 
 errors: No known data errors
 
 All disks still have the firmware bug, so I want to replace them with
 disks that I already fixed. I put in an updated drive as ad18 and
 wanted to replace ad12 to get the drive with the broken firmware out:
 
 mclane# zpool replace tank /dev/ad12 /dev/ad18 
 mclane# zpool status
   pool: tank
  state: ONLINE
 status: One or more devices is currently being resilvered.  The pool will
 continue to function, possibly in a degraded state.
 action: Wait for the resilver to complete.
  scrub: resilver in progress for 0h0m, 0.01% done, 52h51m to go
 config:
 
 NAME           STATE     READ WRITE CKSUM
 tank           ONLINE       0     0     0
   raidz1       ONLINE       0     0     0
     ad8        ONLINE       0     0     0  7.21M resilvered
     ad10       ONLINE       0     0     0  7.22M resilvered
     replacing  ONLINE       0     0     0
       ad12     ONLINE       0     0     0
       ad18     ONLINE       0     0     0  10.7M resilvered
 spares
   ad14         AVAIL
 
 errors: No known data errors
 
 However, something must have gone wrong during the resilvering process and
 it now looks like this:
 
 mclane# zpool status
   pool: tank
  state: DEGRADED
 status: One or more devices has experienced an unrecoverable error.  An
         attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
         using 'zpool clear' or replace the device with 'zpool replace'.
    see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: resilver completed after 2h39m with 0 errors on Tue Jan 26 14:00:00 2010
 config:

 NAME           STATE     READ WRITE CKSUM
 tank           DEGRADED     0     0     0
   raidz1       DEGRADED     0     0     0
     ad8        ONLINE       0     0     0  975M resilvered
     ad10       ONLINE       0     0   142  974M resilvered
     replacing  DEGRADED     0 7.25M     0
       ad12     ONLINE       0     0     0
       ad18     REMOVED      0     1     0  79.4M resilvered
 spares
   ad14         AVAIL
 
 errors: No known data errors
 
 
 What is going on here? ad18 obviously detached during the
 process. /var/log/messages just gives me
 
 Jan 26 11:23:33 mclane kernel: ad18: FAILURE - device detached
 
 Additionally ad10 obviously produced chksum errors. What do I do about the
 degraded replacing process? Can I terminate it somehow and maybe replace
 ad10 first? Any other hints?
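
A status listing like the one above can be screened for trouble with a
small filter. This is only a sketch: flag_vdevs is a hypothetical helper
name, and it assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout
shown in the quoted zpool status output. It prints every vdev that is not
ONLINE or that has a non-zero error counter.

```shell
#!/bin/sh
# flag_vdevs (hypothetical name): reads `zpool status` output on stdin and
# prints vdev lines whose state is not ONLINE or whose READ/WRITE/CKSUM
# counters are not all "0". Counters are compared as strings because
# zpool abbreviates large values (for example 7.25M).
flag_vdevs() {
    awk '
        NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)$/ {
            if ($2 != "ONLINE" || $3 != "0" || $4 != "0" || $5 != "0")
                printf "%s %s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
        }
    '
}

# Usage (on the affected machine):
#   zpool status tank | flag_vdevs
```

Spares are skipped on purpose, since their AVAIL lines carry no counters.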

I'm not sure how the above is supposed to work (I haven't personally
tried it), but:

1) Why didn't you offline the ad10 disk first?
   zpool offline tank ad10

2) How did you attach ad18?  Did you tell the system about it using
   atacontrol?  If so, what commands did you use?

3) Can you please provide uname -a output, as well as relevant dmesg
   output to show what kind of SATA controller you have, what's
   attached to what, etc.?

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS zpool replace problems

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 06:30:21 -0800 Jeremy Chadwick
free...@jdc.parodius.com wrote about Re: ZFS zpool replace problems:

JC I'm removing the In-Reply-To mail headers for this thread, as you've
JC now hijacked it for a different purpose.  Please don't do this; start
JC a new thread altogether.  :-)

Thanks. You're perfectly right, I should have done that.

JC I'm not sure how the above is supposed to work (I haven't personally
JC tried it), but:
JC 
JC 1) Why didn't you offline the ad10 disk first?
JC    zpool offline tank ad10

Well, probably because I thought that zfs would simply handle the
situation. I just wanted to replace drive A with drive B, so this seemed
quite straightforward to me.

JC 2) How did you attach ad18?  Did you tell the system about it using
JC    atacontrol?  If so, what commands did you use?

Yes. The drives did not appear automatically (verified with atacontrol
list). Then I first tried reinit ata9, but that did not work out, so I did
a detach/attach for ata9, then the drive was there (with list and also
the device node appeared).

JC 3) Can you please provide uname -a output, as well as relevant dmesg
JC    output to show what kind of SATA controller you have, what's
JC    attached to what, etc.?

Of course (the dmesg output is no longer available, so here is the output
of pciconf -vl and atacontrol list instead):

ATA channel 0:
Master:  no device present
Slave:  acd0 Optiarc DVD RW AD-7540A/1.01 ATA/ATAPI revision 0
ATA channel 1:
Master:  no device present
Slave:   no device present
ATA channel 2:
Master:  ad4 ST380815AS/3.AAC SATA revision 2.x
Slave:   no device present
ATA channel 3:
Master:  ad6 ST380815AS/3.AAC SATA revision 2.x
Slave:   no device present
ATA channel 4:
Master:  ad8 WDC WD1000FYPS-01ZKB0/02.01B01 SATA revision 2.x
Slave:   no device present
ATA channel 5:
Master: ad10 WDC WD1000FYPS-01ZKB0/02.01B01 SATA revision 2.x
Slave:   no device present
ATA channel 6:
Master: ad12 WDC WD1000FYPS-01ZKB0/02.01B01 SATA revision 2.x
Slave:   no device present
ATA channel 7:
Master: ad14 WDC WD1000FYPS-01ZKB0/02.01B01 SATA revision 2.x
Slave:   no device present
ATA channel 8:
Master:  no device present
Slave:   no device present
ATA channel 9:
Master:  no device present
Slave:   no device present


FreeBSD mclane.rt.aei.uni-hannover.de 7.2-STABLE FreeBSD 7.2-STABLE #0:
Mon Sep  7 11:01:56 CEST 2009
r...@mclane.rt.aei.uni-hannover.de:/usr/obj/usr/src/sys/MCLANE.72  amd64

The first six drives (up to ad14) are connected onboard (Supermicro dual
opteron board with mcp55):

atap...@pci0:0:5:0: class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID
atap...@pci0:0:5:1: class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID
atap...@pci0:0:5:2: class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID

The other two (ad16 and ad18; the chassis has 8 slots and the last two
were only intended to be used in situations like the one I have now) are
connected to an extra PCI card:

atap...@pci0:3:6:0: class=0x010401 card=0x02409005 chip=0x02401095 rev=0x02 hdr=0x00
    vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
    device     = 'SATA/Raid controller(2XSATA150) (SIL3112)'
    class      = mass storage
    subclass   = RAID

Meanwhile I took out the ad18 drive again and tried to use a different
drive. But that was listed as UNAVAIL with corrupted data by zfs.
Probably it already branded the disk for resilvering and is looking for
exactly this one now. I also put in the disk which caused the problem
above again. The resilvering process started again, but very soon the
drive got detached again resulting in the same situation I described above.

Any help is greatly appreciated.


cu
  Gerrit


Re: ZFS zpool replace problems

2010-01-26 Thread Chuck Swiger
Hi--

On Jan 26, 2010, at 7:03 AM, Gerrit Kühn wrote:
[ ... ]
 atap...@pci0:3:6:0: class=0x010401 card=0x02409005 chip=0x02401095 rev=0x02 hdr=0x00
     vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
     device     = 'SATA/Raid controller(2XSATA150) (SIL3112)'
     class      = mass storage
     subclass   = RAID
 
 Meanwhile I took out the ad18 drive again and tried to use a different
 drive. But that was listed as UNAVAIL with corrupted data by zfs.

There's your problem-- the Silicon Image 3112/4 chips are remarkably buggy and 
exhibit data corruption:

  http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2005-08/0208.html

Regards,
-- 
-Chuck



Re: ZFS zpool replace problems

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 08:15:27 -0800 Chuck Swiger cswi...@mac.com wrote
about Re: ZFS zpool replace problems:

CS  Meanwhile I took out the ad18 drive again and tried to use a
CS  different drive. But that was listed as UNAVAIL with corrupted
CS  data by zfs.

CS There's your problem-- the Silicon Image 3112/4 chips are remarkably
CS buggy and exhibit data corruption:

Hm, sure? I would expect the same behaviour (detaching) as with the first
drive if the controller was the reason in this case.

CS   http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2005-08/0208.html

I already thought about replacing the controller to get rid of the
detach-problem. However, I cannot do this online and I really would prefer
fixing the disk firmware problem first.
I could remove the hotspare drive ad14 and use this slot for putting in a
replacement disk. Is it possible to get ad18 out of zfs' replacing
process? Maybe by detaching the disk from the pool?
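
As far as I know, detaching the new device is the documented way to cancel
an in-progress replace, so the idea above is plausible. Whether it behaves
that way on this particular 7.2-STABLE ZFS version is an assumption to
check against zpool(8) before running anything. A minimal sketch, with
cancel_replace as a hypothetical helper that only prints the command unless
RUN=1 is set:

```shell
#!/bin/sh
# cancel_replace (hypothetical name): detach the *new* device of a
# "replacing" vdev, which should leave the original disk in place.
# Dry run by default; set RUN=1 to actually execute the command.
cancel_replace() {
    pool=$1       # e.g. tank
    new_dev=$2    # e.g. ad18, the incoming disk
    if [ "${RUN:-0}" = "1" ]; then
        zpool detach "$pool" "$new_dev"
    else
        echo "zpool detach $pool $new_dev"
    fi
}

# Usage:
#   cancel_replace tank ad18          # prints the command only
#   RUN=1 cancel_replace tank ad18    # executes it
```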


cu
  Gerrit


Re: ZFS zpool replace problems

2010-01-26 Thread Jeremy Chadwick
On Tue, Jan 26, 2010 at 08:15:27AM -0800, Chuck Swiger wrote:
 Hi--
 
 On Jan 26, 2010, at 7:03 AM, Gerrit Kühn wrote:
 [ ... ]
  atap...@pci0:3:6:0: class=0x010401 card=0x02409005 chip=0x02401095 rev=0x02 hdr=0x00
      vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
      device     = 'SATA/Raid controller(2XSATA150) (SIL3112)'
      class      = mass storage
      subclass   = RAID
  
  Meanwhile I took out the ad18 drive again and tried to use a different
  drive. But that was listed as UNAVAIL with corrupted data by zfs.
 
 There's your problem-- the Silicon Image 3112/4 chips are remarkably buggy 
 and exhibit data corruption:
 
   http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2005-08/0208.html

Well, to be fair, we can't be 100% certain he got bit by that bug.  It's
possible/likely, but we don't know for certain at this point.  We also
don't know what brand hard disks he had connected to ad16 and/or ad18.

Older Silicon Image controllers are known for... well, just read the
Wikipedia entry for details.

http://en.wikipedia.org/wiki/Silicon_Image_Inc.#Product_alerts

I don't have any experience with their newer models, but I'm told
they're significantly improved (throughput and reliability-wise).

But it is amusing, almost ironic, how Silicon Image bought CMD -- the
same company who was infamous for their CMD640 IDE controller causing
data corruption... back in 1995.

As others have stated already: Intel could make a fortune off of a
simple PCIe or PCI-X SATA controller card that's ICH9/ICH10-based.  I
guess there's more money in forcing people to buy motherboards with said
southbridge.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: ZFS zpool replace problems

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 08:27:37 -0800 Jeremy Chadwick
free...@jdc.parodius.com wrote about Re: ZFS zpool replace problems:

JC Well, to be fair, we can't be 100% certain he got bit by that bug.
JC It's possible/likely, but we don't know for certain at this point.  We
JC also don't know what brand hard disks he had connected to ad16 and/or
JC ad18.

The same as on the others (WD RE2GP), just with the updated firmware
(02.01B02 that is) to get rid of the lcc problem.

JC Older Silicon Image controllers are known for... well, just read the
JC Wikipedia entry for details.
JC http://en.wikipedia.org/wiki/Silicon_Image_Inc.#Product_alerts

I knew the card is not top of the line, but I didn't know that it
is /that/ bad. When I set up the system 1 or 2 years ago, I just thought
it might be nice to be able to use the two extra slots in case of any
drives having to be replaced or so and the card was just lying around
(well, maybe I have an idea now why nobody else wanted to use it :-).

I guess I will try to offline the hotspare slot (connected to the mcp55 on
the motherboard) and plug the replacement disk in there. Maybe zfs
recognizes it and picks up the resilvering there. Otherwise I'll have to
look into how to get rid of the degraded resilvering process and restart it
with the drive in the other slot.

JC As others have stated already: Intel could make a fortune off of a
JC simple PCIe or PCI-X SATA controller card that's ICH9/ICH10-based.

Indeed. I use these 8-channel Supermicro controllers with an LSI chipset
(I think I recommended them here some time ago), and they work really
nicely. But the bracket does not fit into standard slots and there is no
PCI-X version. I would certainly prefer a regular card by Intel.


cu
  Gerrit


Re: ZFS zpool replace problems

2010-01-26 Thread Jeremy Chadwick
On Tue, Jan 26, 2010 at 04:03:20PM +0100, Gerrit Kühn wrote:
 On Tue, 26 Jan 2010 06:30:21 -0800 Jeremy Chadwick
 free...@jdc.parodius.com wrote about Re: ZFS zpool replace problems:
 JC 2) How did you attach ad18?  Did you tell the system about it using
 JC    atacontrol?  If so, what commands did you use?
 
 Yes. The drives did not appear automatically (verified with atacontrol
 list). Then I first tried reinit ata9, but that did not work out, so I did
 a detach/attach for ata9, then the drive was there (with list and also
 the device node appeared).

The procedure -- at least on Intel controllers in AHCI mode -- is:

- zpool offline pool disk
- atacontrol detach ataX (where X = channel associated with disk)
- Physically remove bad disk
- Physically insert new disk
- Wait 15 seconds for stuff to settle
- atacontrol attach ataX (where X = previous channel detached)
- zpool replace pool disk
- zpool online pool disk
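
The steps above can be sketched as a small script. This is an illustration
rather than a tested procedure: the names run, swap_disk, RUN and SETTLE are
made up for the example, and the command order simply mirrors the list. By
default nothing is executed; each command is only printed.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# swap_disk POOL DISK CHANNEL
#   POOL    e.g. tank
#   DISK    e.g. ad12
#   CHANNEL e.g. ata6 (the ATA channel the disk sits on)
swap_disk() {
    pool=$1
    disk=$2
    channel=$3

    run zpool offline "$pool" "$disk"
    run atacontrol detach "$channel"
    if [ "${RUN:-0}" = "1" ]; then
        # Pause for the physical swap, then let the hardware settle.
        printf 'Swap the disk on %s, then press Enter: ' "$channel"
        read _answer
        sleep "${SETTLE:-15}"
    fi
    run atacontrol attach "$channel"
    run zpool replace "$pool" "$disk"
    run zpool online "$pool" "$disk"
}

# Usage:
#   swap_disk tank ad12 ata6          # prints the five commands
#   RUN=1 swap_disk tank ad12 ata6    # executes them, pausing for the swap
```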

reinit shouldn't be needed at all -- in fact, I've seen reinit cause
some craziness (even on Intel controllers), including a system deadlock,
but this was back during the RELENG_6 and RELENG_7 days.  Great
improvements have been made to ata(4) since then.

If you need me to validate the above procedure (it's been a while since
I've had to hot-swap a disk), I can do so.  I do have a 4-disk
Supermicro SuperServer 5015B-MTB (ICH9-based) sitting on my workbench
which I can test with.

 Meanwhile I took out the ad18 drive again and tried to use a different
 drive. But that was listed as UNAVAIL with corrupted data by zfs.
 Probably it already branded the disk for resilvering and is looking for
 exactly this one now. I also put in the disk which caused the problem
 above again. The resilvering process started again, but very soon the
 drive got detached again resulting in the same situation I described above.

It honestly sounds like hot-swapping is causing some chaos on your
system.  Are all of the controllers involved configured for AHCI?  If
not, physical removal/insertion should be done only when the system
power is off.  If so, mav@ or others may be able to help figure out
what's going on in the underlying ata(4) layer.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: ZFS zpool replace problems

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 08:46:19 -0800 Jeremy Chadwick
free...@jdc.parodius.com wrote about Re: ZFS zpool replace problems:

JC - zpool offline pool disk
JC - atacontrol detach ataX (where X = channel associated with disk)
JC - Physically remove bad disk
JC - Physically insert new disk
JC - Wait 15 seconds for stuff to settle
JC - atacontrol attach ataX (where X = previous channel detached)
JC - zpool replace pool disk
JC - zpool online pool disk

JC reinit shouldn't be needed at all -- in fact, I've seen reinit cause
JC some craziness (even on Intel controllers), including a system
JC deadlock, but this was back during the RELENG_6 and RELENG_7 days.
JC Great improvements have been made to ata(4) since then.

Thanks for pointing that out. I would have gone exactly this way if I had
not had the extra slots, or if one of the drives had actually been faulty.
But in this case I just wanted to replace every drive one by one and (at
least I thought) I had extra slots, so I did not want to give up the
redundancy during the replacement (knowing very well that the drives to be
replaced are already beyond WD's specification due to the load-cycle bug).

JC If you need me to validate the above procedure (it's been a while since
JC I've had to hot-swap a disk), I can do so.  I do have a 4-disk
JC Supermicro SuperServer 5015B-MTB (ICH9-based) sitting on my workbench
JC which I can test with.

I'm quite sure this will work fine. I just don't know how to get rid of
the degraded replacement zfs sees.

JC It honestly sounds like hot-swapping is causing some chaos on your
JC system.  Are all of the controllers involved configured for AHCI?  

I think so. How could I verify this?
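
One possible check, sketched under assumptions: a controller that presents
itself as AHCI reports the PCI class code 0x010601 (mass storage, SATA,
AHCI programming interface), and that value appears in the class= field of
pciconf -lv. The helper name ahci_controllers is hypothetical. The
controllers listed earlier in this thread report class=0x010485 and
class=0x010401, so this filter would print nothing for them, which suggests
neither is running in AHCI mode. The BIOS setup remains the authoritative
place to look.

```shell
#!/bin/sh
# ahci_controllers (hypothetical name): reads `pciconf -lv` output on stdin
# and prints the selector (first field) of every device whose class code
# is 0x010601, i.e. a SATA controller in AHCI mode.
ahci_controllers() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i == "class=0x010601")
                print $1
    }'
}

# Usage (on the affected machine):
#   pciconf -lv | ahci_controllers
```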


cu
  Gerrit


Re: ZFS zpool replace problems

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 08:59:27 -0800 Chuck Swiger cswi...@mac.com wrote
about Re: ZFS zpool replace problems:

CS As a general matter of maintaining RAID systems, however, the approach
CS to upgrading drive firmware on members of a RAID array should be to
CS take down the entire container and offline the drives, update one
CS drive, test it (via SMART self-test and read-only checksum comparison
CS or similar), and then proceed to update all of the drives (preferably
CS doing the SMART self-test for each, if time allows) before returning
CS them to the RAID container and onlining them.

Well, I had several spare drives sitting on the shelf. So I updated the
firmware of these spare drives and now want to replace the drives with the
old firmware with the updated ones, one by one. Taking the system offline
for longer than a few minutes is not really an option. I'd rather roll in a
new machine to take over the job in that case.

CS Pulling individual drives from a RAID set while live and updating the
CS firmware one at a time is not an approach I would take-- running with
CS mixed firmware versions doesn't thrill me, and I know of multiple
CS cases where someone made a mistake reconnecting a drive with the wrong
CS SCSI id or something like that, taking out a second drive while the
CS RAID was not redundant, resulting in massive data corruption or even
CS total loss of the RAID contents.

This scenario was exactly the reason why I plugged the new drive into an
extra slot and asked zfs to replace an old one with it. Well, I did not
know what kind of fiasco the controller for this extra slot would turn out
to be; otherwise I would have used the hot-spare slot for this in the
first place.


cu
  Gerrit


Re: ZFS zpool replace problems

2010-01-26 Thread Chuck Swiger
Hi--

On Jan 26, 2010, at 8:25 AM, Gerrit Kühn wrote:
 CS There's your problem-- the Silicon Image 3112/4 chips are remarkably
 CS buggy and exhibit data corruption:
 
 Hm, sure?

I'm sure that the SII 3112 is buggy.
I am not sure that it is the primary or only cause of the problems you describe.

[ ... ]
 I already thought about replacing the controller to get rid of the
 detach-problem. However, I cannot do this online and I really would prefer
 fixing the disk firmware problem first.
 I could remove the hotspare drive ad14 and use this slot for putting in a
 replacement disk. Is it possible to get ad18 out of zfs' replacing
 process? Maybe by detaching the disk from the pool?

I don't know enough about ZFS to provide specific advice for recovery attempts 
(aside from the notion of restoring your data from a backup instead). 

As a general matter of maintaining RAID systems, however, the approach to 
upgrading drive firmware on members of a RAID array should be to take down the 
entire container and offline the drives, update one drive, test it (via SMART 
self-test and read-only checksum comparison or similar), and then proceed to 
update all of the drives (preferably doing the SMART self-test for each, if 
time allows) before returning them to the RAID container and onlining them.

Pulling individual drives from a RAID set while live and updating the firmware 
one at a time is not an approach I would take-- running with mixed firmware 
versions doesn't thrill me, and I know of multiple cases where someone made a 
mistake reconnecting a drive with the wrong SCSI id or something like that, 
taking out a second drive while the RAID was not redundant, resulting in 
massive data corruption or even total loss of the RAID contents.

Regards,
-- 
-Chuck
