Re: [zfs-discuss] trouble replacing spare disk
Hi

I was a little short earlier today, but then...

First of all, using 40+ drives in a single VDEV on RAIDz1 is a little like BASE jumping with an old, round parachute and no reserve in nasty weather: not what I'd recommend. What you see below is two spares in use, plus a spare that was flagged as in use and then failed. With RAIDz1 you can lose a single drive, and as far as I can see you are now down two dead drives, meaning you've probably lost the pool. If you find a way to recover it, make a good backup. If you manage to back it up, or already have a backup, recreate the pool with smaller VDEVs, preferably RAIDz2.

The ZFS Best Practices Guide at http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide contains good reading on this and other subjects.

roy

----- Original Message -----
Hi,

I have a SunFire X4540 with 19TB in a RAID-Z configuration; here's my zpool status:

  pool: raid
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: resilver in progress for 84h11m, 99.47% done, 0h27m to go
config:

        NAME              STATE     READ WRITE CKSUM
        raid              UNAVAIL      0     0   451  insufficient replicas
          raidz1          UNAVAIL      0     0   902  insufficient replicas
            c0t3d0        ONLINE       0     0     0
            c1t3d0        ONLINE       0     0     0
            c2t3d0        ONLINE       0     0     0
            c3t3d0        ONLINE       0     0     0
            c4t3d0        UNAVAIL    472    94     0  cannot open
            c5t3d0        ONLINE       0     0     0
            c0t7d0        ONLINE       0     0     0
            c1t7d0        ONLINE       0     0     0
            c2t7d0        ONLINE       0     0     0
            c3t7d0        ONLINE       0     0     0
            c0t2d0        ONLINE       0     0     0
            c1t2d0        ONLINE       0     0     0
            c2t2d0        ONLINE       0     0     0
            c3t2d0        ONLINE       0     0     0
            c4t2d0        ONLINE       0     0     0
            c4t6d0        ONLINE       0     0     0
            spare         DEGRADED     7     0 66.8M
              c5t2d0      FAULTED     11     2     0  too many errors
              replacing   DEGRADED     0     0     0
                c5t7d0    FAULTED     13     0     0  too many errors
                c5t6d0    ONLINE       0     0     0  202G resilvered
            c0t6d0        ONLINE       0     0     0
            c1t6d0        ONLINE       0     0     0
            c2t6d0        ONLINE       0     0     0
            c3t6d0        ONLINE       0     0     0
            spare         DEGRADED     0     0     0
              c0t1d0      FAULTED      0     0     0  too many errors
              c4t7d0      ONLINE       0     0     0
            c1t1d0        ONLINE       0     0     0
            c2t1d0        ONLINE       0     0     0
            c3t1d0        ONLINE       0     0     0
            c4t1d0        ONLINE       0     0     0
            c4t5d0        ONLINE       0     0     0
            c0t5d0        ONLINE       0     0     0
            c1t5d0        ONLINE       0     0     0
            c2t5d0        ONLINE       0     0     0
            c3t5d0        ONLINE       0     0     0
            c5t1d0        ONLINE       0     0     0
            c5t5d0        ONLINE       0     0     0
            c0t4d0        ONLINE       0     0     0
            c2t0d0        ONLINE       0     0     0
            c3t0d0        ONLINE       0     0     0
            c4t0d0        ONLINE       0     0     0
            c5t0d0        ONLINE       0     0     0
            c1t4d0        ONLINE       0     0     0
            c2t4d0        ONLINE       0     0     0
            c3t4d0        ONLINE       0     0     0
            c4t4d0        ONLINE       0     0     0
        spares
          c4t7d0          INUSE     currently in use
          c5t7d0          INUSE     currently in use
          c5t6d0          INUSE     currently in use
          c5t4d0          AVAIL

errors: 911 data errors, use '-v' for a list

It looks like the resilver has got stuck; Oracle have sent out a replacement disk today and are asking me to replace c5t7d0. If I am understanding the documentation correctly, I believe I need to do the following:

    zpool offline raid c5t7d0
    cfgadm -c unconfigure c5::dsk/c5t7d0

before physically replacing the disk.
However, I get the following messages when trying to do this:

# zpool offline raid c5t7d0
cannot offline c5t7d0: device is reserved as a hot spare
# cfgadm -c unconfigure c5::dsk/c5t7d0
cfgadm: Hardware specific failure: failed to unconfigure SCSI device: Device busy

I also tried a detach:

# zpool detach raid c5t7d0
cannot detach c5t7d0: pool I/O is currently suspended

And I also tried using the last available spare to try and free up the disk I need to replace:

# zpool replace raid c5t2d0 c5t4d0
Cannot replace c5t2d0 with c5t4d0: device has already been replaced with a spare

I am new to ZFS; how would I go about safely removing the affected drive in the software before physically replacing it? I'm also not sure at exactly which juncture to do a 'zpool clear' and 'zpool scrub'.

I'd appreciate any guidance - thanks in advance,

Mark

Mark Mahabir
Systems Manager, X-Ray and Observational Astronomy
Dept. of Physics & Astronomy, University of Leicester, LE1 7RH
Tel: +44(0)116 252 5652
email: mark.maha...@leicester.ac.uk
Elite Without Being Elitist
Times Higher Awards Winner 2007, 2008, 2009, 2010
Follow us on Twitter http://twitter.com/uniofleicsnews

--
Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
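
The advice above to recreate the pool with smaller RAIDz2 VDEVs can be made concrete with a small dry-run script. This is only a sketch under stated assumptions, not a layout anyone in the thread proposed: it groups the same 46 device names that appear in the status output into seven 6-disk raidz2 VDEVs (one per target number t1..t7, one disk per controller c0..c5) and keeps the four t0 disks as hot spares. The function name build_layout is made up for illustration. The script only prints the command; failed disks would have to be replaced, and the data backed up, before anything like it is actually run.

```sh
#!/bin/sh
# Dry run: print a 'zpool create' command line, never execute it.
# Assumed layout (illustrative, not from the thread):
#   - seven raidz2 vdevs, one per target t1..t7
#   - each vdev has one disk on each controller c0..c5
#   - c2t0d0..c5t0d0 become hot spares

build_layout() {
    pool=$1
    cmd="zpool create $pool"
    for t in 1 2 3 4 5 6 7; do
        # start a new raidz2 vdev for this target number
        cmd="$cmd raidz2"
        for c in 0 1 2 3 4 5; do
            cmd="$cmd c${c}t${t}d0"
        done
    done
    # remaining four disks are listed as spares
    cmd="$cmd spare c2t0d0 c3t0d0 c4t0d0 c5t0d0"
    echo "$cmd"
}

build_layout raid
```

Each raidz2 VDEV survives the loss of any two of its disks, at the price of 14 disks of parity across the pool instead of one.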
Re: [zfs-discuss] trouble replacing spare disk
Sorry, but what exactly were you thinking of when putting 40+ drives in a single RAIDz1 VDEV?

roy
[zfs-discuss] trouble replacing spare disk
Hi,

I have a SunFire X4540 with 19TB in a RAID-Z configuration; here's my zpool status:

  pool: raid
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: resilver in progress for 84h11m, 99.47% done, 0h27m to go
config:

        NAME              STATE     READ WRITE CKSUM
        raid              UNAVAIL      0     0   451  insufficient replicas
          raidz1          UNAVAIL      0     0   902  insufficient replicas
            c0t3d0        ONLINE       0     0     0
            c1t3d0        ONLINE       0     0     0
            c2t3d0        ONLINE       0     0     0
            c3t3d0        ONLINE       0     0     0
            c4t3d0        UNAVAIL    472    94     0  cannot open
            c5t3d0        ONLINE       0     0     0
            c0t7d0        ONLINE       0     0     0
            c1t7d0        ONLINE       0     0     0
            c2t7d0        ONLINE       0     0     0
            c3t7d0        ONLINE       0     0     0
            c0t2d0        ONLINE       0     0     0
            c1t2d0        ONLINE       0     0     0
            c2t2d0        ONLINE       0     0     0
            c3t2d0        ONLINE       0     0     0
            c4t2d0        ONLINE       0     0     0
            c4t6d0        ONLINE       0     0     0
            spare         DEGRADED     7     0 66.8M
              c5t2d0      FAULTED     11     2     0  too many errors
              replacing   DEGRADED     0     0     0
                c5t7d0    FAULTED     13     0     0  too many errors
                c5t6d0    ONLINE       0     0     0  202G resilvered
            c0t6d0        ONLINE       0     0     0
            c1t6d0        ONLINE       0     0     0
            c2t6d0        ONLINE       0     0     0
            c3t6d0        ONLINE       0     0     0
            spare         DEGRADED     0     0     0
              c0t1d0      FAULTED      0     0     0  too many errors
              c4t7d0      ONLINE       0     0     0
            c1t1d0        ONLINE       0     0     0
            c2t1d0        ONLINE       0     0     0
            c3t1d0        ONLINE       0     0     0
            c4t1d0        ONLINE       0     0     0
            c4t5d0        ONLINE       0     0     0
            c0t5d0        ONLINE       0     0     0
            c1t5d0        ONLINE       0     0     0
            c2t5d0        ONLINE       0     0     0
            c3t5d0        ONLINE       0     0     0
            c5t1d0        ONLINE       0     0     0
            c5t5d0        ONLINE       0     0     0
            c0t4d0        ONLINE       0     0     0
            c2t0d0        ONLINE       0     0     0
            c3t0d0        ONLINE       0     0     0
            c4t0d0        ONLINE       0     0     0
            c5t0d0        ONLINE       0     0     0
            c1t4d0        ONLINE       0     0     0
            c2t4d0        ONLINE       0     0     0
            c3t4d0        ONLINE       0     0     0
            c4t4d0        ONLINE       0     0     0
        spares
          c4t7d0          INUSE     currently in use
          c5t7d0          INUSE     currently in use
          c5t6d0          INUSE     currently in use
          c5t4d0          AVAIL

errors: 911 data errors, use '-v' for a list

It looks like the resilver has got stuck; Oracle have sent out a replacement disk today and are asking me to replace c5t7d0.
If I am understanding the documentation correctly, I believe I need to do the following:

    zpool offline raid c5t7d0
    cfgadm -c unconfigure c5::dsk/c5t7d0

before physically replacing the disk. However, I get the following messages when trying to do this:

# zpool offline raid c5t7d0
cannot offline c5t7d0: device is reserved as a hot spare
# cfgadm -c unconfigure c5::dsk/c5t7d0
cfgadm: Hardware specific failure: failed to unconfigure SCSI device: Device busy

I also tried a detach:

# zpool detach raid c5t7d0
cannot detach c5t7d0: pool I/O is currently suspended

And I also tried using the last available spare to try and free up the disk I need to replace:

# zpool replace raid c5t2d0 c5t4d0
Cannot replace c5t2d0 with c5t4d0: device has already been replaced with a spare

I am new to ZFS; how would I go about safely removing the affected drive in the software before physically replacing it? I'm also not sure at exactly which juncture to do a 'zpool clear' and 'zpool scrub'.

I'd appreciate any guidance - thanks in advance,

Mark

Mark Mahabir
Systems Manager, X-Ray and Observational Astronomy
Dept. of Physics & Astronomy, University of Leicester, LE1 7RH
Tel: +44(0)116 252 5652
email: mark.maha...@leicester.ac.uk
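
With a listing as long as the status output above, it can help to pull out just the devices that are not healthy before deciding which disk to touch. The following is a minimal sketch, assuming device names of the form cNtNdN as in the listing; the function name list_unhealthy is made up for illustration. It reads 'zpool status' output on standard input and prints the name and state of every disk whose state is anything other than ONLINE, AVAIL or INUSE.

```sh
#!/bin/sh
# Read 'zpool status' output on stdin and print "<device> <state>"
# for every disk that is not ONLINE, AVAIL or INUSE.
# Assumption: disks are named like c4t3d0, as in the listing above;
# group rows such as 'spare' or 'replacing' are deliberately skipped.

list_unhealthy() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+$/ &&
         $2 != "ONLINE" && $2 != "AVAIL" && $2 != "INUSE" {
             print $1, $2
         }'
}

# Example use on a live system:
#   zpool status raid | list_unhealthy
```

Comparing the number of disks it reports with the parity level of the VDEV (one for raidz1) gives a quick indication of whether the pool can still be expected to recover.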