[zfs-discuss] Zfs ignoring spares?
Hi all,

I have installed a new server with 77 2TB drives in 11 7-drive RAIDz2 VDEVs, all on WD Black drives. It seems two of these drives were bad: one had a bunch of errors, the other was very slow. After offlining them with 'zpool offline' and replacing them with online spares via 'zpool replace', the resilver finished and I thought everything would be OK. Apparently not: although the resilver succeeds, the pool status is still DEGRADED. A test with iozone also shows that the two degraded VDEVs are not used (much) during the test. See below for the 'zpool status -xv' output.

I have done a few tests on another system, and they showed the same thing: with a raidz2 plus a spare, removing one drive and waiting for it to resilver leaves the pool in the same degraded status, and 'zpool clear' doesn't help. Removing another drive (not the spare) leaves it in the same degraded status. Removing a third (which should work, since the spare is in action) faults the pool.

Can someone help me fix this, or should I file a bug about this?

roy

r...@prv-backup:~# zpool status -xv
  pool: pbpool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 385M in 0h12m with 0 errors on Sun Dec 5 21:06:38 2010
config:

        NAME           STATE     READ WRITE CKSUM
        pbpool         DEGRADED     0     0     0
          raidz2-0     ONLINE       0     0     0
            c8t2d0     ONLINE       0     0     0
            c8t3d0     ONLINE       0     0     0
            c8t4d0     ONLINE       0     0     0
            c8t5d0     ONLINE       0     0     0
            c8t6d0     ONLINE       0     0     0
            c8t7d0     ONLINE       0     0     0
            c8t8d0     ONLINE       0     0     0
          raidz2-1     DEGRADED     0     0     0
            c8t9d0     ONLINE       0     0     0
            c8t10d0    ONLINE       0     0     0
            c8t11d0    ONLINE       0     0     0
            c8t12d0    ONLINE       0     0     0
            spare-4    DEGRADED     0     0     0
              c8t13d0  OFFLINE      0     0     0
              c4t43d0  ONLINE       0     0     0
            c8t14d0    ONLINE       0     0     0
            c8t15d0    ONLINE       0     0     0
          raidz2-2     ONLINE       0     0     0
            c8t16d0    ONLINE       0     0     0
            c8t17d0    ONLINE       0     0     0
            c8t18d0    ONLINE       0     0     0
            c8t19d0    ONLINE       0     0     0
            c8t20d0    ONLINE       0     0     0
            c8t21d0    ONLINE       0     0     0
            c8t22d0    ONLINE       0     0     0
          raidz2-3     ONLINE       0     0     0
            c8t23d0    ONLINE       0     0     0
            c8t24d0    ONLINE       0     0     0
            c8t25d0    ONLINE       0     0     0
            c8t26d0    ONLINE       0     0     0
            c8t27d0    ONLINE       0     0     0
            c8t28d0    ONLINE       0     0     0
            c8t29d0    ONLINE       0     0     0
          raidz2-4     ONLINE       0     0     0
            c8t30d0    ONLINE       0     0     0
            c8t31d0    ONLINE       0     0     0
            c8t32d0    ONLINE       0     0     0
            c8t33d0    ONLINE       0     0     0
            c8t34d0    ONLINE       0     0     0
            c8t35d0    ONLINE       0     0     0
            c4t0d0     ONLINE       0     0     0
          raidz2-5     ONLINE       0     0     0
            c4t1d0     ONLINE       0     0     0
            c4t2d0     ONLINE       0     0     0
            c4t3d0     ONLINE       0     0     0
            c4t4d0     ONLINE       0     0     0
            c4t5d0     ONLINE       0     0     0
            c4t6d0     ONLINE       0     0     0
            c4t7d0     ONLINE       0     0     0
          raidz2-6     ONLINE       0     0     0
            c4t8d0     ONLINE       0     0     0
            c4t9d0     ONLINE       0     0     0
            c4t10d0    ONLINE       0     0     0
            c4t11d0    ONLINE       0     0     0
            c4t12d0    ONLINE       0     0     0
            c4t13d0    ONLINE       0     0     0
            c4t14d0    ONLINE       0     0     0
          raidz2-7     DEGRADED     0     0     0
            c4t15d0    ONLINE       0     0     0
            c4t16d0    ONLINE       0     0     0
            spare-2    DEGRADED     0     0     0
              c4t17d0  OFFLINE      0     0     0
              c4t44d0  ONLINE       0     0     0
            c4t18d0    ONLINE       0     0     0
            c4t19d0    ONLINE       0     0     0
            c4t20d0    ONLINE       0     0     0
            c4t21d0    ONLINE       0     0     0
          raidz2-8     ONLINE       0     0     0
            c4t22d0    ONLINE       0     0     0
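The degraded-after-resilver behaviour described above can be reproduced on a throwaway pool built from file-backed vdevs, so nobody has to pull real drives to test it. This is only a sketch, assuming a Solaris-style system with ZFS installed and root privileges; the file paths and the pool name "testpool" are made up for illustration:

```shell
# Build a scratch 7-disk raidz2 pool from sparse files, with one hot spare
# (paths and pool name are hypothetical).
for i in 0 1 2 3 4 5 6 7; do mkfile -n 256m /var/tmp/vdev$i; done
zpool create testpool raidz2 \
    /var/tmp/vdev0 /var/tmp/vdev1 /var/tmp/vdev2 /var/tmp/vdev3 \
    /var/tmp/vdev4 /var/tmp/vdev5 /var/tmp/vdev6 \
    spare /var/tmp/vdev7

# Simulate the failure handling from the report: offline one disk,
# then replace it with the spare and let the resilver finish.
zpool offline testpool /var/tmp/vdev3
zpool replace testpool /var/tmp/vdev3 /var/tmp/vdev7

# The pool stays DEGRADED even after a clean resilver,
# and 'zpool clear' does not change that.
zpool status -xv testpool
zpool clear testpool
zpool status -xv testpool
```

Destroy the scratch pool with 'zpool destroy testpool' and remove the files afterwards.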
Re: [zfs-discuss] Zfs ignoring spares?
On Sun, Dec 5, 2010 at 2:22 PM, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:

> Hi all I have installed a new server with 77 2TB drives in 11 7-drive RAIDz2 VDEVs, all on WD Black drives.
> [...]
> Can someone help me how to fix this, or should I file a bug about this?
Re: [zfs-discuss] Zfs ignoring spares?
> Hot spares are dedicated spares in the ZFS world. Until you replace the
> actual bad drives, you will be running in a degraded state. The idea is
> that spares are only used in an emergency. You are degraded until your
> spares are no longer in use.
>
> --Tim

Thanks for the clarification. Wouldn't it be nice if ZFS could fail over to a spare and then allow the replacement to become the new spare, as most commercial hardware RAIDs do?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zfs ignoring spares?
On 5 Dec 2010, at 16:06, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:

> [...]
> Thanks for the clarification. Wouldn't it be nice if ZFS could fail over
> to a spare and then allow the replacement to become the new spare, as
> most commercial hardware RAIDs do?

If you use 'zpool detach' to remove the disk that went bad, the spare is promoted to a proper member of the pool. Then, when you replace the bad disk, you can use 'zpool add' to add it into the pool as a new spare. Admittedly, this is all a manual procedure; it's unclear whether you were asking for it to be fully automated.
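Applied to the pool in this thread, the manual procedure would look roughly like this (a sketch run as root; device names are taken from the 'zpool status' output earlier in the thread, and the 'zpool add' step assumes new drives have been physically swapped in at the same device names):

```shell
# Detach the offlined bad disks; this promotes the in-use spares
# (c4t43d0 and c4t44d0) to full members of their raidz2 vdevs.
zpool detach pbpool c8t13d0
zpool detach pbpool c4t17d0

# The pool should now report ONLINE again.
zpool status -x pbpool

# After physically replacing the bad drives, return the new disks
# to the pool as fresh hot spares.
zpool add pbpool spare c8t13d0 c4t17d0
```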
Re: [zfs-discuss] Zfs ignoring spares?
> If you use zpool detach to remove the disk that went bad, the spare is
> promoted to a proper member of the pool. Then, when you replace the bad
> disk, you can use zpool add to add it into the pool as a new spare.
> [...]

Thanks a bunch. I wasn't aware of the possibility of using detach except for mirrors (as I believe the manual states). I just tried detaching the two bad devices, and the pool is back to ONLINE. I'll restart the iozone testing to see whether all VDEVs are used this time :)

Vennlige hilsener / Best regards

roy