[zfs-discuss] Zfs ignoring spares?

2010-12-05 Thread Roy Sigurd Karlsbakk
Hi all

I have installed a new server with 77 2TB drives in 11 7-drive RAIDz2 VDEVs, 
all WD Black drives. Now, it seems two of these drives were bad: one of them 
had a bunch of errors, the other was very slow. After taking these offline 
with zpool offline and then replacing them with online spares using zpool 
replace, the resilver finished and I thought it'd be OK. Apparently not. 
Although the resilver succeeded, the pool status is still DEGRADED. A test 
with iozone also shows that the two degraded VDEVs are not used (much) during 
the test. See below for the zpool status -xv output.
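The offline-then-replace sequence described above can be sketched as a small POSIX shell function. This is a dry-run sketch under stated assumptions: by default it only echoes the zpool commands, and exporting ZPOOL=zpool (as root on the ZFS host) would run them for real. The helper name swap_in_spare is hypothetical; the pool and device names are taken from the status output below.

```shell
#!/bin/sh
# Dry-run by default: each zpool command is echoed, not executed.
# Assumption: export ZPOOL=zpool (as root on the ZFS host) to run for real.
ZPOOL=${ZPOOL:-"echo zpool"}

# swap_in_spare POOL BAD_DISK SPARE_DISK  (hypothetical helper)
swap_in_spare() {
    $ZPOOL offline "$1" "$2"        # stop using the failing disk
    $ZPOOL replace "$1" "$2" "$3"   # resilver its data onto the hot spare
}

swap_in_spare pbpool c8t13d0 c4t43d0
swap_in_spare pbpool c4t17d0 c4t44d0
```

Note that after these two steps each affected raidz2 vdev contains a spare-N entry pairing the offlined disk with the spare, which is what keeps the vdev reported as DEGRADED.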

I have done a few tests on another system, and they showed the same thing: 
with a raidz2 with a spare, removing one drive and waiting for the spare to 
resilver leaves the same degraded status. zpool clear doesn't help. Removing 
another drive (not the spare) leaves it in the same degraded status. Removing 
a third (which should work, since the spare is in action) faults the pool.

Can someone help me fix this, or should I file a bug?

roy

r...@prv-backup:~# zpool status -xv
  pool: pbpool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 385M in 0h12m with 0 errors on Sun Dec  5 21:06:38 2010
config:

        NAME           STATE     READ WRITE CKSUM
        pbpool         DEGRADED     0     0     0
          raidz2-0     ONLINE       0     0     0
            c8t2d0     ONLINE       0     0     0
            c8t3d0     ONLINE       0     0     0
            c8t4d0     ONLINE       0     0     0
            c8t5d0     ONLINE       0     0     0
            c8t6d0     ONLINE       0     0     0
            c8t7d0     ONLINE       0     0     0
            c8t8d0     ONLINE       0     0     0
          raidz2-1     DEGRADED     0     0     0
            c8t9d0     ONLINE       0     0     0
            c8t10d0    ONLINE       0     0     0
            c8t11d0    ONLINE       0     0     0
            c8t12d0    ONLINE       0     0     0
            spare-4    DEGRADED     0     0     0
              c8t13d0  OFFLINE      0     0     0
              c4t43d0  ONLINE       0     0     0
            c8t14d0    ONLINE       0     0     0
            c8t15d0    ONLINE       0     0     0
          raidz2-2     ONLINE       0     0     0
            c8t16d0    ONLINE       0     0     0
            c8t17d0    ONLINE       0     0     0
            c8t18d0    ONLINE       0     0     0
            c8t19d0    ONLINE       0     0     0
            c8t20d0    ONLINE       0     0     0
            c8t21d0    ONLINE       0     0     0
            c8t22d0    ONLINE       0     0     0
          raidz2-3     ONLINE       0     0     0
            c8t23d0    ONLINE       0     0     0
            c8t24d0    ONLINE       0     0     0
            c8t25d0    ONLINE       0     0     0
            c8t26d0    ONLINE       0     0     0
            c8t27d0    ONLINE       0     0     0
            c8t28d0    ONLINE       0     0     0
            c8t29d0    ONLINE       0     0     0
          raidz2-4     ONLINE       0     0     0
            c8t30d0    ONLINE       0     0     0
            c8t31d0    ONLINE       0     0     0
            c8t32d0    ONLINE       0     0     0
            c8t33d0    ONLINE       0     0     0
            c8t34d0    ONLINE       0     0     0
            c8t35d0    ONLINE       0     0     0
            c4t0d0     ONLINE       0     0     0
          raidz2-5     ONLINE       0     0     0
            c4t1d0     ONLINE       0     0     0
            c4t2d0     ONLINE       0     0     0
            c4t3d0     ONLINE       0     0     0
            c4t4d0     ONLINE       0     0     0
            c4t5d0     ONLINE       0     0     0
            c4t6d0     ONLINE       0     0     0
            c4t7d0     ONLINE       0     0     0
          raidz2-6     ONLINE       0     0     0
            c4t8d0     ONLINE       0     0     0
            c4t9d0     ONLINE       0     0     0
            c4t10d0    ONLINE       0     0     0
            c4t11d0    ONLINE       0     0     0
            c4t12d0    ONLINE       0     0     0
            c4t13d0    ONLINE       0     0     0
            c4t14d0    ONLINE       0     0     0
          raidz2-7     DEGRADED     0     0     0
            c4t15d0    ONLINE       0     0     0
            c4t16d0    ONLINE       0     0     0
            spare-2    DEGRADED     0     0     0
              c4t17d0  OFFLINE      0     0     0
              c4t44d0  ONLINE       0     0     0
            c4t18d0    ONLINE       0     0     0
            c4t19d0    ONLINE       0     0     0
            c4t20d0    ONLINE       0     0     0
            c4t21d0    ONLINE       0     0     0
          raidz2-8     ONLINE       0     0     0
            c4t22d0    ONLINE       0     0     0

Re: [zfs-discuss] Zfs ignoring spares?

2010-12-05 Thread Tim Cook
On Sun, Dec 5, 2010 at 2:22 PM, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:

 Hi all

 I have installed a new server with 77 2TB drives in 11 7-drive RAIDz2
 VDEVs, all WD Black drives. Now, it seems two of these drives were bad:
 one of them had a bunch of errors, the other was very slow. After taking
 these offline with zpool offline and then replacing them with online spares
 using zpool replace, the resilver finished and I thought it'd be OK.
 Apparently not. Although the resilver succeeded, the pool status is still
 DEGRADED. A test with iozone also shows that the two degraded VDEVs are not
 used (much) during the test. See below for the zpool status -xv output.

 I have done a few tests on another system, and they showed the same thing:
 with a raidz2 with a spare, removing one drive and waiting for the spare to
 resilver leaves the same degraded status. zpool clear doesn't help. Removing
 another drive (not the spare) leaves it in the same degraded status. Removing
 a third (which should work, since the spare is in action) faults the pool.

 Can someone help me fix this, or should I file a bug?

 roy

Hot spares are dedicated spares in the ZFS world. Until you replace
the actual bad drives, you will be running in a degraded state. The
idea is that spares are only used in an emergency. You are degraded
until your spares are no longer in use.

--Tim

Re: [zfs-discuss] Zfs ignoring spares?

2010-12-05 Thread Roy Sigurd Karlsbakk
 Hot spares are dedicated spares in the ZFS world. Until you replace
 the actual bad drives, you will be running in a degraded state. The
 idea is that spares are only used in an emergency. You are degraded
 until your spares are no longer in use. 

 --Tim 

Thanks for the clarification. Wouldn't it be nice if ZFS could fail over
to a spare and then allow the replacement drive to become the new spare,
as most commercial hardware RAIDs do?

Vennlige hilsener / Best regards 

roy 
-- 
Roy Sigurd Karlsbakk 
(+47) 97542685 
r...@karlsbakk.net 
http://blogg.karlsbakk.net/ 
-- 
In all pedagogy it is essential that the curriculum be presented 
intelligibly. It is an elementary imperative for all educators to avoid 
excessive use of idioms of foreign origin. In most cases, adequate and 
relevant synonyms exist in Norwegian. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs ignoring spares?

2010-12-05 Thread Mark Musante

On 5 Dec 2010, at 16:06, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:

 Hot spares are dedicated spares in the ZFS world. Until you replace
 the actual bad drives, you will be running in a degraded state. The
 idea is that spares are only used in an emergency. You are degraded
 until your spares are no longer in use. 
 
 --Tim 
 
 Thanks for the clarification. Wouldn't it be nice if ZFS could fail over
 to a spare and then allow the replacement drive to become the new spare,
 as most commercial hardware RAIDs do?

If you use zpool detach to remove the disk that went bad, the spare is 
promoted to a proper member of the pool. Then, when you replace the bad disk, 
you can use zpool add to add it into the pool as a new spare.
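Mark's two-step procedure can be sketched as follows. This is a hedged, dry-run sketch: it echoes the commands unless ZPOOL=zpool is exported. The helper names are hypothetical, and it assumes (an assumption, not something stated in the thread) that the physically replaced disk shows up under the old slot's device name, c8t13d0 here. Only the zpool detach and zpool add ... spare subcommands come from the discussion.

```shell
#!/bin/sh
# Dry-run by default: commands are echoed, not executed.
# Assumption: export ZPOOL=zpool (as root on the ZFS host) to run for real.
ZPOOL=${ZPOOL:-"echo zpool"}

# promote_spare POOL BAD_DISK  (hypothetical helper)
# Detaching the original disk from its spare-N vdev leaves the hot spare
# as a regular member of the raidz2 vdev.
promote_spare() {
    $ZPOOL detach "$1" "$2"
}

# add_new_spare POOL NEW_DISK  (hypothetical helper)
# Once the bad disk has been physically swapped, register the new disk
# as a hot spare for the pool.
add_new_spare() {
    $ZPOOL add "$1" spare "$2"
}

promote_spare pbpool c8t13d0
promote_spare pbpool c4t17d0
# Assumption: the replacement disk appears under the old slot's name.
add_new_spare pbpool c8t13d0
```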

Admittedly, this is all a manual procedure. It's unclear if you were asking for 
this to be fully automated.
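On the automation point, one possible partial step is to let a script find the disks to detach. The sketch below (hypothetical helper name spare_detach_cmds) reads zpool status output on stdin and prints a zpool detach command for each OFFLINE disk nested under a spare-N vdev. It assumes nesting is shown by indentation, as in stock zpool status output, and it only prints commands for review; nothing is executed.

```shell
#!/bin/sh
# spare_detach_cmds  (hypothetical helper)
# Reads `zpool status` output on stdin and prints one `zpool detach` command
# for every OFFLINE disk nested under a spare-N vdev. Nothing is executed.
spare_detach_cmds() {
    awk '
        # Remember the pool name from the "  pool: NAME" line.
        $1 == "pool:" { pool = $2 }

        # Leading tabs/spaces encode the vdev nesting depth.
        { match($0, /^[ \t]*/); indent = RLENGTH }

        # A line at or above the spare-N level ends the spare block.
        in_spare && indent <= spare_indent { in_spare = 0 }

        # OFFLINE children of spare-N are the disks to detach.
        in_spare && $2 == "OFFLINE" { print "zpool detach " pool " " $1 }

        # Entering a spare-N vdev (a hot spare is currently in use).
        $1 ~ /^spare-[0-9]+$/ { in_spare = 1; spare_indent = indent }
    '
}
```

On the host this would be used as zpool status pbpool | spare_detach_cmds, and the printed lines checked by hand before running any of them.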



Re: [zfs-discuss] Zfs ignoring spares?

2010-12-05 Thread Roy Sigurd Karlsbakk
  Thanks for the clarification. Wouldn't it be nice if ZFS could fail over
  to a spare and then allow the replacement drive to become the new spare,
  as most commercial hardware RAIDs do?
 
 If you use zpool detach to remove the disk that went bad, the spare
 is promoted to a proper member of the pool. Then, when you replace the
 bad disk, you can use zpool add to add it into the pool as a new
 spare.
 
 Admittedly, this is all a manual procedure. It's unclear if you were
 asking for this to be fully automated.

Thanks a bunch. I wasn't aware that detach could be used on anything but 
mirrors (as I believe the manual states). I just detached the two bad devices, 
and the pool is back to ONLINE. I'll restart the iozone testing to see if all 
VDEVs are used this time :)
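A quick health check after the detach can be scripted around zpool status -x. This is a minimal sketch with a hypothetical helper name, assuming that a fully healthy system prints exactly "all pools are healthy".

```shell
#!/bin/sh
# pools_healthy  (hypothetical helper)
# Reads `zpool status -x` output on stdin and succeeds only if it reports
# that no pool has problems.
# Assumption: a fully healthy system prints exactly "all pools are healthy".
pools_healthy() {
    grep -qx 'all pools are healthy'
}

# Example (on the ZFS host):
#   zpool status -x | pools_healthy && echo "pool is back to ONLINE"
```

While iozone runs, zpool iostat -v pbpool 5 prints per-vdev operations and bandwidth every five seconds, which is one way to see whether every raidz2 vdev is now taking its share of the I/O.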

Vennlige hilsener / Best regards

roy