Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-11 Thread Daniel Carosone
On Sun, Jan 08, 2012 at 06:25:05PM -0800, Richard Elling wrote:
 ZIL makes zero impact on resilver.  I'll have to check to see if L2ARC is 
 still used, but
 due to the nature of the ARC design, read-once workloads like backup or 
 resilver do 
 not tend to negatively impact frequently used data.

This is true, in a strict sense (they don't help resilver itself) but
it misses the point. They (can) help the system, when resilver is
underway. 

ZIL helps reduce the impact busy resilvering disks have on other system
operation (sync write syscalls and vfs ops by apps).  L2ARC, likewise
for reads.  Both can hide the latency increases that resilvering iops
cause for the disks (and which the throttle you mentioned also
attempts to minimise). 

--
Dan.


pgpHuumFi1QZ5.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-11 Thread Daniel Carosone
On Thu, Jan 12, 2012 at 03:05:32PM +1100, Daniel Carosone wrote:
 On Sun, Jan 08, 2012 at 06:25:05PM -0800, Richard Elling wrote:
  ZIL makes zero impact on resilver.  I'll have to check to see if L2ARC is 
  still used, but
  due to the nature of the ARC design, read-once workloads like backup or 
  resilver do 
  not tend to negatively impact frequently used data.
 
 This is true, in a strict sense (they don't help resilver itself) but
 it misses the point. They (can) help the system, when resilver is
 underway. 
 
 ZIL helps reduce the impact busy resilvering disks have on other system

Well, since I'm being strict and picky, I should of course say ZIL-on-slog.

 operation (sync write syscalls and vfs ops by apps).  L2ARC, likewise
 for reads.  Both can hide the latency increases that resilvering iops
 cause for the disks (and which the throttle you mentioned also
 attempts to minimise). 

--
Dan.


pgpJr64AafDRB.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-10 Thread Karl Wagner
On Sun, January 8, 2012 00:28, Bob Friesenhahn wrote:

 I think that I would also be interested in a system which uses the
 so-called spare disks for more protective redundancy but then reduces
 that protective redundancy in order to use that disk to replace a
 failed disk or to automatically enlarge the pool.

 For example, a pool could start out with four-way mirroring when there
 is little data in the pool.  When the pool becomes more full, mirror
 devices are automatically removed (from existing vdevs), and used to
 add more vdevs.  Eventually a limit would be hit so that no more
 mirrors are allowed to be removed.

 Obviously this approach works with simple mirrors but not for raidz.

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I actually disagree about raidz. I have often thought that a dynamic
raidz would be a great feature.

For instance, you have a 4-way raidz. What you are saying is you want the
array to survive the loss of a single drive. So, from an empty vdev, it
starts by writing 2 copies of each block, effectively creating a pair of
mirrors. These are quicker to write and quicker to resilver than parity,
and you would likely get a read speed increase too.

As the vdev starts to get full, it starts using a parity based redundancy,
and converting older data to this as well. Performance drops a bit, but
it happens slowly. In addition, any older blocks not yet converted are
still quicker to read and resilver.

This is only a theory, but it is certainly something which could be
considered. It would probably take a lot of rewriting of the raidz code,
though.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-08 Thread Jim Klimov

First of all, I would like to thank Bob, Richard and Tim for
at least taking time to look at this proposal and responding ;)

It is also encouraging to see that 2 of 3 responders consider
this idea at least worth pondering and discussng, as it appeals
to their direct interest. Even Richard was not dismissive of it ;)

Finally, as Tim was right to note, I am not a kernel developer
(and won't become one as good as those present on this list).
Of course, I could pull the blanket onto my side and say
that I'd try to write that code myself... but it would
probably be a long wait, like that for BP rewrite - because,
I already have quite a few commitments and responsibilities
as an admin and recently as a parent (yay!)

So, I guess, my piece of the pie is currently limited to RFEs
and bug reports... and working in IT for a software development
company, I believe (or hope) that's not a useless part of the
process ;)

I do believe that ZFS technology is amazing - despite some
shortcomings that are still present - and I do want to see
it flourish... ASAP! :^)

//Jim


2012-01-08 7:15, Tim Cook wrote:



On Sat, Jan 7, 2012 at 7:37 PM, Richard Elling richard.ell...@gmail.com
mailto:richard.ell...@gmail.com wrote:

Hi Jim,

On Jan 6, 2012, at 3:33 PM, Jim Klimov wrote:

  Hello all,
 
  I have a new idea up for discussion.
 

...



I disagree.  Dedicated spares impact far more than availability.  During
a rebuild performance is, in general, abysmal.  ...
  If I can't use the system due to performance being a fraction of what
it is during normal production, it might as well be an outage.



  I don't think I've seen such idea proposed for ZFS, and
  I do wonder if it is at all possible with variable-width
  stripes? Although if the disk is sliced in 200 metaslabs
  or so, implementing a spread-spare is a no-brainer as well.

Put some thoughts down on paper and work through the math. If it all
works
out, let's implement it!
  -- richard


I realize it's not intentional Richard, but that response is more than a
bit condescending.  If he could just put it down on paper and code
something up, I strongly doubt he would be posting his thoughts here.
  He would be posting results.  The intention of his post, as far as I
can tell, is to perhaps inspire someone who CAN just write down the math
and write up the code to do so.  Or at least to have them review his
thoughts and give him a dev's perspective on how viable bringing
something like this to ZFS is.  I fear responses like the code is
there, figure it out makes the *aris community no better than the linux
one.

 
  What do you think - can and should such ideas find their
  way into ZFS? Or why not? Perhaps from theoretical or
  real-life experience with such storage approaches?
 
  //Jim Klimov

As always, feel free to tell me why my rant is completely off base ;)

--Tim



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-08 Thread Pasi Kärkkäinen
On Sun, Jan 08, 2012 at 06:59:57AM +0400, Jim Klimov wrote:
 2012-01-08 5:37, Richard Elling ??:
 The big question is whether they are worth the effort. Spares solve a 
 serviceability
 problem and only impact availability in an indirect manner. For single-parity
 solutions, spares can make a big difference in MTTDL, but have almost no 
 impact
 on MTTDL for double-parity solutions (eg. raidz2).

 Well, regarding this part: in the presentation linked in my OP,
 the IBM presenter suggests that for a 6-disk raid10 (3 mirrors)
 with one spare drive, overall a 7-disk set, there are such
 options for critical hits to data redundancy when one of
 drives dies:

 1) Traditional RAID - one full disk is a mirror of another
full disk; 100% of a disk's size is critical and has to
be prelicated into a spare drive ASAP;

 2) Declustered RAID - all 7 disks are used for 2 unique data
blocks from original setup and one spare block (I am not
sure I described it well in words, his diagram shows it
better); if a single disk dies, only 1/7 worth of disk
size is critical (not redundant) and can be fixed faster.

For their typical 47-disk sets of RAID-7-like redundancy,
under 1% of data becomes critical when 3 disks die at once,
which is (deemed) unlikely as is.

 Apparently, in the GPFS layout, MTTDL is much higher than
 in raid10+spare with all other stats being similar.

 I am not sure I'm ready (or qualified) to sit down and present
 the math right now - I just heard some ideas that I considered
 worth sharing and discussing ;)


Thanks for the video link (http://www.youtube.com/watch?v=2g5rx4gP6yU). 
It's very interesting!

GPFS Native RAID seems to be more advanced than current ZFS,
and it even has rebalancing implemented (the infamous missing zfs bp-rewrite).

It'd definitely be interesting to have something like this implemented in ZFS.

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-08 Thread Richard Elling
Note: more analysis of the GPFS implementations is needed, but that will take 
more
time than I'll spend this evening :-) Quick hits below...

On Jan 7, 2012, at 7:15 PM, Tim Cook wrote:
 On Sat, Jan 7, 2012 at 7:37 PM, Richard Elling richard.ell...@gmail.com 
 wrote:
 Hi Jim,
 
 On Jan 6, 2012, at 3:33 PM, Jim Klimov wrote:
 
  Hello all,
 
  I have a new idea up for discussion.
 
  Several RAID systems have implemented spread spare drives
  in the sense that there is not an idling disk waiting to
  receive a burst of resilver data filling it up, but the
  capacity of the spare disk is spread among all drives in
  the array. As a result, the healthy array gets one more
  spindle and works a little faster, and rebuild times are
  often decreased since more spindles can participate in
  repairs at the same time.
 
 Xiotech has a distributed, relocatable model, but the FRU is the whole ISE.
 There have been other implementations of more distributed RAIDness in the
 past (RAID-1E, etc).
 
 The big question is whether they are worth the effort. Spares solve a 
 serviceability
 problem and only impact availability in an indirect manner. For single-parity
 solutions, spares can make a big difference in MTTDL, but have almost no 
 impact
 on MTTDL for double-parity solutions (eg. raidz2).
 
 
 I disagree.  Dedicated spares impact far more than availability.  During a 
 rebuild performance is, in general, abysmal.

In ZFS, there is a resilver throttle that is designed to ensure that 
resilvering activity
does not impact interactive performance. Do you have data that suggests 
otherwise?

  ZIL and L2ARC will obviously help (L2ARC more than ZIL),

ZIL makes zero impact on resilver.  I'll have to check to see if L2ARC is still 
used, but
due to the nature of the ARC design, read-once workloads like backup or 
resilver do 
not tend to negatively impact frequently used data.

 but at the end of the day, if we've got a 12 hour rebuild (fairly 
 conservative in the days of 2TB
 SATA drives), the performance degradation is going to be very real for 
 end-users.  

I'd like to see some data on this for modern ZFS implementations (post Summer 
2010)

 With distributed parity and spares, you should in theory be able to cut this 
 down an order of magnitude.  
 I feel as though you're brushing this off as not a big deal when it's an 
 EXTREMELY big deal (in my mind).  In my opinion you can't just approach this 
 from an MTTDL perspective, you also need to take into account user 
 experience.  Just because I haven't lost data, doesn't mean the system isn't 
 (essentially) unavailable (sorry for the double negative and repeated 
 parenthesis).  If I can't use the system due to performance being a fraction 
 of what it is during normal production, it might as well be an outage.

So we have a method to analyze the ability of a system to perform during 
degradation:
performability. This can be applied to computer systems and we've done some 
analysis
specifically on RAID arrays. See also
http://www.springerlink.com/content/267851748348k382/
http://blogs.oracle.com/relling/tags/performability

Hence my comment about doing some math :-)

  I don't think I've seen such idea proposed for ZFS, and
  I do wonder if it is at all possible with variable-width
  stripes? Although if the disk is sliced in 200 metaslabs
  or so, implementing a spread-spare is a no-brainer as well.
 
 Put some thoughts down on paper and work through the math. If it all works
 out, let's implement it!
  -- richard
 
 
 I realize it's not intentional Richard, but that response is more than a bit 
 condescending.  If he could just put it down on paper and code something up, 
 I strongly doubt he would be posting his thoughts here.  He would be posting 
 results.  The intention of his post, as far as I can tell, is to perhaps 
 inspire someone who CAN just write down the math and write up the code to do 
 so.  Or at least to have them review his thoughts and give him a dev's 
 perspective on how viable bringing something like this to ZFS is.  I fear 
 responses like the code is there, figure it out makes the *aris community 
 no better than the linux one.

When I talk about spares in tutorials, we discuss various tradeoffs and how to 
analyse
the systems. Interestingly, for the GPFS case, the mirrors example clearly 
shows the
benefit of declustered RAID. However, the triple-parity example (similar to 
raidz3) is
not so persuasive. If you have raidz3 + spares, then why not go ahead and do 
raidz4?
In the tutorial we work through a raidz2 + spare vs raidz2 case and the raidz2 
case
is better in both performance and dependability without sacrificing space (an 
unusual
condition!)

It is not very difficult to add a raidz4 or indeed any number of additional 
parity, but 
there is a point of diminishing returns, usually when some other system 
component
becomes more critical than the RAID protection. So, raidz4 + spare is less 
dependable
than raidz5, and so on.
 -- 

Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-08 Thread Jim Klimov

2012-01-09 6:25, Richard Elling wrote:

Note: more analysis of the GPFS implementations is needed, but that will take 
more
time than I'll spend this evening :-) Quick hits below...


Good to hear you might look into it after all ;)


but at the end of the day, if we've got a 12 hour rebuild (fairly conservative 
in the days of 2TB
SATA drives), the performance degradation is going to be very real for 
end-users.


I'd like to see some data on this for modern ZFS implementations (post Summer 
2010)



Is scrubbing performance irrelevant in this discussion?
I think that in general, scrubbing is the read-half of
a larger rebuild process, at least for a single-vdev pool,
so rebuilds are about as long or worse. Am I wrong?

In my home-NAS case a raidz2 pool of six 2Tb drives, which
is filled 76%, consistently takes 85 hours to scrub.
No SSDs involved, no L2ARC, no ZILs. According to iostat,
the HDDs are often utilized to 100% with random IO load,
yielding from 500KBps to 2-3MBps in about 80-100IOPS per
disk (I have a scrub going on at this moment).

This system variably runs oi_148a (LiveUSB recovery) and
oi_151a when alive ;)

HTH,
//Jim Klimov
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-07 Thread Bob Friesenhahn

On Sat, 7 Jan 2012, Jim Klimov wrote:


Several RAID systems have implemented spread spare drives
in the sense that there is not an idling disk waiting to
receive a burst of resilver data filling it up, but the
capacity of the spare disk is spread among all drives in
the array. As a result, the healthy array gets one more
spindle and works a little faster, and rebuild times are
often decreased since more spindles can participate in
repairs at the same time.


I think that I would also be interested in a system which uses the 
so-called spare disks for more protective redundancy but then reduces 
that protective redundancy in order to use that disk to replace a 
failed disk or to automatically enlarge the pool.


For example, a pool could start out with four-way mirroring when there 
is little data in the pool.  When the pool becomes more full, mirror 
devices are automatically removed (from existing vdevs), and used to 
add more vdevs.  Eventually a limit would be hit so that no more 
mirrors are allowed to be removed.


Obviously this approach works with simple mirrors but not for raidz.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-07 Thread Richard Elling
Hi Jim,

On Jan 6, 2012, at 3:33 PM, Jim Klimov wrote:

 Hello all,
 
 I have a new idea up for discussion.
 
 Several RAID systems have implemented spread spare drives
 in the sense that there is not an idling disk waiting to
 receive a burst of resilver data filling it up, but the
 capacity of the spare disk is spread among all drives in
 the array. As a result, the healthy array gets one more
 spindle and works a little faster, and rebuild times are
 often decreased since more spindles can participate in
 repairs at the same time.

Xiotech has a distributed, relocatable model, but the FRU is the whole ISE.
There have been other implementations of more distributed RAIDness in the
past (RAID-1E, etc). 

The big question is whether they are worth the effort. Spares solve a 
serviceability
problem and only impact availability in an indirect manner. For single-parity 
solutions, spares can make a big difference in MTTDL, but have almost no impact
on MTTDL for double-parity solutions (eg. raidz2).

 I don't think I've seen such idea proposed for ZFS, and
 I do wonder if it is at all possible with variable-width
 stripes? Although if the disk is sliced in 200 metaslabs
 or so, implementing a spread-spare is a no-brainer as well.

Put some thoughts down on paper and work through the math. If it all works
out, let's implement it!
 -- richard

 
 To be honest, I've seen this a long time ago in (Falcon?)
 RAID controllers, and recently - in a USEnix presentation
 of IBM GPFS on YouTube. In the latter the speaker goes
 a greater depth describing how their declustered RAID
 approach (as they call it: all blocks - spare, redundancy
 and data are intermixed evenly on all drives and not in
 a single group or a mid-level VDEV as would be for ZFS).
 
 http://www.youtube.com/watch?v=2g5rx4gP6yUfeature=related
 
 GPFS with declustered RAID not only decreases rebuild
 times and/or impact of rebuilds on end-user operations,
 but it also happens to increase reliability - there is
 a smaller time window in case of multiple-disk failure
 in a large RAID-6 or RAID-7 array (in the example they
 use 47-disk sets) that the data is left in a critical
 state due to lack of redundancy, and there is less data
 overall in such state - so the system goes from critical
 to simply degraded (with some redundancy) in a few minutes.
 
 Another thing they have in GPFS is temporary offlining
 of disks so that they can catch up when reattached - only
 newer writes (bigger TXG numbers in ZFS terms) are added to
 reinserted disks. I am not sure this exists in ZFS today,
 either. This might simplify physical systems maintenance
 (as it does for IBM boxes - see presentation if interested)
 and quick recovery from temporarily unavailable disks, such
 as when a disk gets a bus reset and is unavailable for writes
 for a few seconds (or more) while the array keeps on writing.
 
 I find these ideas cool. I do believe that IBM might get
 angry if ZFS development copy-pasted them as is, but it
 might get nonetheless get us inventing a similar wheel
 that would be a bit different ;)
 There are already several vendors doing this in some way,
 so perhaps there is no (patent) monopoly in place already...
 
 And I think all the magic of spread spares and/or declustered
 RAID would go into just making another write-block allocator
 in the same league raidz or mirror are nowadays...
 BTW, are such allocators pluggable (as software modules)?
 
 What do you think - can and should such ideas find their
 way into ZFS? Or why not? Perhaps from theoretical or
 real-life experience with such storage approaches?
 
 //Jim Klimov
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 

ZFS and performance consulting
http://www.RichardElling.com
illumos meetup, Jan 10, 2012, Menlo Park, CA
http://www.meetup.com/illumos-User-Group/events/41665962/ 














___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-07 Thread Jim Klimov

2012-01-08 5:37, Richard Elling пишет:

The big question is whether they are worth the effort. Spares solve a 
serviceability
problem and only impact availability in an indirect manner. For single-parity
solutions, spares can make a big difference in MTTDL, but have almost no impact
on MTTDL for double-parity solutions (eg. raidz2).


Well, regarding this part: in the presentation linked in my OP,
the IBM presenter suggests that for a 6-disk raid10 (3 mirrors)
with one spare drive, overall a 7-disk set, there are such
options for critical hits to data redundancy when one of
drives dies:

1) Traditional RAID - one full disk is a mirror of another
   full disk; 100% of a disk's size is critical and has to
   be prelicated into a spare drive ASAP;

2) Declustered RAID - all 7 disks are used for 2 unique data
   blocks from original setup and one spare block (I am not
   sure I described it well in words, his diagram shows it
   better); if a single disk dies, only 1/7 worth of disk
   size is critical (not redundant) and can be fixed faster.

   For their typical 47-disk sets of RAID-7-like redundancy,
   under 1% of data becomes critical when 3 disks die at once,
   which is (deemed) unlikely as is.

Apparently, in the GPFS layout, MTTDL is much higher than
in raid10+spare with all other stats being similar.

I am not sure I'm ready (or qualified) to sit down and present
the math right now - I just heard some ideas that I considered
worth sharing and discussing ;)

Thanks for the input,
//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-07 Thread Tim Cook
On Sat, Jan 7, 2012 at 7:37 PM, Richard Elling richard.ell...@gmail.comwrote:

 Hi Jim,

 On Jan 6, 2012, at 3:33 PM, Jim Klimov wrote:

  Hello all,
 
  I have a new idea up for discussion.
 
  Several RAID systems have implemented spread spare drives
  in the sense that there is not an idling disk waiting to
  receive a burst of resilver data filling it up, but the
  capacity of the spare disk is spread among all drives in
  the array. As a result, the healthy array gets one more
  spindle and works a little faster, and rebuild times are
  often decreased since more spindles can participate in
  repairs at the same time.

 Xiotech has a distributed, relocatable model, but the FRU is the whole ISE.
 There have been other implementations of more distributed RAIDness in the
 past (RAID-1E, etc).

 The big question is whether they are worth the effort. Spares solve a
 serviceability
 problem and only impact availability in an indirect manner. For
 single-parity
 solutions, spares can make a big difference in MTTDL, but have almost no
 impact
 on MTTDL for double-parity solutions (eg. raidz2).



I disagree.  Dedicated spares impact far more than availability.  During a
rebuild performance is, in general, abysmal.  ZIL and L2ARC will obviously
help (L2ARC more than ZIL), but at the end of the day, if we've got a 12
hour rebuild (fairly conservative in the days of 2TB SATA drives), the
performance degradation is going to be very real for end-users.  With
distributed parity and spares, you should in theory be able to cut this
down an order of magnitude.  I feel as though you're brushing this off as
not a big deal when it's an EXTREMELY big deal (in my mind).  In my opinion
you can't just approach this from an MTTDL perspective, you also need to
take into account user experience.  Just because I haven't lost data,
doesn't mean the system isn't (essentially) unavailable (sorry for the
double negative and repeated parenthesis).  If I can't use the system due
to performance being a fraction of what it is during normal production, it
might as well be an outage.





  I don't think I've seen such idea proposed for ZFS, and
  I do wonder if it is at all possible with variable-width
  stripes? Although if the disk is sliced in 200 metaslabs
  or so, implementing a spread-spare is a no-brainer as well.

 Put some thoughts down on paper and work through the math. If it all works
 out, let's implement it!
  -- richard


I realize it's not intentional Richard, but that response is more than a
bit condescending.  If he could just put it down on paper and code
something up, I strongly doubt he would be posting his thoughts here.  He
would be posting results.  The intention of his post, as far as I can tell,
is to perhaps inspire someone who CAN just write down the math and write up
the code to do so.  Or at least to have them review his thoughts and give
him a dev's perspective on how viable bringing something like this to ZFS
is.  I fear responses like the code is there, figure it out makes the
*aris community no better than the linux one.




 
  To be honest, I've seen this a long time ago in (Falcon?)
  RAID controllers, and recently - in a USEnix presentation
  of IBM GPFS on YouTube. In the latter the speaker goes
  a greater depth describing how their declustered RAID
  approach (as they call it: all blocks - spare, redundancy
  and data are intermixed evenly on all drives and not in
  a single group or a mid-level VDEV as would be for ZFS).
 
  http://www.youtube.com/watch?v=2g5rx4gP6yUfeature=related
 
  GPFS with declustered RAID not only decreases rebuild
  times and/or impact of rebuilds on end-user operations,
  but it also happens to increase reliability - there is
  a smaller time window in case of multiple-disk failure
  in a large RAID-6 or RAID-7 array (in the example they
  use 47-disk sets) that the data is left in a critical
  state due to lack of redundancy, and there is less data
  overall in such state - so the system goes from critical
  to simply degraded (with some redundancy) in a few minutes.
 
  Another thing they have in GPFS is temporary offlining
  of disks so that they can catch up when reattached - only
  newer writes (bigger TXG numbers in ZFS terms) are added to
  reinserted disks. I am not sure this exists in ZFS today,
  either. This might simplify physical systems maintenance
  (as it does for IBM boxes - see presentation if interested)
  and quick recovery from temporarily unavailable disks, such
  as when a disk gets a bus reset and is unavailable for writes
  for a few seconds (or more) while the array keeps on writing.
 
  I find these ideas cool. I do believe that IBM might get
  angry if ZFS development copy-pasted them as is, but it
  might get nonetheless get us inventing a similar wheel
  that would be a bit different ;)
  There are already several vendors doing this in some way,
  so perhaps there is no (patent) monopoly in place already...
 
  And I think all the magic of