Re: [zfs-discuss] triple-parity: RAID-Z3

2009-08-04 Thread Martin
 With RAID-Z, stripes can be of variable width, meaning that, say, a
 single row in a 4+2 configuration might have two stripes of 1+2. In other
 words, there might not be enough space in the new parity device.

Wow -- I totally missed that scenario.  Excellent point.

 I did write up the steps that would be needed to support RAID-Z expansion

Good write-up.  If I understand it, the basic approach is to add the device to 
each row and leave the unusable fragments there.  New stripes will take 
advantage of the wider row but old stripes will not.

It would seem that the mythical bp_rewrite() that I see mentioned here and 
there could relocate a stripe to another set of rows without altering the 
transaction_id (or whatever it's called), which is critical for tracking snapshots.  I 
suspect this function would allow background defrag/coalesce (a needed feature 
IMHO) and deduplication.  With background defrag, the extra space on existing 
stripes would not immediately be usable, but would appear over time.

Many thanks for the insight and thoughts.

Bluntly, how can I help?  I have cut a lifetime of C code in a past life.

Cheers,
Marty


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-24 Thread Ross
Interesting, so the more drive failures you have, the slower the array gets?

Would I be right in assuming that the slowdown is only up to the point where 
FMA / ZFS marks the drive as faulted?


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-23 Thread Victor Latushkin

On 22.07.09 10:45, Adam Leventhal wrote:

which gap?

'RAID-Z should mind the gap on writes' ?

Message was edited by: thometal


I believe this is in reference to the raid 5 write hole, described here:
http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_performance


It's not.

So I'm not sure what the 'RAID-Z should mind the gap on writes' 
comment is getting at either.


Clarification?



I'm planning to write a blog post describing this, but the basic problem 
is that RAID-Z, by virtue of supporting variable stripe writes (the 
insight that allows us to avoid the RAID-5 write hole), must round the 
number of sectors up to a multiple of nparity+1. This means that we may 
have sectors that are effectively skipped. ZFS generally lays down data 
in large contiguous streams, but these skipped sectors can stymie both 
ZFS's write aggregation as well as the hard drive's ability to group 
I/Os and write them quickly.


Jeff Bonwick added some code to mind these gaps on reads. The key 
insight there is that if we're going to read 64K, say, with a 512 byte 
hole in the middle, we might as well do one big read rather than two 
smaller reads and just throw out the data that we don't care about.


Of course, doing this for writes is a bit trickier since we can't just 
blithely write over gaps as those might contain live data on the disk. 
To solve this we push the knowledge of those skipped sectors down to the 
I/O aggregation layer in the form of 'optional' I/Os purely for the 
purpose of coalescing writes into larger chunks.


This exact issue was discussed here almost three years ago:

http://www.opensolaris.org/jive/thread.jspa?messageID=60241




Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-23 Thread Robert Milkowski

Adam Leventhal wrote:

Hey Bob,

MTTDL analysis shows that given normal environmental conditions, the 
MTTDL of RAID-Z2 is already much longer than the life of the computer 
or the attendant human.  Of course sometimes one encounters unusual 
conditions where additional redundancy is desired.


To what analysis are you referring? Today the absolute fastest you can 
resilver a 1TB drive is about 4 hours. Real-world speeds might be half 
that. In 2010 we'll have 3TB drives meaning it may take a full day to 
resilver. The odds of hitting a latent bit error are already 
reasonably high, especially with a large pool that's infrequently 
scrubbed. What, then, are the odds of a second drive failing in 
the 24 hours it takes to resilver?




I wish it were that good with raid-zN.
In real life, at least in my experience, it can take several days to 
resilver a disk for vdevs in raid-z2 made of 11 SATA disk drives with 
real data.
While the way ZFS resynchronizes data is much faster under some 
circumstances, it is also much slower under others.
IIRC some fixes were integrated a few builds ago, so maybe it is 
different now.



I do think that it is worthwhile to be able to add another parity 
disk to an existing raidz vdev but I don't know how much work that 
entails.


It entails a bunch of work:

  http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

Matt Ahrens is working on a key component after which it should all be 
possible.



A lot of people are waiting for it! :) :) :)


ps. thank you for raid-z3!

--
Robert Milkowski
http://milek.blogspot.com



Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-23 Thread Adam Leventhal
Robert,

On Fri, Jul 24, 2009 at 12:59:01AM +0100, Robert Milkowski wrote:
 To what analysis are you referring? Today the absolute fastest you can 
 resilver a 1TB drive is about 4 hours. Real-world speeds might be half 
 that. In 2010 we'll have 3TB drives meaning it may take a full day to 
 resilver. The odds of hitting a latent bit error are already reasonably 
 high, especially with a large pool that's infrequently scrubbed. 
 What, then, are the odds of a second drive failing in the 24 hours it takes 
 to resilver?

 I wish it were that good with raid-zN.
 In real life, at least in my experience, it can take several days to 
 resilver a disk for vdevs in raid-z2 made of 11 SATA disk drives with real 
 data.
 While the way ZFS resynchronizes data is much faster under some circumstances, 
 it is also much slower under others.
 IIRC some fixes were integrated a few builds ago, so maybe it is 
 different now.

Absolutely. I was talking more or less about optimal timing. I realize that,
due to the priorities within ZFS and real-world loads, it can take far
longer.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-23 Thread Robert Milkowski

Adam Leventhal wrote:


I just blogged about triple-parity RAID-Z (raidz3):

  http://blogs.sun.com/ahl/entry/triple_parity_raid_z

As for performance, on the system I was using (a max config Sun Storage
7410), I saw about a 25% improvement to 1GB/s for a streaming write
workload. YMMV, but I'd be interested in hearing your results.


25% improvement when comparing what exactly to what?


--
Robert Milkowski
http://milek.blogspot.com



Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-22 Thread Adam Leventhal

Hey Bob,

MTTDL analysis shows that given normal environmental conditions, the  
MTTDL of RAID-Z2 is already much longer than the life of the  
computer or the attendant human.  Of course sometimes one encounters  
unusual conditions where additional redundancy is desired.


To what analysis are you referring? Today the absolute fastest you can  
resilver a 1TB drive is about 4 hours. Real-world speeds might be half  
that. In 2010 we'll have 3TB drives meaning it may take a full day to  
resilver. The odds of hitting a latent bit error are already  
reasonably high, especially with a large pool that's infrequently  
scrubbed. What, then, are the odds of a second drive failing in  
the 24 hours it takes to resilver?
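
Roughly, and purely as an illustration (assuming a sustained sequential
rate of about 70 MB/s per drive):

  1 TB / 70 MB/s  ~ 14,000 s  ~ 4 hours (best case)
  3 TB / 70 MB/s  ~ 43,000 s  ~ 12 hours, or about a day at half that rate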


I do think that it is worthwhile to be able to add another parity  
disk to an existing raidz vdev but I don't know how much work that  
entails.


It entails a bunch of work:

  http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

Matt Ahrens is working on a key component after which it should all be  
possible.


ZFS development seems to be overwhelmed with marketing-driven  
requirements lately, and it is time to get back to brass tacks and  
make sure that the parts already developed are truly 
enterprise-grade.



While I don't disagree that the focus for ZFS should be ensuring  
enterprise-class reliability and performance, let me assure you that  
requirements are driven by the market and not by marketing.


Adam

--
Adam Leventhal, Fishworks  http://blogs.sun.com/ahl



Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-22 Thread Adam Leventhal

which gap?

'RAID-Z should mind the gap on writes' ?

Message was edited by: thometal


I believe this is in reference to the raid 5 write hole, described  
here:

http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_performance


It's not.

So I'm not sure what the 'RAID-Z should mind the gap on writes'  
comment is getting at either.


Clarification?



I'm planning to write a blog post describing this, but the basic  
problem is that RAID-Z, by virtue of supporting variable stripe writes  
(the insight that allows us to avoid the RAID-5 write hole), must  
round the number of sectors up to a multiple of nparity+1. This means  
that we may have sectors that are effectively skipped. ZFS generally  
lays down data in large contiguous streams, but these skipped sectors  
can stymie both ZFS's write aggregation as well as the hard drive's  
ability to group I/Os and write them quickly.
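
As a rough sketch of the rounding (illustrative C, not the actual
vdev_raidz.c code; names and details are simplified):

  #include <stdint.h>

  /*
   * Sketch: size, in sectors, of a RAID-Z allocation for "dsectors" data
   * sectors spread over "ndata" data columns with "nparity" parity columns.
   * The final round-up to a multiple of (nparity + 1) is what produces the
   * skipped sectors mentioned above.
   */
  static uint64_t
  raidz_alloc_sectors(uint64_t dsectors, uint64_t ndata, uint64_t nparity)
  {
          uint64_t rows = (dsectors + ndata - 1) / ndata;   /* stripe rows */
          uint64_t total = dsectors + nparity * rows;       /* data + parity */

          return (((total + nparity) / (nparity + 1)) * (nparity + 1));
  }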


Jeff Bonwick added some code to mind these gaps on reads. The key  
insight there is that if we're going to read 64K, say, with a 512 byte  
hole in the middle, we might as well do one big read rather than two  
smaller reads and just throw out the data that we don't care about.
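
Schematically (a hypothetical helper, not the actual code):

  #include <stdint.h>
  #include <stdbool.h>

  /*
   * Sketch: if the hole between two reads is small (e.g. one 512-byte skip
   * sector inside a 64K read), issue a single covering read and discard the
   * bytes in the gap rather than issuing two separate I/Os.
   */
  static bool
  should_span_gap(uint64_t off1, uint64_t len1, uint64_t off2, uint64_t maxgap)
  {
          return (off2 - (off1 + len1) <= maxgap);
  }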


Of course, doing this for writes is a bit trickier since we can't just  
blithely write over gaps as those might contain live data on the disk.  
To solve this we push the knowledge of those skipped sectors down to  
the I/O aggregation layer in the form of 'optional' I/Os purely for  
the purpose of coalescing writes into larger chunks.
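
Again as a sketch only (hypothetical structures, not the actual aggregation
code): the skipped sectors are queued as 'optional' writes so the aggregator
can use them as glue between real writes, while trailing optional I/Os are
simply dropped.

  #include <stddef.h>
  #include <stdint.h>
  #include <stdbool.h>

  struct io {
          uint64_t offset;
          uint64_t size;
          bool     optional;      /* covers a skipped sector */
  };

  /*
   * Sketch: given pending writes sorted by offset, return how many of them
   * (starting at ios[0], assumed non-optional) to fold into one contiguous
   * write.  Optional I/Os in the middle let the run keep growing; a trailing
   * optional I/O is not worth writing, so the run ends at the last real write.
   */
  static size_t
  aggregate_span(const struct io *ios, size_t n)
  {
          size_t last_real = 0;

          for (size_t i = 1; i < n; i++) {
                  if (ios[i].offset != ios[i - 1].offset + ios[i - 1].size)
                          break;                  /* no longer contiguous */
                  if (!ios[i].optional)
                          last_real = i;
          }
          return (last_real + 1);
  }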


I hope that's clear; if it's not, stay tuned for the aforementioned  
blog post.


Adam

--
Adam Leventhal, Fishworks  http://blogs.sun.com/ahl



Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-22 Thread Adam Leventhal

Don't hear about triple-parity RAID that often:


Author: Adam Leventhal
Repository: /hg/onnv/onnv-gate
Latest revision: 17811c723fb4f9fce50616cb740a92c8f6f97651
Total changesets: 1
Log message:
6854612 triple-parity RAID-Z


http://mail.opensolaris.org/pipermail/onnv-notify/2009-July/009872.html

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6854612

(Via Blog O' Matty.)

Would be curious to see performance characteristics.



I just blogged about triple-parity RAID-Z (raidz3):

  http://blogs.sun.com/ahl/entry/triple_parity_raid_z

As for performance, on the system I was using (a max config Sun Storage
7410), I saw about a 25% improvement to 1GB/s for a streaming write
workload. YMMV, but I'd be interested in hearing your results.

Adam

--
Adam Leventhal, Fishworks  http://blogs.sun.com/ahl



Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-22 Thread Adam Leventhal

Don't hear about triple-parity RAID that often:


I agree completely.  In fact, I have wondered (probably in these  
forums), why we don't bite the bullet and make a generic raidzN,  
where N is any number >= 0.


I agree, but raidzN isn't simple to implement and it's potentially difficult  
to get it to perform well. That said, it's something I intend to bring to  
ZFS in the next year or so.

If memory serves, the second parity is calculated using Reed-Solomon  
which implies that any number of parity devices is possible.


True; it's a degenerate case.
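
For the curious, a sketch of the arithmetic (the RAID-6-style GF(2^8)
construction with generator 2 and polynomial 0x11d; illustrative only, not
the actual vdev_raidz.c code):

  #include <stddef.h>
  #include <stdint.h>

  /* Multiply by 2 in GF(2^8) modulo the polynomial 0x11d. */
  static uint8_t
  gf_mul2(uint8_t a)
  {
          return ((uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0x00)));
  }

  /*
   * P = d0 ^ d1 ^ ... ^ d(n-1)
   * Q = sum of 2^i * di   (in GF(2^8))
   * R = sum of 4^i * di   (in GF(2^8))
   * Each further parity row would use the next power of the generator.
   */
  static void
  parity_pqr(const uint8_t *d, size_t n, uint8_t *p, uint8_t *q, uint8_t *r)
  {
          *p = *q = *r = 0;
          for (size_t i = n; i-- > 0; ) {         /* Horner's rule */
                  *p ^= d[i];
                  *q  = gf_mul2(*q) ^ d[i];
                  *r  = gf_mul2(gf_mul2(*r)) ^ d[i];
          }
  }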

In fact, get rid of mirroring, because it clearly is a variant of  
raidz with two devices.  Want three way mirroring?  Call that raidz2  
with three devices.  The truth is that a generic raidzN would roll  
up everything: striping, mirroring, parity raid, double parity, etc.  
into a single format with one parameter.


That's an interesting thought, but there are some advantages to  
calling out mirroring, for example, as its own vdev type. As has been  
pointed out, reading from either side of the mirror involves no  
computation, whereas reading from a RAID-Z 1+2, for example, would  
involve more computation. This would complicate the calculus of  
balancing read operations over the mirror devices.

Let's not stop there, though.  Once we have any number of parity  
devices, why can't I add a parity device to an array?  That should  
be simple enough with a scrub to set the parity.  In fact, what is  
to stop me from removing a parity device?  Once again, I think the  
code would make this rather easy.


With RAID-Z, stripes can be of variable width, meaning that, say, a  
single row in a 4+2 configuration might have two stripes of 1+2. In other  
words, there might not be enough space in the new parity device. I did  
write up the steps that would be needed to support RAID-Z expansion; you  
can find it here:

  http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z
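
To make that concrete with a made-up example: picture one row of a 4+2
(six-disk raidz2) vdev holding two small blocks, each written as a 1+2
stripe -- one data sector plus two parity sectors:

  disk:   1     2     3     4     5     6
  row:    P_a   P_a   D_a   P_b   P_b   D_b

Growing that vdev to raidz3 would require a third parity sector for each
of those blocks -- two extra sectors for this row -- but adding one disk
provides only one new sector per row.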

Ok, back to the real world.  The one downside to triple parity is  
that I recall the code discovered the corrupt block by excluding it  
from the stripe, reconstructing the stripe and comparing that with  
the checksum.  In other words, for a given cost of X to compute a  
stripe and a number P of corrupt blocks, the cost of reading a  
stripe is approximately X^P.  More corrupt blocks would radically  
slow down the system.  With raidz2, the maximum number of corrupt  
blocks would be two, putting a cap on how costly the read can be.


Computing the additional parity of triple-parity RAID-Z is slightly  
more expensive, but not much -- it's just bitwise operations.  
Recovering from a read failure is identical (and performs identically)  
to raidz1 or raidz2 until you actually have sustained three failures.  
In that case, performance is slower as more computation is involved --  
but aren't you just happy to get your data back?

If there is silent data corruption, then and only then can you encounter  
the O(n^3) algorithm that you alluded to, but only as a last resort. If we  
don't know which drives failed, we try to reconstruct your data by assuming  
that one drive, then two drives, then three drives are returning bad data.  
For raidz1, this was a linear operation; for raidz2, quadratic; now raidz3  
is N-cubed. There's really no way around it. Fortunately, with proper  
scrubbing, encountering data corruption in one stripe on three different  
drives is highly unlikely.
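
A sketch of that last-resort search (hypothetical helpers, not the actual
reconstruction code):

  #include <stdbool.h>

  /* Assumed provided elsewhere: rebuild the named columns (-1 = unused)
   * from parity, and verify the result against the block checksum. */
  extern void *try_reconstruct(int c1, int c2, int c3);
  extern bool checksum_ok(const void *data);

  static bool
  combinatorial_rescue(int ncols)
  {
          for (int a = 0; a < ncols; a++)                 /* linear: raidz1 */
                  if (checksum_ok(try_reconstruct(a, -1, -1)))
                          return (true);
          for (int a = 0; a < ncols; a++)                 /* quadratic: raidz2 */
                  for (int b = a + 1; b < ncols; b++)
                          if (checksum_ok(try_reconstruct(a, b, -1)))
                                  return (true);
          for (int a = 0; a < ncols; a++)                 /* cubic: raidz3 */
                  for (int b = a + 1; b < ncols; b++)
                          for (int c = b + 1; c < ncols; c++)
                                  if (checksum_ok(try_reconstruct(a, b, c)))
                                          return (true);
          return (false);
  }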

Adam

--
Adam Leventhal, Fishworks  http://blogs.sun.com/ahl



Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-20 Thread Thomas
which gap?


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-20 Thread Scott Meilicke
 which gap?
 
 'RAID-Z should mind the gap on writes' ?
 
 Message was edited by: thometal

I believe this is in reference to the raid 5 write hole, described here:
http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_performance

RAIDZ should avoid this via its copy-on-write model:
http://en.wikipedia.org/wiki/Zfs#Copy-on-write_transactional_model

So I'm not sure what the 'RAID-Z should mind the gap on writes' comment is 
getting at either. 

Clarification?

-Scott


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-20 Thread Thomas
http://mail.opensolaris.org/pipermail/onnv-notify/2009-July/009872.html

Second, the bug: it's the same link as in the first post.


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-20 Thread chris
 That would be nice. Before developers worry about such exotic 
 features, I would rather that they attend to the gross performance 
 issues so that zfs performs at least as well as Windows NTFS or Linux 
 XFS in all common cases.

To each their own. 
A FS that calculates and writes parity onto disks will have difficulties being 
as fast as a FS that just dumps data. 
A FS that verifies read data parity will have difficulties being as fast as a 
FS that just returns whatever it reads. 
I cannot see how that could happen. That's no reason not to aim for a low 
overhead, but one has to make choices here. Mine is data safety and ease of 
use, so I'd love the elastic zpool idea. Of course, others will have 
different needs. Enterprises will not care about ease so much, as they have 
dedicated professionals to pamper their arrays. They can also address speed 
issues with more spindles. ZFS+RAIDZ provides data integrity no RAID level can 
match, thanks to its checksumming. That's worth a speed sacrifice in my book.
Anything I missed?


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-20 Thread Bob Friesenhahn

On Mon, 20 Jul 2009, chris wrote:


That would be nice. Before developers worry about such exotic
features, I would rather that they attend to the gross performance
issues so that zfs performs at least as well as Windows NTFS or Linux
XFS in all common cases.


To each their own.


I was referring to gripes about performance in another discussion 
thread, not to RAID-Z3.  I don't think that adding another 
parity disk will make much difference to performance.  Adding another 
parity disk has a similar performance impact to making the stripe one 
disk wider.


MTTDL analysis shows that given normal environmental conditions, the 
MTTDL of RAID-Z2 is already much longer than the life of the computer 
or the attendant human.  Of course sometimes one encounters unusual 
conditions where additional redundancy is desired.


I do think that it is worthwhile to be able to add another parity disk 
to an existing raidz vdev but I don't know how much work that entails.


ZFS development seems to be overwhelmed with marketing-driven 
requirements lately, and it is time to get back to brass tacks and make 
sure that the parts already developed are truly enterprise-grade.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,  http://www.GraphicsMagick.org/


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-19 Thread Bob Friesenhahn

On Sat, 18 Jul 2009, Martin wrote:

In fact, get rid of mirroring, because it clearly is a variant of 
raidz with two devices.  Want three way mirroring?  Call that raidz2


I don't see much similarity between mirroring and raidz other than 
that they both support redundancy.


Let's not stop there, though.  Once we have any number of parity 
devices, why can't I add a parity device to an array?  That should 
be simple enough with a scrub to set the parity.  In fact, what is 
to stop me from removing a parity device?  Once again, I think the 
code would make this rather easy.


A RAID system with distributed parity (like raidz) does not have a 
parity device.  Instead, all disks are treated as equal.  Without 
distributed parity you have a bottleneck and it becomes difficult to 
scale the array to different stripe sizes.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,  http://www.GraphicsMagick.org/


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-19 Thread Martin
 I don't see much similarity between mirroring and raidz other than
 that they both support redundancy.

A single parity device against a single data device is, in essence, mirroring.  
For all intents and purposes, RAID and mirroring with this configuration are 
one and the same.

 A RAID system with distributed parity (like raidz) does not have a
 parity device. Instead, all disks are treated as equal. Without
 distributed parity you have a bottleneck and it becomes difficult to
 scale the array to different stripe sizes.

Agreed.  Distributed parity is the way to go.  Nonetheless, if I have an array 
with a single parity, then I still have one device dedicated to parity, even if 
the actual device which holds the parity information will vary from stripe to 
stripe.

The point simply was that it might be straightforward to add a device and 
convert a raidz array into a raidz2 array, which effectively would be adding a 
parity device.  An extension of that is to convert a raidz2 array back into a 
raidz array and increase its size without adding a device.


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-19 Thread Bob Friesenhahn

On Sun, 19 Jul 2009, Martin wrote:


I don't see much similarity between mirroring and raidz other than
that they both support redundancy.


A single parity device against a single data device is, in essence, 
mirroring.  For all intents and purposes, raid and mirroring with 
this configuration are one and the same.


Try creating a raidz pool with two drives (or files), pull one of the 
drives, and see what happens.  Then try the same with mirroring.  Do 
they behave the same?  I expect not ...


I am not sure if raidz even allows you to create a pool with just two 
drives.


The point simply was that it might be straightforward to add a 
device and convert a raidz array into a raidz2 array, which 
effectively would be adding a parity device.  An extension of that 
is to convert a raidz2 array back into a raidz array and increase 
its size without adding a device.


That would be nice.  Before developers worry about such exotic 
features, I would rather that they attend to the gross performance 
issues so that zfs performs at least as well as Windows NTFS or Linux 
XFS in all common cases.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,  http://www.GraphicsMagick.org/


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-19 Thread Craig Cory
In response to:
 I don't see much similarity between mirroring and raidz other than
 that they both support redundancy.

Martin wrote:
 A single parity device against a single data device is, in essence, mirroring.
  For all intents and purposes, raid and mirroring with this configuration are
 one and the same.

I would have to disagree with this. Mirrored data will have multiple copies of
the actual data. Any copy is a valid source for data access. Lose one disk and
the other is a complete original. A RAID 3/4/5/6/z/z2 configuration will
generate a mathematical value to restore a portion of the data lost from one of
the storage units in the stripe. A 2-disk raidz will have 1/2 of each disk's used
space holding primary data interlaced with the other 1/2 holding a parity
reflection of the data. Any time we access the parity representation, some
computation will be needed to render the live data. This would have to add
*some* overhead to the I/O.

Craig Cory


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-18 Thread Martin
 Don't hear about triple-parity RAID that often:

I agree completely.  In fact, I have wondered (probably in these forums) why 
we don't bite the bullet and make a generic raidzN, where N is any number >= 0.

In fact, get rid of mirroring, because it clearly is a variant of raidz with 
two devices.  Want three way mirroring?  Call that raidz2 with three devices.  
The truth is that a generic raidzN would roll up everything: striping, 
mirroring, parity raid, double parity, etc. into a single format with one 
parameter.

If memory serves, the second parity is calculated using Reed-Solomon which 
implies that any number of parity devices is possible.

Let's not stop there, though.  Once we have any number of parity devices, why 
can't I add a parity device to an array?  That should be simple enough with a 
scrub to set the parity.  In fact, what is to stop me from removing a parity 
device?  Once again, I think the code would make this rather easy.

Once we can add and remove parity devices at will, it might not be a stretch to 
convert a parity device to data and vice versa.  If you have four data drives 
and two parity drives but need more space, in a pinch just convert one parity 
drive to data and get more storage.

The flip side would work as well.  If I have six data drives and a single 
parity drive but have, over the years, replaced them all with vastly larger 
drives and have space to burn, I might want to convert a data drive to parity.  
I may sleep better at night.

If we had a generic raidzN, the ability to add/remove parity devices and the 
ability to convert a data device from/to a parity device, then what happens?  
Total freedom.  Add devices to the array, or take them away.  Choose the blend 
of performance and redundancy that meets YOUR needs, then change it later when 
the technology and business needs change, all without interruption.

Ok, back to the real world.  The one downside to triple parity is that I recall 
the code discovered the corrupt block by excluding it from the stripe, 
reconstructing the stripe and comparing that with the checksum.  In other 
words, for a given cost of X to compute a stripe and a number P of corrupt 
blocks, the cost of reading a stripe is approximately X^P.  More corrupt blocks 
would radically slow down the system.  With raidz2, the maximum number of 
corrupt blocks would be two, putting a cap on how costly the read can be.

Standard disclaimers apply: I could be wrong, I am often wrong, etc.