Excellent post, many thanks for the links, especially about RAIDZ and the 
MTTDL metric problem. 

Now back to the question about RAIDZ and/or copies=X:
Both protect against data corruption on disk. RAIDZ does it with parity
information at the whole-disk level; copies=X does it via additional internal
copies of the file data.
If the goal is to protect against whole-disk failures in a multi-disk
setting, I would assume RAIDZ is the more natural choice.
It doesn't discriminate between files, though: everything gets stored
with additional parity information.
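To make the parity idea concrete, here is a toy Python sketch of single (XOR) parity, the principle behind RAIDZ1. It is only an illustration under simplifying assumptions: real RAIDZ uses variable-width stripes and block checksums, and RAIDZ2 adds a second parity that is not a plain XOR. The block contents are made up.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (single-parity computation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe on 4 disks: three data blocks plus one parity block.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing the disk holding data[1] and rebuild it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'EFGH'
```

The same XOR rebuilds any single missing block in the stripe, parity included; what a single parity cannot do is recover two missing blocks from the same stripe, which is what RAIDZ2's second parity is for.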

The interesting bit I read in the 'ZFS, Copies, and Data Protection'
article is that, unlike traditional RAID, RAIDZ does not seem to improve
performance through striping, at least not in terms of disk operation rate.
Rather, it 'just' improves fault tolerance. The article does not say,
though, whether each disk operation actually moves more data in the RAIDZ
case: one logical RAIDZ I/O still touches N-1 disk blocks, so I would
assume that data throughput per file access still increases with the
number of disks N (for N>2).
Is my reasoning here correct?
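A back-of-the-envelope model of the reasoning above, as a Python sketch. It assumes, as a simplification, that every logical block in a RAIDZ vdev is spread over all N-p data disks (p = number of parity disks), so each logical I/O occupies every disk: random IOPS stay at roughly one disk's worth, while sequential bandwidth scales with the data disks. The per-disk figures are hypothetical, and real results depend on record size, caching and workload.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    iops: float      # random operations per second (hypothetical figure)
    mb_per_s: float  # sequential throughput in MB/s (hypothetical figure)


def raidz_estimate(disk: Disk, n_disks: int, parity: int) -> tuple[float, float]:
    """Rough (IOPS, MB/s) estimate for one RAIDZ vdev.

    Each logical I/O is assumed to involve every disk, so random IOPS stay
    at about one disk's worth while streaming bandwidth scales with the
    number of data disks.
    """
    data_disks = n_disks - parity
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return disk.iops, disk.mb_per_s * data_disks


def stripe_estimate(disk: Disk, n_disks: int) -> tuple[float, float]:
    """Plain striped pool: independent blocks can be served by different disks."""
    return disk.iops * n_disks, disk.mb_per_s * n_disks


hdd = Disk(iops=100, mb_per_s=120)  # hypothetical 7200 rpm drive
for n in (3, 4):
    print(n, "disks: RAIDZ1", raidz_estimate(hdd, n, parity=1),
          "stripe", stripe_estimate(hdd, n))
```

Under this model the answer to the question is "yes, for large sequential reads and writes", while small random I/O on a single RAIDZ vdev stays close to one disk's operation rate.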

The 'copies=X' property of ZFS file systems seems to target setups with
just a single disk, say a ZFS-formatted partition on a laptop drive, where
RAID is not an option. That is fine and IMHO makes a lot of sense. But I
do not see the point in combining the two, as the parity information of
RAIDZ would already protect against data corruption and even the loss of a
disk.
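For the single-disk case, the protection from copies=X comes from combining redundant copies with checksums. The Python sketch below is a toy model of that idea, not ZFS code: a block is stored N times together with a checksum kept separately (in ZFS the checksum lives in the parent block pointer), a read returns the first copy that verifies, and bad copies are rewritten from the good one. SHA-256 is used only for convenience, and the class name `DittoBlock` is hypothetical.

```python
import hashlib


class DittoBlock:
    """Toy model of one block stored with copies=N plus a separate checksum."""

    def __init__(self, payload: bytes, copies: int = 2):
        self.checksum = hashlib.sha256(payload).digest()
        self.copies = [bytearray(payload) for _ in range(copies)]

    def _is_valid(self, copy: bytearray) -> bool:
        return hashlib.sha256(copy).digest() == self.checksum

    def read(self) -> bytes:
        for copy in self.copies:
            if self._is_valid(copy):
                good = bytes(copy)
                # Self-heal: rewrite every copy that fails verification.
                for i, other in enumerate(self.copies):
                    if not self._is_valid(other):
                        self.copies[i] = bytearray(good)
                return good
        raise OSError("all copies failed checksum verification")


block = DittoBlock(b"important data", copies=2)
block.copies[0][0] ^= 0xFF  # simulate bit rot in the first copy
print(block.read())          # b'important data', served from the second copy
print(block.copies[0] == block.copies[1])  # True after self-healing
```

Note that the model only covers damaged copies of one block on a working device; it says nothing about what happens when the whole device is gone.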

One interesting question is whether copies=X (X>=2) alone could achieve
the same as RAIDZ on a purely striped pool.
I read in Oracle's ZFS documentation that copies=X tries to store the
copies on different disks, but wouldn't a striped pool use all disks anyway?
Or am I wrongly mixing my understanding of traditional RAID0 setups with
the mechanics of ZFS?

Some background on why I am asking all this:
I am toying with the idea of formatting a 4-disk 'JBOD' enclosure with ZFS
in a RAIDZ or even RAIDZ2 configuration to protect against disk failures.
As I understand it, this should also protect against corruption of single
files and the ominous 'bit rot', especially with RAIDZ2.
I would lose one or two disks' worth of capacity from the start, which I
am fine with.
Still, if a different approach could win back some space, I would welcome
that too, as the enclosure has only 4 bays.
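To compare the space trade-offs for a 4-bay enclosure, here is a Python sketch of idealised usable capacity. It ignores metadata, RAIDZ padding and other allocation overhead (so real figures come out lower), treats copies=X as applied to all data, and uses a hypothetical 4 TB drive size. It compares capacity only and says nothing about which layouts survive a whole-disk failure.

```python
from fractions import Fraction


def usable_fraction(layout: str, n_disks: int, copies: int = 1) -> Fraction:
    """Idealised share of raw capacity available for user data."""
    if layout == "stripe":
        base = Fraction(1)
    elif layout == "mirror":  # pairs of 2-way mirrors
        base = Fraction(1, 2)
    elif layout in ("raidz1", "raidz2"):
        parity = int(layout[-1])
        base = Fraction(n_disks - parity, n_disks)
    else:
        raise ValueError(f"unknown layout: {layout}")
    # copies=X stores every data block X times on top of the vdev redundancy.
    return base / copies


DISK_TB = 4  # hypothetical drive size
N_DISKS = 4
for layout, copies in [("raidz1", 1), ("raidz2", 1), ("mirror", 1), ("stripe", 2)]:
    frac = usable_fraction(layout, N_DISKS, copies)
    usable_tb = float(frac * N_DISKS * DISK_TB)
    print(f"{layout:<6} copies={copies}: {usable_tb:4.1f} TB usable ({frac} of raw)")
```

Under these assumptions a striped pool with copies=2 ends up with the same capacity as RAIDZ2 or a pair of mirrors, and less than RAIDZ1, so on space alone it would not win anything back.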
I am also no longer sure whether performance still benefits from parallel
I/O (see my comment above about the disk operation rate).
At the very least, it should be high enough to saturate a gigabit Ethernet
link (i.e. 100-110 MB/s).
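A quick Python check of the gigabit target, using the same simplification as the reasoning about N-1 disk blocks above (only the data disks contribute sequential bandwidth). 1 Gbit/s is 125 MB/s of raw line rate before protocol overhead; the 110 MB/s target comes from the text, while the 100 MB/s per-disk figure is a hypothetical assumption.

```python
GIGABIT_MB_PER_S = 1000 / 8  # 125 MB/s raw line rate, before protocol overhead
TARGET_MB_PER_S = 110        # upper end of the 100-110 MB/s target


def raidz_streaming_mb_s(per_disk_mb_s: float, n_disks: int, parity: int) -> float:
    """Idealised sequential throughput: only the data disks contribute."""
    return per_disk_mb_s * (n_disks - parity)


def saturates_gigabit(per_disk_mb_s: float, n_disks: int, parity: int) -> bool:
    return raidz_streaming_mb_s(per_disk_mb_s, n_disks, parity) >= TARGET_MB_PER_S


PER_DISK_MB_S = 100  # hypothetical sequential rate of one drive
print(f"gigabit line rate: {GIGABIT_MB_PER_S:.0f} MB/s raw")
for parity in (1, 2):
    mb_s = raidz_streaming_mb_s(PER_DISK_MB_S, 4, parity)
    print(f"RAIDZ{parity} on 4 disks: ~{mb_s:.0f} MB/s, "
          f"saturates gigabit: {saturates_gigabit(PER_DISK_MB_S, 4, parity)}")
```

Under this model, large sequential transfers would be limited by the network rather than the disks even with RAIDZ2; small random I/O is a different matter, since it is bounded by the disk operation rate.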

On Monday, 17 March 2014 22:23:18 UTC+11, Philip Robar wrote:

> On Mon, Mar 17, 2014 at 3:35 AM, Dave Cottlehuber <d...@jsonified.com> wrote:
>> On 17 March 2014 at 05:00:25, roemer (uwe....@gmail.com) wrote:
>> > > How does a 'copies=2' filesystem play together with a 'RAIDZ1' (or even
>> > > RAIDZ2) pool? RAIDZ would have all data stored redundantly already, so
>> > > would 'copies=2' not end up quadrupling the storage requirement if used
>> > > on a RAIDZ pool?
>> Yes
> So the amount of space lost to parity is a constant: disk size x RAID
> level (the number of parity disks). Thus, if you're using copies, the
> extra space lost depends only on the datasets that have it set: a dataset
> with copies=N takes up N times its logical size. One of the nice things
> about using copies as opposed to mirroring is that you can set it on a
> per-file-system (i.e. per-dataset) basis, whereas mirroring affects the
> entire vdev.

> On the other hand, if you're using mirroring, then yes, turning on
> copies=2 does cut your usable storage to a quarter of the raw disk
> capacity (assuming all datasets in the pool have it set).
> RAIDZ vs mirroring vs copies all comes down to trading off performance vs
> Reliability, Availability and Serviceability vs space. There are formulas
> for figuring all of this out. Start with Serve the Home's RAID Reliability
> calculator<http://www.servethehome.com/raid-calculator/raid-reliability-calculator-simple-mttdl-model/>*,
> which takes everything into account except increased file redundancy. For
> that there's this article: ZFS, Copies, and Data
> Protection<https://blogs.oracle.com/relling/entry/zfs_copies_and_data_protection>.
> And for RAIDZ vs mirroring performance, see When To (And Not To) Use
> RAID-Z<https://blogs.oracle.com/roch/entry/when_to_and_not_to>.
> Phil
> * Note that the Mean Time to Data Loss calculated at this site, while
> being an industry standard, is essentially useless other than for getting a
> relative comparison of different configurations. For details, see: Mean
> time to meaningless: MTTDL, Markov models, and storage system
> reliability<https://www.usenix.org/legacy/event/hotstorage10/tech/full_papers/Greenan.pdf>.
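Phil's space arithmetic, and the quadrupling question it answers, can be checked with a small Python sketch of raw space consumed per unit of logical data. It is idealised (no metadata, RAIDZ padding or compression), assumes 2-way mirrors and a 4-disk RAIDZ1 as defaults, and the function name is hypothetical.

```python
from fractions import Fraction


def raw_per_logical(layout: str, copies: int = 1, n_disks: int = 4,
                    parity: int = 1) -> Fraction:
    """Raw disk space used per unit of logical data (idealised)."""
    if layout == "mirror":  # 2-way mirror: every block written twice
        redundancy = Fraction(2)
    elif layout == "raidz":  # parity overhead spread over the whole vdev
        redundancy = Fraction(n_disks, n_disks - parity)
    else:
        raise ValueError(f"unknown layout: {layout}")
    # copies=N multiplies that dataset's data on top of the vdev redundancy.
    return redundancy * copies


print(raw_per_logical("raidz", copies=2))   # 8/3, i.e. about 2.7x rather than 4x
print(raw_per_logical("mirror", copies=2))  # 4, i.e. a quarter of raw capacity usable
```

Because copies is a per-dataset property, these multipliers only apply to the data in datasets that have it set.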


You received this message because you are subscribed to the Google Groups 
"zfs-macos" group.