A few things to think about when reading that forum post:

- The scenario described in that post is based on the assumption that all 
blocks read from disk somehow get funneled through a single memory location, 
which also happens to have a permanent fault.
- In addition, it assumes that after a checksum failure, the corrected data 
either gets stored in the exact same memory location again, or in another 
memory location that also has a permanent fault.
- It also completely ignores the fact that ZFS has an internal error threshold 
and will automatically offline a device once the number of read/checksum errors 
seen on it exceeds that threshold, preventing further corruption. ZFS will 
*not* go and happily mess up your entire pool.
- This would *not* be silent; ZFS would report a large number of checksum 
errors on all your devices.
- Blocks corrupted in that particular way would *not* actually spread to 
incremental backups or via rsync, as the corrupted blocks would not be seen as 
modified.
- There is no indication that the reported cases of data loss that the author points to 
are actually due to the particular failure mechanism described in the post; 
there are *lots* of other ways in which memory corruption can lead to a file 
system becoming unmountable, checksums or not.
- Last but not least, note that “Cyberjock” is a community moderator, not 
somebody who’s actually in any way involved in the development of ZFS (or even 
FreeNAS; see the preface of his FreeNAS guide for some info on his background). 
If this were really as big of a risk as he thinks it is, you’d think somebody 
who is actually familiar with the internals of ZFS would have raised this 
concern before.
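
To make the rsync point in the list above concrete, here is a rough Python sketch. It assumes rsync's default behavior of skipping any file whose size and modification time match the copy on the destination, and it models corruption that happens underneath the file level by flipping a byte and then restoring the original timestamps. The names quick_check_unchanged and demo are made up for illustration; this is a simplified model, not rsync's actual code.

```python
import os
import shutil
import tempfile


def quick_check_unchanged(src, dst):
    """Simplified model of rsync's default check: same size and same mtime."""
    s, d = os.stat(src), os.stat(dst)
    # Compare whole seconds to sidestep timestamp precision differences.
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "file.bin")
        dst = os.path.join(tmp, "backup.bin")

        with open(src, "wb") as f:
            f.write(b"\x00" * 4096)
        shutil.copy2(src, dst)  # the "backup": copies data and mtime

        original = os.stat(src)

        # Assumption: corruption below the file level changes the data
        # but leaves size and mtime alone. Modeled here by changing one
        # byte and then putting the original timestamps back.
        with open(src, "r+b") as f:
            f.seek(100)
            f.write(b"\x01")
        os.utime(src, ns=(original.st_atime_ns, original.st_mtime_ns))

        with open(src, "rb") as a, open(dst, "rb") as b:
            contents_differ = a.read() != b.read()

        return contents_differ, quick_check_unchanged(src, dst)


if __name__ == "__main__":
    differ, skipped = demo()
    print("contents differ:", differ)
    print("size/mtime check would skip the file:", skipped)
```

In this model the contents differ but the size/mtime check still treats the file as unchanged, so the damaged copy is not transferred over the good backup.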



On Feb 26, 2014, at 5:56 PM, Philip Robar <philip.ro...@gmail.com> wrote:

> Please note, I'm not trolling with this message. I worked in Sun's OS/Net 
> group and am a huge fan of ZFS.
> 
> The leading members of the FreeNAS community make it clear [1] (with a 
> detailed explanation and links to reports of data loss) that if you use ZFS 
> without ECC RAM there is a very good chance that you will eventually 
> experience a total loss of your data without any hope of recovery. [2] 
> (Unless you have literally thousands of dollars to spend on that recovery. 
> And even then there's no guarantee of said recovery.) The features of ZFS, 
> checksumming and scrubbing, work together to silently spread the damage done 
> by cosmic rays and/or bad memory throughout a file system and this corruption 
> then spreads to your backups.
> 
> Given this, aren't the various ZFS communities--particularly those that are 
> small machine oriented [3]--other than FreeNAS (and even they don't say it 
> strongly enough in their docs), doing users a great disservice by implicitly 
> encouraging them to use ZFS w/o ECC RAM or on machines that can't use ECC RAM?
> 
> As an indication of how persuaded I've been for the need of ECC RAM, I've 
> shut down my personal server and am not going to access that data until I've 
> built a new machine with ECC RAM.
> 
> Phil
> 
> [1] ECC vs non-ECC RAM and ZFS: 
> http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/
> 
> [2] cyberjock: "So when you read about how using ZFS is an "all or none" I'm 
> not just making this up. I'm really serious as it really does work that way. 
> ZFS either works great or doesn't work at all. That really truthfully [is] 
> how it works."
> 
> [3] ZFS-macos, NAS4Free, PC-BSD, ZFS on Linux
