Actually, thinking about this some more, the real reason that this hypothetical horror scenario cannot actually happen in real life is that the checksum would never get recomputed from the improperly “corrected” data to begin with: The checksum for a given block is stored in its *parent* block (which itself has a checksum that is stored in its parent, and so on and so forth, all the way up to the uberblock), not in the block itself. Therefore, if a checksum failure is detected for a block, only the block itself will be corrected (and possibly corrupted as a result of a memory error), not its checksum (which is protected by the parent block’s checksum).
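To make the argument above concrete, here is a minimal toy sketch in Python. It is not ZFS code: `BlockPointer`, `ToyMirror`, and `flip_first_bit` are hypothetical names made up for illustration, SHA-256 stands in for whatever checksum the pool actually uses, and the "pool" is just a two-way mirror held in dictionaries. The only point it illustrates is that a repair rewrites block contents, while the expected checksum stays in the parent-side pointer and is never recomputed from the repaired data.

```python
import hashlib
from dataclasses import dataclass


def checksum(data: bytes) -> str:
    # Assumption: SHA-256 stands in for the pool's real checksum algorithm.
    return hashlib.sha256(data).hexdigest()


def flip_first_bit(data: bytes) -> bytes:
    # Hypothetical model of a faulty memory cell corrupting a buffer.
    return bytes([data[0] ^ 0x01]) + data[1:]


@dataclass(frozen=True)
class BlockPointer:
    # Hypothetical parent-side pointer: where the child lives and what
    # its checksum must be. Frozen, because a repair never changes it.
    address: int
    expected: str


class ToyMirror:
    """Two mirrored copies of every block, each a dict of address -> bytes."""

    def __init__(self):
        self.copies = [{}, {}]

    def write(self, address: int, data: bytes) -> BlockPointer:
        for copy in self.copies:
            copy[address] = data
        # The checksum is computed once, at write time, and handed to the parent.
        return BlockPointer(address, checksum(data))

    def verify(self, ptr: BlockPointer, index: int) -> bool:
        return checksum(self.copies[index][ptr.address]) == ptr.expected

    def repair(self, ptr: BlockPointer, bad: int, good: int, ram=lambda b: b):
        # Only the block contents are rewritten; ptr.expected is untouched.
        # `ram` models the memory the data passes through on its way to disk.
        self.copies[bad][ptr.address] = ram(self.copies[good][ptr.address])


pool = ToyMirror()
ptr = pool.write(7, b"important data")

pool.copies[0][7] = b"bit-rotted data"   # one copy goes bad on disk
detected_before = not pool.verify(ptr, 0)

# The "correction" itself passes through faulty memory and is damaged.
pool.repair(ptr, bad=0, good=1, ram=flip_first_bit)

detected_after = not pool.verify(ptr, 0)  # the bad repair still fails verification
good_copy_intact = pool.verify(ptr, 1)    # the other copy was never touched
```

In this toy model the damaged repair is caught on the next read, because it is checked against the checksum held by the parent rather than against anything derived from the damaged buffer.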
See e.g. the following website for more explanation of how things are organized internally:
<http://www.nexenta.com/corp/zfs-education/207-nexentastor-zfs-copy-on-write-checksums-and-consistency>

On Feb 26, 2014, at 7:09 PM, Daniel Becker <razzf...@gmail.com> wrote:

> A few things to think about when reading that forum post:
>
> - The scenario described in that post is based on the assumption that all
>   blocks read from disk somehow get funneled through a single memory
>   location, which also happens to have a permanent fault.
> - In addition, it assumes that after a checksum failure, the corrected data
>   either gets stored in the exact same memory location again, or in another
>   memory location that also has a permanent fault.
> - It also completely ignores the fact that ZFS has an internal error
>   threshold and will automatically offline a device once the number of
>   read/checksum errors seen on it exceeds that threshold, preventing further
>   corruption. ZFS will *not* go and happily mess up your entire pool.
> - This would *not* be silent; ZFS would report a large number of checksum
>   errors on all your devices.
> - Blocks corrupted in that particular way would *not* actually spread to
>   incremental backups or via rsync, as the corrupted blocks would not be seen
>   as modified.
> - There is no indication that the reported cases of data loss that he points
>   to are actually due to the particular failure mechanism described in the
>   post; there are *lots* of other ways in which memory corruption can lead to
>   a file system becoming unmountable, checksums or not.
> - Last but not least, note that “Cyberjock” is a community moderator, not
>   somebody who’s actually in any way involved in the development of ZFS (or
>   even FreeNAS; see the preface of his FreeNAS guide for some info on his
>   background). If this were really as big of a risk as he thinks it is, you’d
>   think somebody who is actually familiar with the internals of ZFS would
>   have raised this concern before.
>
> On Feb 26, 2014, at 5:56 PM, Philip Robar <philip.ro...@gmail.com> wrote:
>
>> Please note, I'm not trolling with this message. I worked in Sun's OS/Net
>> group and am a huge fan of ZFS.
>>
>> The leading members of the FreeNAS community make it clear (with a
>> detailed explanation and links to reports of data loss) that if you use ZFS
>> without ECC RAM there is a very good chance that you will eventually
>> experience a total loss of your data without any hope of recovery.
>> (Unless you have literally thousands of dollars to spend on that recovery.
>> And even then there's no guarantee of said recovery.) The features of ZFS,
>> checksumming and scrubbing, work together to silently spread the damage done
>> by cosmic rays and/or bad memory throughout a file system, and this
>> corruption then spreads to your backups.
>>
>> Given this, aren't the various ZFS communities--particularly those that are
>> small machine oriented--other than FreeNAS (and even they don't say it
>> strongly enough in their docs), doing users a great disservice by
>> implicitly encouraging them to use ZFS w/o ECC RAM or on machines that can't
>> use ECC RAM?
>>
>> As an indication of how persuaded I've been of the need for ECC RAM, I've
>> shut down my personal server and am not going to access that data until I've
>> built a new machine with ECC RAM.
>>
>> Phil
>>
>> ECC vs non-ECC RAM and ZFS:
>> http://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/
>>
>> cyberjock: "So when you read about how using ZFS is an "all or none" I'm
>> not just making this up. I'm really serious as it really does work that way.
>> ZFS either works great or doesn't work at all. That really truthfully [is]
>> how it works."
>>
>> ZFS-macos, NAS4Free, PC-BSD, ZFS on Linux