On Thu, Jan 12, 2012 at 05:01:48PM -0800, Richard Elling wrote:
> > This thread is about checksums - namely, now, what are
> > our options when they mismatch the data? As has been
> > reported by many blog-posts researching ZDB, there do
> > happen cases when checksums are broken (i.e. bitrot in
> > block pointers, or rather in RAM while the checksum was
> > calculated - so each ditto copy of BP has the error),
> > but the file data is in fact intact (extracted from
> > disk with ZDB or DD, and compared to other copies).
>
> Metadata is at least doubly redundant and checksummed.
The implication is that the checksum was originally calculated wrongly in RAM (undetected due to lack of ECC), then written out redundantly and fed as bad input to the rest of the Merkle construct. The data blocks on disk are correct, but they fail to verify against the bad metadata.

The complaint appears to be that ZFS makes this 'worse' because the (independently verified) valid data blocks are inaccessible. Worse than what? Corrupted file data that is then accurately checksummed and readable as valid? Accurate data that is read without any assertion of validity, in a traditional filesystem?

There's an inherent value judgement here that will vary by judge, but in each case it's as much a judgement on the value of ECC and reliable hardware, of your data, and of the time you spend on various kinds of recovery, as it is on the value of ZFS. The same circumstance could, in principle, happen due to a faulty CPU even with ECC. In either case, the value of ZFS includes detecting an error you would otherwise have been unaware of, and giving you a clue that you need to fix hardware and spend time on recovery.

--
Dan.
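The three outcomes weighed above (checksum corrupted in RAM, data corrupted in RAM before checksumming, and reading with no checksum at all) can be compared in a small simulation. This is a minimal sketch, not ZFS code: SHA-256 stands in for the pool's checksum algorithm, a single flipped bit stands in for a RAM error, and `write_block`, `verified_read` and `unverified_read` are hypothetical helpers invented for illustration.

```python
import hashlib

def checksum(data: bytes) -> bytes:
    # Assumption: SHA-256 stands in for whatever checksum the pool uses.
    return hashlib.sha256(data).digest()

def flip_bit(buf: bytes, bit: int) -> bytes:
    # Simulate a single-bit memory error in a buffer.
    b = bytearray(buf)
    b[bit // 8] ^= 1 << (bit % 8)
    return bytes(b)

def write_block(data: bytes, copies: int = 2, bad_ram_in: str = "none"):
    """Hypothetical write path returning (bytes on disk, block pointer checksums).

    bad_ram_in: "none", "checksum" (bit flip in the checksum before it is
    replicated) or "data" (bit flip in the data before it is checksummed).
    """
    if bad_ram_in == "data":
        data = flip_bit(data, 5)
    cksum = checksum(data)
    if bad_ram_in == "checksum":
        cksum = flip_bit(cksum, 3)
    # Every ditto copy is made from the same in-memory value, so all carry the same error.
    return data, [cksum] * copies

def verified_read(data: bytes, bp_copies: list) -> bytes:
    # Checksumming read: return data only if it matches some block pointer copy.
    if checksum(data) in bp_copies:
        return data
    raise IOError("checksum mismatch")

def unverified_read(data: bytes, bp_copies: list) -> bytes:
    # Traditional-filesystem-style read: no assertion of validity at all.
    return data

def outcome(read, data, bps, original) -> str:
    try:
        return "correct" if read(data, bps) == original else "silently wrong"
    except IOError:
        return "error"

original = b"file contents " * 64
results = {}
for fault in ("none", "checksum", "data"):
    disk, bps = write_block(original, bad_ram_in=fault)
    results[fault] = (outcome(verified_read, disk, bps, original),
                      outcome(unverified_read, disk, bps, original))

for fault, (verified, plain) in results.items():
    print(f"{fault:9} verified={verified:15} unverified={plain}")
```

In the "checksum" row the bytes on disk are intact, yet the verified read reports an error because both ditto copies agree on the wrong value; the unverified read returns good data but could not have told you so. In the "data" row both reads hand back corrupted data without complaint, since the checksum faithfully describes the already-damaged buffer.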
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss