Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

Richard Elling Thu, 12 Jan 2012 17:03:19 -0800

On Jan 12, 2012, at 2:34 PM, Jim Klimov wrote:

> I guess I have another practical rationale for a second
> checksum, be it ECC or not: my scrubbing pool found some
> "unrecoverable errors". Luckily, for those files I still
> have external originals, so I rsynced them over. Still,
> there is one file whose broken prehistory is referenced
> in snapshots, and properly fixing that would probably
> require me to resend the whole stack of snapshots.
> That's uncool, but a subject for another thread.
> 
> This thread is about checksums - namely, what are
> our options when they mismatch the data? As reported
> in many blog posts researching ZDB, there are cases
> where the checksums are broken (i.e. bitrot in
> block pointers, or rather in RAM while the checksum was
> calculated - so each ditto copy of the BP has the error),
> but the file data is in fact intact (extracted from
> disk with ZDB or DD, and compared to other copies).


Metadata is at least doubly redundant and checksummed.
Can you provide links to posts that describe this failure mode?

> For these cases bloggers asked (in vain) - why is it
> not allowed for an admin to confirm validity of end-user
> data and have the system reconstruct (re-checksum) the
> metadata for it?.. IMHO, that's a valid RFE.

Metadata is COW, too. Rewriting the data also rewrites the metadata.
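
A toy sketch of that point (not ZFS code; ToyPool, BlockPointer and
write_file are hypothetical names, and SHA-256 stands in for whatever
checksum the pool uses): the metadata block holds the pointer and
checksum of the data block, so writing new data necessarily writes new
metadata, while the old blocks stay untouched on disk.

```python
import hashlib
from dataclasses import dataclass


def digest(payload: bytes) -> bytes:
    """Checksum used by the toy pool (SHA-256 is an assumption)."""
    return hashlib.sha256(payload).digest()


@dataclass(frozen=True)
class BlockPointer:
    address: int     # location of the child block on the toy "disk"
    checksum: bytes  # digest of the child block's contents


class ToyPool:
    """Copy-on-write store: blocks are only ever allocated, never overwritten."""

    def __init__(self) -> None:
        self.disk = {}          # address -> block contents
        self.next_address = 0

    def allocate(self, payload: bytes) -> BlockPointer:
        address = self.next_address
        self.next_address += 1
        self.disk[address] = payload
        return BlockPointer(address, digest(payload))

    def write_file(self, data: bytes) -> BlockPointer:
        # Write the data block first, then a metadata block that records
        # the data block's address and checksum. The returned pointer
        # plays the role of the parent of that metadata block.
        data_bp = self.allocate(data)
        metadata = data_bp.address.to_bytes(8, "big") + data_bp.checksum
        return self.allocate(metadata)


pool = ToyPool()
old_root = pool.write_file(b"original contents")
new_root = pool.write_file(b"repaired contents")
```

After the second write, old_root and new_root point at different
metadata blocks, each carrying the checksum of its own data block. The
old pair is still on the toy disk, which is roughly what a snapshot
keeps referencing.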

> While the system is scrubbing, I was reading up on theory.
> Found a nice text "Keeping Bits Safe: How Hard Can It Be?"
> by David Rosenthal [1], where I stumbled upon an interesting
> thought:
>  The bits forming the digest are no different from the
>  bits forming the data; neither is magically incorruptible.
>  ...Applications need to know whether the digest has
>  been changed.

Hence for ZFS, the checksum (digest) is kept in the parent metadata.

The condition described above can affect T10 DIF-style checksums, but not ZFS.
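
To illustrate, here is a minimal two-level sketch (hypothetical helper
names, SHA-256 assumed, not actual ZFS structures). Because the
checksum of the data lives in the parent block, and the parent block is
itself covered by a checksum held one level further up, a damaged
digest shows up as a damaged parent block rather than being mistaken
for damaged data.

```python
import hashlib


def digest(payload: bytes) -> bytes:
    return hashlib.sha256(payload).digest()


def build(data: bytes):
    """Return (root_checksum, parent_block) for a two-level toy tree."""
    parent_block = digest(data)            # parent holds the data's checksum
    root_checksum = digest(parent_block)   # kept one level further up
    return root_checksum, parent_block


def diagnose(root_checksum: bytes, parent_block: bytes, data: bytes) -> str:
    # Verify top-down: a block is trusted only after it matches the
    # checksum stored in its already-verified parent.
    if digest(parent_block) != root_checksum:
        return "parent block (stored checksum) is damaged"
    if digest(data) != parent_block:
        return "data block is damaged"
    return "ok"


def flip_bit(payload: bytes, index: int = 0) -> bytes:
    """Return a copy of payload with one bit flipped, to simulate bitrot."""
    damaged = bytearray(payload)
    damaged[index] ^= 0x01
    return bytes(damaged)


data = b"user data"
root_checksum, parent_block = build(data)

healthy = diagnose(root_checksum, parent_block, data)
bad_digest = diagnose(root_checksum, flip_bit(parent_block), data)
bad_data = diagnose(root_checksum, parent_block, flip_bit(data))
```

Here bad_digest reports the parent block and bad_data reports the data
block, so the two cases are distinguishable. A checksum stored next to
the data it covers, with nothing above it, has no such second opinion.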

> In our case, where original checksum in the blockpointer
> could be corrupted in (non-ECC) RAM of my home-NAS just
> before it was dittoed to disk, another checksum - copy
> of this same one, or a differently calculated one, could
> provide ZFS with the means to determine whether the data
> or one of the checksums got corrupted (or all of them).
> Of course, this is not an absolute protection method,
> but it can reduce the cases where pools have to be
> "destroyed, recreated and recovered from tape".

Nope.

> It is my belief that using dedup contributed to my issue -
> there's a lot more updating of the block pointers and their
> checksums, so it gradually becomes more likely that the
> metadata (checksum) blocks get broken (i.e. in non-ECC
> RAM), while the written-once userdata remains intact...
> 
> --
> [1] http://queue.acm.org/detail.cfm?id=1866298
> While the text discusses what most ZFSers already
> know - about bit-rot, MTTDL and such, it does so in
> great detail and with many examples, and gave me a better
> understanding of it all even though I have dealt with this
> for several years now. A good read, I suggest it to others ;)
> 
> //Jim Klimov

 -- richard

-- 

ZFS and performance consulting
http://www.RichardElling.com
SCALE 10x, Los Angeles, Jan 20-22, 2012

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss