As I recently wrote, my data pool has experienced some
"unrecoverable errors". It seems that a userdata block
of deduped data got corrupted and no longer matches the
stored checksum. For whatever reason, raidz2 did not
help recover this data, so I rsync'ed the files over
from another copy. Then things got interesting...

Bug alert: it seems the block-pointer block with that
mismatching checksum did not get invalidated, so my
attempts to rsync known-good versions of the bad files
from an external source appeared to work, but in fact
failed: subsequent reads of the files produced IO errors.

Apparently (this is my wild guess), upon writing the
blocks, ZFS calculated their checksums and found the
matching DDT entry. ZFS did not care that the entry
pointed to inconsistent data (which no longer matches
the checksum); it still increased the reference counter
in the DDT entry.
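
To make that guess concrete, here is a toy Python model
of it. This is not ZFS code and says nothing certain about
the real write path; the names ToyDedupPool, write, read
and demo are made up for this post, and SHA-256 simply
stands in for whatever checksum the dataset uses.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Stand-in for the dataset's dedup checksum (assumption: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


class ToyDedupPool:
    """Toy model of the guessed behaviour; hypothetical, not ZFS code."""

    def __init__(self):
        self.blocks = {}  # block id -> bytes actually on "disk"
        self.ddt = {}     # checksum -> {"block": id, "refs": count}

    def write(self, data: bytes) -> str:
        key = checksum(data)
        entry = self.ddt.get(key)
        if entry is not None:
            # Dedup hit: only the counter is bumped. The on-disk copy
            # is never compared with the incoming data.
            entry["refs"] += 1
            return key
        block_id = len(self.blocks)
        self.blocks[block_id] = data
        self.ddt[key] = {"block": block_id, "refs": 1}
        return key

    def read(self, key: str) -> bytes:
        data = self.blocks[self.ddt[key]["block"]]
        if checksum(data) != key:
            raise IOError("checksum mismatch")
        return data


def demo():
    pool = ToyDedupPool()
    good = b"known-good file contents"
    key = pool.write(good)
    # Simulate on-disk corruption of the stored block.
    pool.blocks[pool.ddt[key]["block"]] = b"corrupted"
    # "Restore" the file from the good copy: this is a dedup hit,
    # so nothing new reaches the disk.
    pool.write(good)
    try:
        pool.read(key)
        readable = True
    except IOError:
        readable = False
    return pool.ddt[key]["refs"], readable
```

In this model the "restore" write only raises the counter
to 2 and the following read still fails, which is the same
symptom I saw after the rsync.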

The problem was solved by disabling dedup for the dataset
involved and rsync-updating the files in-place. After the
dedup feature was disabled and new blocks were uniquely
written, everything was readable (and md5sums matched).

I can think of a couple of solutions:

If a block is detected to be corrupt (its data does not
match the checksum), the checksum value in the block
pointers and the DDT should be rewritten to an
"impossible" value, perhaps all-zeroes or such, as soon
as the error is detected.

Alternatively (opportunistically), a flag might be set
in the DDT entry requesting that a new write matching
this stored checksum should get committed to disk, thus
"repairing" all files which reference the block (or at
least stopping the IO errors).
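
The second (opportunistic) idea can be sketched in the
same toy style. Again this is only an illustration under
my own assumptions, not a proposal for actual code: the
needs_rewrite flag, the ToyRepairingPool class and the
demo_repair function are hypothetical names, and the model
ignores everything else a real pool has to do on a write.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Stand-in for the dataset's dedup checksum (assumption: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


class ToyRepairingPool:
    """Toy dedup table with a hypothetical "needs_rewrite" flag."""

    def __init__(self):
        self.blocks = {}  # block id -> bytes actually on "disk"
        self.ddt = {}     # checksum -> {"block", "refs", "needs_rewrite"}

    def _allocate(self, data: bytes) -> int:
        block_id = len(self.blocks)
        self.blocks[block_id] = data
        return block_id

    def write(self, data: bytes) -> str:
        key = checksum(data)
        entry = self.ddt.get(key)
        if entry is None:
            self.ddt[key] = {"block": self._allocate(data),
                             "refs": 1, "needs_rewrite": False}
            return key
        if entry["needs_rewrite"]:
            # The stored copy is known to be bad: commit the incoming
            # data to a fresh block and repoint the entry at it.
            entry["block"] = self._allocate(data)
            entry["needs_rewrite"] = False
        entry["refs"] += 1
        return key

    def read(self, key: str) -> bytes:
        entry = self.ddt[key]
        data = self.blocks[entry["block"]]
        if checksum(data) != key:
            # Remember the failure so that the next matching write
            # gets committed to disk instead of being deduped away.
            entry["needs_rewrite"] = True
            raise IOError("checksum mismatch")
        return data


def demo_repair():
    pool = ToyRepairingPool()
    good = b"known-good file contents"
    key = pool.write(good)
    pool.blocks[pool.ddt[key]["block"]] = b"corrupted"
    try:
        pool.read(key)  # fails and sets the flag
    except IOError:
        pass
    pool.write(good)    # flagged entry: data is really written
    return pool.ddt[key]["refs"], pool.read(key) == good
```

Here the failed read marks the entry, the next write of the
same data lands in a new block, and every reference to that
checksum becomes readable again without touching the files
one by one.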

Alas, so far there is no guarantee anyway that it was
not the checksum itself that got corrupted (short of
using ZDB to retrieve the block contents and comparing
them with a known-good copy of the data, if any), so
corruption of the checksum would also cause replacement
of "really-good-but-normally-inaccessible" data.
(Bug reported to Illumos: https://www.illumos.org/issues/1981)
zfs-discuss mailing list