While reading about ZFS on-disk formats, I wondered once again
why it is not possible to create a snapshot of existing data,
not at the current TXG but at some older point in time.

From what I gathered, the definition of a snapshot requires
a cut-off TXG number and the existence of some blocks in
this dataset with smaller-or-equal TXG numbers. It seems
like just a coincidence that the current TXG is used and
older TXGs aren't.
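
To make the premise above concrete, here is a toy sketch in
Python. The Block type and blocks_in_snapshot() are made-up
names for illustration, not ZFS structures, and the model
assumes that no block born before the cut-off has been freed
or overwritten since then, which a real pool would have to
guarantee somehow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    offset: int     # hypothetical logical offset within the dataset
    birth_txg: int  # TXG in which this block was written


def blocks_in_snapshot(blocks, cutoff_txg):
    """Return the blocks a snapshot with this cut-off TXG would reference."""
    return [b for b in blocks if b.birth_txg <= cutoff_txg]


blocks = [Block(0, 100), Block(1, 150), Block(2, 220)]

# Snapshot "now" (cut-off is the current TXG) versus an older cut-off.
current = blocks_in_snapshot(blocks, cutoff_txg=220)
older = blocks_in_snapshot(blocks, cutoff_txg=150)
print(len(current), len(older))
```

In this model the only difference between the two snapshots
is the number passed as cutoff_txg.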

Is it deemed inconvenient, impractical or useless, did nobody
think of it, or are there some fundamental or technological
drawbacks to the idea?

Note: this idea is related to my proposal in the October thread
"[zfs-discuss] (Incremental) ZFS SEND at sub-snapshot level"
and could aid "restartable zfs-send" by creating smaller
snapshots for incremental sending of existing large datasets.

Today I had a new twist on the idea, though: as I wrote
in other posts, my raidz2 did not protect some of my data.
One of the damaged files belongs to a stack of snapshots
that are continually replicated from another box, and the
inconsistent on-disk block is referenced in an old snapshot
(almost at the root of the stack). Resending and re-receiving
the whole stack of snapshots is possible, but inconvenient
and slow. Rsyncing just the difference (good data in place
of the byte range that returns I/O errors) to repair the
file would forfeit further incremental snapshot syncs.

So I thought: it would be nice if it were possible (perhaps
not now, but in the future as an RFE) to resend and replace
just that snapshot in the middle or even at the root of the
stack. Perhaps even better, with zdb or some other tools I
might determine which blocks have rotted and which TXG they
belonged to, and I'd "fence" that TXG on the source and
destination systems with the proposed "injected snapshots".
Older and newer snapshots around this TXG range would
provide incremental changes to data, as they normally do,
and I'd only quickly replace a small intermediate snapshot.
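
The "fencing" step could be sketched roughly like this in
Python. fence_txgs() is a hypothetical helper, the TXG numbers
are invented, and the whole thing assumes the not-yet-existing
ability to place a snapshot at an arbitrary cut-off TXG. Given
the birth TXGs of the damaged blocks, it returns pairs of
cut-offs (lower, upper) such that an increment from lower to
upper would cover only TXGs lower+1 through upper.

```python
def fence_txgs(bad_txgs):
    """Group damaged-block birth TXGs into (lower, upper) snapshot cut-offs."""
    fences = []
    for txg in sorted(set(bad_txgs)):
        if fences and fences[-1][1] == txg - 1:
            # Consecutive TXG: widen the previous fence instead of adding one.
            fences[-1] = (fences[-1][0], txg)
        else:
            # New fence: one cut-off just before the TXG, one at the TXG.
            fences.append((txg - 1, txg))
    return fences


# Invented birth TXGs of damaged blocks, e.g. as gathered with zdb.
fences = fence_txgs([1503, 1502, 9120])
for lower, upper in fences:
    print(f"inject snapshots at TXG {lower} and TXG {upper}")
```

Each pair marks the small increment that would need to be
resent, while the snapshots on either side stay untouched.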

All this needs is a couple of not-yet-existing features...

PS: I think that this idea might even have some "business
case" foundation for active-passive clusters with zfs send
updating a passive cluster node. Whenever a scrub on one of
the systems finds an unrecoverable block in older data,
the node might request "just it" from the other head.
Likewise for backups to removable media, etc.
If we already have ZFS-based storage similar to an
out-of-sync mirror, why not use the available knowledge
of known-good blocks to repair detected (small) errors
in large volumes of the "same" data?
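
As a rough illustration of the "request just it" exchange,
here is a toy model in Python. Both nodes are plain dicts of
block id to bytes, and repair_from_peer() and the SHA-256
checksum are stand-ins chosen for the sketch. None of this
corresponds to an existing ZFS interface.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def repair_from_peer(local, peer, expected):
    """Replace local blocks that fail their checksum with verified peer copies.

    Returns the ids of the blocks that were repaired.
    """
    repaired = []
    for block_id, want in expected.items():
        if checksum(local.get(block_id, b"")) == want:
            continue  # local copy is intact
        candidate = peer.get(block_id)
        # Accept the peer's copy only if it matches the known-good checksum.
        if candidate is not None and checksum(candidate) == want:
            local[block_id] = candidate
            repaired.append(block_id)
    return repaired


good = {0: b"alpha", 1: b"beta", 2: b"gamma"}
expected = {k: checksum(v) for k, v in good.items()}

local = dict(good)
local[1] = b"bXta"  # simulated bit rot on the local node
peer = dict(good)   # the other head still holds a good copy

repaired = repair_from_peer(local, peer, expected)
print(repaired)
```

Only the damaged block crosses the wire; everything that
still verifies locally is left alone.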

What do you think?..
//Jim Klimov

zfs-discuss mailing list
