On 2013-01-20 16:56, Edward Ned Harvey
(opensolarisisdeadlongliveopensolaris) wrote:
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Jim Klimov
And regarding the "considerable activity" - AFAIK there is little way
for ZFS to reliably read and test "TXGs newer than X"
My understanding is this: when you make a snapshot, you're just creating
a named reference to the then-current TXG. When you zfs send incrementally from
one snapshot to another, you're computing the delta between two TXGs that happen
to have names. So when you break a mirror and resilver, it's essentially the same
operation as an incremental zfs send: it calculates the delta from the latest
(older) TXG on the previously UNAVAIL device up to the latest TXG on the current
pool. Yes, this involves examining the meta tree structure, and yes, the system
will be busy while that takes place. But the workload is very small relative to
whatever else you're likely to do with your pool during normal operation, because
that's the nature of the meta tree structure: it is very small relative to the
rest of your data.
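The model above (snapshot = a named TXG, incremental send = everything born
between two named TXGs) can be sketched as a toy, with all names hypothetical
and no relation to the actual ZFS implementation:

```python
# Toy model (not ZFS code): a pool is a set of blocks, each stamped with
# the TXG in which it was born; a snapshot is just a name for a TXG.
def incremental_delta(blocks, base_txg, target_txg):
    """Blocks born after base_txg and no later than target_txg --
    the same span an incremental 'zfs send' (or a resilver catching
    up a stale mirror side) has to cover."""
    return [b for b in blocks if base_txg < b["birth_txg"] <= target_txg]

blocks = [
    {"name": "A", "birth_txg": 10},
    {"name": "B", "birth_txg": 25},
    {"name": "C", "birth_txg": 40},
]
snapshots = {"snap1": 20, "snap2": 45}   # snapshot name -> TXG it names
delta = incremental_delta(blocks, snapshots["snap1"], snapshots["snap2"])
print([b["name"] for b in delta])        # only blocks changed between the two
```

Block A predates snap1, so it is outside the delta; B and C were born in
between, so they are exactly what an incremental stream would carry.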
Hmmm... Given that many people use automatic snapshots, those give us
many known roots for branches of the block-pointer tree after a certain
TXG (the snapshot's creation, and the next live version of the dataset).
This might allow resilvering to quickly select only those branches of the
metadata tree that are known, or assumed, to have changed after a disk
was temporarily lost - and to skip datasets (snapshots) that are known
to have been committed and closed (became read-only) while that disk
was still online.
I have no idea whether this optimization actually exists in the ZFS code,
but it seems "bound to be there"... and if not, it would be a worthy
RFE, IMHO ;)
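For what it's worth, my understanding (hedged - I haven't checked the code)
is that the traversal can already prune by block-pointer birth TXG: since
copy-on-write rewrites the whole path from a changed block up to the root,
an indirect block's birth TXG is at least that of anything modified beneath
it, so any subtree whose root pointer is old enough can be skipped wholesale,
which would cover closed snapshots as a side effect. A toy sketch of that
pruning, with all names hypothetical:

```python
# Hedged sketch (not ZFS code): under copy-on-write, a parent block is
# rewritten whenever a child changes, so parent.birth_txg >= the birth
# TXG of any block modified below it. A resilver can therefore skip any
# subtree whose root pointer was born at or before the TXG the returning
# disk last saw.
class BP:
    """A toy 'block pointer': a birth TXG plus child pointers."""
    def __init__(self, birth_txg, children=()):
        self.birth_txg = birth_txg
        self.children = list(children)

def resilver_walk(bp, last_seen_txg, out):
    if bp.birth_txg <= last_seen_txg:
        return            # whole subtree unchanged while the disk was away
    out.append(bp)        # this block must be examined/repaired
    for child in bp.children:
        resilver_walk(child, last_seen_txg, out)

# Left branch untouched since TXG 5; right branch rewritten at TXG 30.
old_leaf = BP(5)
new_leaf = BP(30)
root = BP(30, [BP(5, [old_leaf]), BP(30, [new_leaf])])

touched = []
resilver_walk(root, last_seen_txg=20, out=touched)
```

The walk visits only the root and the rewritten right branch; the entire
left subtree (born before the disk went away) is pruned in one comparison.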
//Jim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss