I am catching up on some 500 posts that I skipped this
summer, and I have come up with a new question. In short, is it
possible to add "restartability" to ZFS SEND, for example
by adding artificial snapshots (of configurable increment
size) into already existing datasets [too large to be
zfs-sent successfully as one chunk of stream data]?
I'll start with the pre-history of this question, and
continue with the detailed idea below.
On one hand, there was a post about a T2000 system kernel
panicking while trying to import a pool. Apparently the
pool had been receiving a large (3TB) zfs send stream, and
the receive was aborted due to some external issue.
Afterwards the pool got into a cycle of trying to destroy
the received part of the stream during each pool import
attempt, which exhausted all RAM and hung the server.
Based on my own experience, reported this spring to the
forums (which are alas now gone - and the forums-to-mail
replication did not work at that time) and to the Illumos
bug tracker, I hope that the OP's pool did get imported
after a few weeks of power cycles. I hit a similar effect
under different conditions (destroying some snapshots and
datasets on a deduplicated pool).
On the other hand, there was a discussion (actually, lots
of them) about "rsync vs. zfs send".
My new question couples these two threads.
I know it has been discussed a number of times that
ZFS SEND is more efficient at finding differences and
sending updates than a filesystem crawl that recalculates
checksums all over again. However, rsync has an important
benefit: it is restartable.
As the first post I mentioned shows, a broken ZFS SEND
operation can lead to long downtimes. With sufficiently
large increments (e.g. the initial stream of a large
dataset), low bandwidth and a high probability of network
errors or power glitches, it may even be guaranteed that
enough data to complete a single ZFS SEND operation will
never get transferred; for example, when replicating 3TB
over a few-Kbps subscriber-level internet link which is
reset every 24 hours for the ISP's traffic-accounting
reasons.
By contrast, it is easy to construct an rsync loop
which would transfer all the files after several weeks of
retries. But that would not be a ZFS-snapshot replica, so
further updates could not be made via ZFS SEND either -
locking the user into rsync loops forever.
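Such an rsync loop can be sketched in a few lines of shell; the endpoints, rsync options, and retry delay below are illustrative assumptions, not a tested recipe:

```shell
#!/bin/sh
# Run a command until it succeeds, sleeping between attempts.
# RETRY_DELAY may be overridden; it defaults to 60 seconds.
retry_until_ok() {
    until "$@"; do
        echo "transfer interrupted, retrying in ${RETRY_DELAY:-60}s..." >&2
        sleep "${RETRY_DELAY:-60}"
    done
}

# Illustrative usage (hypothetical paths and host):
# retry_until_ok rsync -a --partial --timeout=300 /tank/data/ backuphost:/backup/data/
```

With --partial, each retry resumes partially transferred files instead of re-sending them - exactly the restartability that the ZFS SEND side lacks.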
Now, I wondered whether it is possible to embed snapshots
(or some similar construct) into existing data, for the
purpose of keeping tabs during zfs send and zfs recv.
For example, the same existing 3TB dataset could be
artificially pre-represented as a horde of snapshots,
each referencing about 1GB of disk space, allowing valid
ZFS incremental sends over whatever network link we have.
However, unlike zfs-auto-snap, these snapshots would not
really have appeared on disk while the dataset was being
written (historically). Instead, they would be patched on
by the admins after the actual data appeared on disk,
before the ZFS SEND.
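Such retroactive patching does not exist today, but for comparison the forward-looking equivalent - snapshot in small steps as data arrives, then send each increment separately - can be sketched as a dry run that only prints the zfs commands it would issue (the dataset name and the "@chunkN" snapshot names are made-up placeholders):

```shell
#!/bin/sh
# Dry-run sketch: print the zfs commands that would send a dataset as
# an initial stream plus a chain of small incrementals, so a broken
# link only loses the increment in flight rather than the whole stream.
send_in_chunks() {
    dataset=$1
    chunks=$2
    prev=""
    n=1
    while [ "$n" -le "$chunks" ]; do
        snap="$dataset@chunk$n"
        echo zfs snapshot "$snap"
        if [ -z "$prev" ]; then
            echo zfs send "$snap"              # initial full stream
        else
            echo zfs send -i "$prev" "$snap"   # increment since prev
        fi
        prev="$snap"
        n=$((n + 1))
    done
}

send_in_chunks tank/data 3
```

Each "zfs send -i" stream then covers only one chunk - but this only helps data written after the snapshot schedule began, which is precisely why the retroactive variant would be interesting.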
Alternatively, if a ZFS SEND is detected to have broken,
the sending side might set a "tab" at the offset where it
last read the sent data. The receiver (upon pool import or
whatever other recovery) would likewise set such a tab,
instead of destroying the broken snapshot (which may take
weeks and lots of downtime, as proved by several reports
on the list, including mine) and restarting from scratch -
likely doomed to break as well.
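For what it's worth, this "tab" idea is close to the resumable send/receive interface that some later ZFS implementations provide: "zfs receive -s" keeps the partial state and exposes a receive_resume_token property, which "zfs send -t" picks up. Whether a given ZFS build has these is an assumption here, so the sketch below only prints the commands (dataset names and the token are placeholders):

```shell
#!/bin/sh
# Dry-run sketch of a resume-capable replication step; it only prints
# the commands.  "tank/data@snap" and "backup/data" are placeholders.
resume_sketch() {
    src_snap=$1
    dst=$2
    token=$3
    if [ -n "$token" ]; then
        # A saved token acts as the "tab": resume mid-stream.
        echo "zfs send -t $token | zfs receive -s $dst"
    else
        # First attempt: -s asks the receiver to keep partial state
        # (and a receive_resume_token) if the stream breaks.
        echo "zfs send $src_snap | zfs receive -s $dst"
        echo "zfs get -H -o value receive_resume_token $dst"
    fi
}

resume_sketch tank/data@snap backup/data ""
resume_sketch tank/data@snap backup/data 1-placeholder-token
```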
In terms of code, this would probably look like the
normal "zfs snapshot" mixed with the reverse of
"zfs destroy @snapshot": some existing blocks would be
reassigned as "owned" by a newly embedded snapshot
instead of being "owned" by the live dataset or some
more recent snapshot...
zfs-discuss mailing list