> Why do we want to adapt "zfs send" to do something it was never
> intended to do, and probably won't be adapted to do (well, if at all)
> anytime soon, instead of optimizing existing technologies for this
> use case?

The only time I see or hear of anyone using "zfs send" in a way it wasn't
intended is when people store the datastream on tape or a filesystem,
instead of feeding it directly into "zfs receive."

Although it's officially discouraged for this purpose, there is value in
doing so, and I can understand why some people (myself included) are
sometimes interested in doing it.

So let's explore the reasons it's discouraged to store a "zfs send"
datastream:
#1  If a single bit of the stream goes bad, "zfs receive" rejects the
whole stream, so the entire dataset is lost.
#2  You can only receive the whole filesystem.  You cannot granularly
restore a single file or directory.

Now, if you acknowledge these two points, let's explore why somebody might
want to do it anyway:

To counter #1:
Let's acknowledge that storage media is pretty reliable.  We've all seen
tapes and disks go bad, but usually they don't.  If you've got a new tape
archive every week or every month, the probability of *all* of those
tapes having one or more bad bits is astronomically low.  It's a nonzero
risk, but a calculated one.
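
To put an illustrative (invented) number on it: even if each tape
independently had a 1-in-100 chance of containing a bad bit, the
probability of twelve monthly tapes *all* being bad would be 0.01^12,
or one in 10^24.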

To counter #2:
There are two basic goals for backups: (a) restoring individual files or
directories on request, and (b) disaster recovery (DR), guaranteeing your
manager that you can get the company back into production quickly after a
disaster, such as the building burning down.

ZFS send to tape does not help you in situation (a), because of point #2
above.  So we can conclude that "zfs send" to tape is not sufficient as
your *only* backup technique.  You need something else, and at most, you
might consider "zfs send" to tape as an augmentation to your other backup
technique.

Still ... if you're in situation (b), then you want as many options
available to you as possible.  I've helped many people and companies
before, who:

- Had backup media, but didn't have the application that wrote the
  backup media, and therefore couldn't figure out how to restore.
- Had a backup system that was live-synchronizing the master file
  server to a slave file server; when something blew up the master, the
  damage propagated and deleted the slave too.  In that case, the only
  thing that saved them was an engineer who had copied the whole
  directory onto his iPod a week earlier, if you can believe that.
- Had backup tapes but no tape drive.
- Had archives on DVD, and the DVDs were nearly all bad.
- Looked through the backups only to discover something critical had
  been accidentally excluded.

The point is, having as many options available as possible is worthwhile
in a disaster situation.

Please see below for some more info, as it ties into some more of what
you've said ...


> But I got it.  "zfs send" is fast.  Let me ask you this, Ed... where
> do you "zfs send" your data to?  Another pool?  Does it go to tape
> eventually?  If so, what is the setup such that it goes to tape?  I
> apologize for asking here, as I'm sure you described it in one of the
> other threads I mentioned, but I'm not able to go digging in those
> threads at the moment.

Here is my backup strategy:

I use "zfs send | ssh somehost 'zfs receive'" to send nightly incrementals
to a secondary backup server.  
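
For concreteness, a minimal sketch of one night's incremental (the
dataset and snapshot names are hypothetical):

    # Snapshot tonight, then send only the delta since last night's
    # snapshot.  "-F" on the receiving side rolls the secondary's copy
    # back to the common snapshot before applying the increment.
    zfs snapshot tank/data@2010-06-08
    zfs send -i tank/data@2010-06-07 tank/data@2010-06-08 | \
        ssh somehost zfs receive -F tank/data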

This way, if something goes wrong with the primary fileserver, I can
simply change the IP address of the secondary and let it assume the role
of the primary, with the unfortunate loss of all of today's data, rolling
back to last night.  I have had to do this once before, in the face of a
primary fileserver disaster and a service-contract SLA failure by
NetApp.  All the users were very pleased that I was able to get them back
into production, on last night's data, within a few minutes.
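
The IP takeover itself is essentially a one-liner (a sketch, assuming
Solaris; the interface name and address are invented, and a persistent
change would also mean editing /etc/hostname.<interface> and /etc/hosts):

    # Assume the failed primary's address on the secondary.
    ifconfig e1000g0 192.168.1.10 netmask 255.255.255.0 up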

From the secondary server, I "zfs send | zfs receive" onto removable hard
disks.  This is ideal for restoring either individual files or the whole
filesystem.  No special tools would be necessary to restore on any random
ZFS server in the future, and nothing could be faster.  In fact, in a
pinch you wouldn't even need to restore; you could work directly on the
external disks.  However, removable disks are not very reliable compared
to tapes, they cost more per GB, and they require more volume in the safe
deposit box, so the external disk usage is limited...  only going back
2-4 weeks of archive.
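
A minimal sketch of that step, with hypothetical pool, dataset, and
device names:

    # "extpool" lives on the removable disk c9t0d0.  "-R" sends the
    # snapshot with all descendants and properties; the received copy
    # is a browsable ZFS filesystem, so individual files can be copied
    # straight off the external disk.
    zpool create extpool c9t0d0
    zfs snapshot -r tank/data@weekly-2010-06-07
    zfs send -R tank/data@weekly-2010-06-07 | zfs receive -d extpool
    zpool export extpool      # now safe to unplug and store offsite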

So there is also a need for tapes.  Every so often, from the secondary
server, I "zfs send" the whole filesystem onto tape for archival
purposes.  These tapes would only be needed after a disaster combined
with the failure or overwriting of the removable disks.  With so many
levels of backups, this is really unnecessary, but it makes me feel good.
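
For what it's worth, the tape step is roughly this (a sketch; the
no-rewind device /dev/rmt/0n and the snapshot name are assumptions):

    # Write the full replication stream to tape.
    zfs snapshot -r tank@archive-2010-06
    zfs send -R tank@archive-2010-06 | dd of=/dev/rmt/0n bs=1024k

    # Restore path, after a disaster plus loss of the external disks:
    dd if=/dev/rmt/0n bs=1024k | zfs receive -d tank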

And finally, just because the data is worth millions of dollars, I also
use NetBackup to write tapes from the secondary server.  This way, nobody
could ever blame me if the data were somehow lost.  I won't get sued or
have criminal charges pressed against me; my reputation will remain
intact.  I'm protecting against the possibility of me being an idiot.


> I ask this because I see an opportunity to kill 2 birds with one
> stone.  With proper NDMP support and "zfs send" performance, why can't
> you get the advantages of "zfs send" without trying to shoehorn
> "zfs send" into a use it's not designed for?

I could be speaking out of line, but I think that, fundamentally, NDMP
support cannot have "zfs send" performance.  Because "zfs send" takes
advantage of copy-on-write to generate a datastream directly from disk
blocks, it doesn't even need to think about the filesystem or the files;
it just reads a prescribed set of disk blocks, as fast and as
sequentially as possible.  There is no need to search and examine files
to see what's changed; the snapshot already provides a simple list of
disk blocks to read.  NDMP, tar, star, rsync, and everything else that I
know of are, by contrast, fundamentally file-level backup systems, which
necessitates far less efficient random access, with filesystem reading,
seeking, and processing required.

But sure, if it's possible, NDMP or any other protocol or format performing
like "zfs send" would be awesome.  ;-)
