> Why do we want to adapt "zfs send" to do something it was never
> intended to do, and probably won't be adapted to do (well, if at all)
> anytime soon, instead of optimizing existing technologies for this
> use case?
The only time I see or hear of anyone using "zfs send" in a way it wasn't intended is when people store the datastream on tape or a filesystem, instead of feeding it directly into "zfs receive." Although it's officially discouraged, there is value in doing so, and I can understand why some people (including myself) would be interested. So let's look at the reasons storing a "zfs send" datastream is discouraged:

#1 If a single bit goes bad, the whole datastream is bad.

#2 You can only receive the whole filesystem. You cannot granularly restore a single file or directory.

Now, if you acknowledge these two points, let's explore why somebody might want to do it anyway.

To counter #1: Storage media are pretty reliable. We've all seen tapes and disks go bad, but usually they don't. If you write a new tape archive every week or every month, the probability of *all* of those tapes having one or more bad bits is astronomically low. A nonzero risk, but a calculated one.

To counter #2: There are two basic goals for backups: (a) restoring some stuff upon request, and (b) disaster recovery, i.e. guaranteeing your manager that you can get the company back into production quickly after a disaster, such as the building burning down. "zfs send" to tape does not help you in situation (a), so it is not sufficient as an *only* backup technique; you need something else, and at most you might consider "zfs send" to tape as an augmentation to your other backup technique.

Still, if you're in situation (b), you want as many options available to you as possible. I've helped many people and companies before, who ...

... Had backup media, but didn't have the application that wrote the backup media, and therefore couldn't figure out how to restore.
... Had a backup system that was live-synchronizing the master file server to a slave file server; when something blew up the master, the damage propagated and deleted the slave too. In that case, the only thing that saved them was an engineer who had copied the whole directory onto his iPod a week earlier, if you can believe that.

... Had backup tapes but no tape drive.

... Had archives on DVD, and the DVDs were nearly all bad.

... Looked through the backups only to discover something critical had been accidentally excluded.

The point is, having as many options available as possible is worthwhile in a disaster situation. Please see below for some more info, as it ties into more of what you've said ...

> But I got it. "zfs send" is fast. Let me ask you this, Ed... where do
> you "zfs send" your data to? Another pool? Does it go to tape
> eventually? If so, what is the setup such that it goes to tape? I
> apologize for asking here, as I'm sure you described it in one of the
> other threads I mentioned, but I'm not able to go digging in those
> threads at the moment.

Here is my backup strategy:

I use "zfs send | ssh somehost 'zfs receive'" to send nightly incrementals to a secondary backup server. This way, if something goes wrong with the primary file server, I can simply change the IP address of the secondary and let it assume the role of the primary, at the cost of losing today's data and going back to last night's. I have had to do this once before, in the face of a primary file server disaster and a service-contract SLA failure by NetApp. All the users were very pleased that I was able to get them back into production on last night's data in less than a few minutes.

From the secondary server, I "zfs send | zfs receive" onto removable hard disks. This is ideal for restoring either individual files or the whole filesystem: no special tools would be necessary to restore on any random ZFS server in the future, and nothing could be faster.
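The two replication steps just described (nightly incrementals over ssh, then a receive onto a removable-disk pool) might look roughly like this. This is only a sketch: every name here (pool, snapshot scheme, hostname, disk pool) is a placeholder, not my actual setup.

```shell
# Rough sketch of the nightly replication and removable-disk steps.
# All names (tank/home, backup1, backupdisk) are placeholders.

# Send only the blocks changed since the most recent snapshot, then receive
# on the secondary; -F first rolls the receiving side back to the common
# snapshot so the incremental applies cleanly.
nightly_replicate() {
    pool=$1                          # e.g. tank/home
    dest=$2                          # e.g. backup1 (the secondary server)
    today=$(date +%Y-%m-%d)
    prev=$(zfs list -H -t snapshot -o name -s creation -r "$pool" \
           | tail -1 | cut -d@ -f2)
    zfs snapshot "${pool}@${today}"
    zfs send -i "@${prev}" "${pool}@${today}" \
        | ssh "$dest" "zfs receive -F ${pool}"
}

# Receive onto a pool living on a removable disk; afterwards individual
# files are restorable with a plain cp, or usable in place.
archive_to_disk() {
    snap=$1                          # e.g. tank/home@2010-06-01
    diskpool=$2                      # e.g. backupdisk
    zpool import "$diskpool"
    zfs send "$snap" | zfs receive -F "${diskpool}/${snap%%@*}"
    zpool export "$diskpool"         # safe to unplug once exported
}
```

Something like `nightly_replicate tank/home backup1` would then run from cron on the primary.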
In fact, in a pinch you wouldn't even need to restore; you could work directly on the external disks. However, removable disks are not very reliable compared to tapes, cost more per GB, and take up more volume in the safe deposit box, so the external-disk usage is limited to going back 2-4 weeks of archive. So there is also a need for tapes.

Every so often, from the secondary server, I "zfs send" the whole filesystem onto tape for archival purposes. This would only be needed after a disaster combined with the failure or overwriting of the removable disks; with so many levels of backups, it's really unnecessary, but it makes me feel good.

And finally, because the data is worth millions of dollars, I also use NetBackup to write tapes from the secondary server. This way, nobody could ever blame me if the data were somehow lost. I won't get sued or have criminal charges pressed against me, and my reputation will remain intact. I'm protecting against the possibility of my being an idiot.

> I ask this because I see an opportunity to kill two birds with one
> stone. With proper NDMP support and "zfs send" performance, why can't
> you get the advantages of "zfs send" without trying to shoehorn
> "zfs send" into a use it's not designed for?

I could be speaking out of line, but I think NDMP support fundamentally cannot have "zfs send" performance. Because "zfs send" takes advantage of copy-on-write to generate a datastream efficiently from disk blocks, it doesn't even need to think about the filesystem or the files; it just reads a prescribed set of disk blocks, as fast and as sequentially as possible. There is no need to search and examine files to see what's changed; you already have a simple list of disk blocks to read.
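The tape-archive step above could be sketched as follows. All names are placeholders, and I'm assuming GNU coreutils' sha256sum is available (Solaris users might reach for "digest -a sha256" instead). Spooling the stream to a file first means the recorded checksum and the tape copy describe exactly the same bytes, so a tape can be verified before you bet a restore on it; one bad bit still spoils the whole stream (objection #1 earlier), but at least you detect it.

```shell
# Hypothetical sketch of archiving a "zfs send" stream to tape with a
# checksum on the side. Snapshot, tape device, and spool path are placeholders.
archive_to_tape() {
    snap=$1                          # e.g. tank/home@monthly-2010-06
    tape=$2                          # e.g. /dev/rmt/0n (no-rewind device)
    spool=/var/tmp/stream.zfs        # assumed scratch space big enough
    zfs send "$snap" > "$spool"
    sha256sum "$spool" > "$spool.sha256"
    dd if="$spool" of="$tape" bs=1048576
}

# To verify a tape later without receiving it:
#   dd if=/dev/rmt/0n bs=1048576 | sha256sum
# and compare the digest against the stored .sha256 file.
```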
NDMP, tar, star, rsync, and everything else that I know of are, by contrast, all fundamentally file-level backup systems, which necessitates far less efficient random access, with filesystem reading, seeking, and processing required. But sure, if it were possible, an NDMP (or any other protocol or format) performing like "zfs send" would be awesome. ;-)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss