On Mon, 10 Nov 2014 11:53:33 -0800 Russell Button <[email protected]> wrote:
> I get the impression that AFS is this amorphous cloud of data storage.
> So when you backup stuff, it's not as if it's organized by machine and
> file system.

It's not really much different for AFS than for most other things. Your files are stored on various fileservers, and clients access those fileservers over the network. There is some AFS-specific data stored in addition to the contents of the files themselves (permissions, location data, etc.), so if you want to back that up, you need a backup tool that is AFS-aware. But that's the same for a lot of other things; e.g. software that integrates with backing up VMware VMs needs to be aware of and integrate with VMware.

You _can_ just use a "regular" backup tool to back up AFS, but it's usually not recommended. There are a couple of ways you could do that, but they have serious downsides:

1. Point the backup tool at /afs/cell/, like you do for any AFS-accessing application, and back up each individual file by just reading it out of AFS. The problems with this approach are that it is slow and causes a lot of unnecessary load on the fileservers (especially when data has not changed), and that you lose AFS-specific metadata, such as AFS directory permissions and AFS mountpoint data. So it would be pretty annoying to put your cell back together after catastrophic data loss if all you had was backup data in this form; but you would still have the data.

2. Point the backup tool at the /vicep* directories on the fileservers (which is where the fileservers store data for AFS files). This ensures that you back up all of the AFS metadata, and is probably more efficient than the approach in '1.'. However, the files in those directories are stored in a very particular format and structure, and you'd need to know how to get the files you're used to back out of them; and you are not guaranteed to get a consistent snapshot of the AFS data when you do this.
This approach is similar to backing up data on a local filesystem by just 'dd'ing a raw image from e.g. /dev/sda4 (except incrementals would be a bit easier).

> With this much data, spread out over 3 geographically distant data
> centers, it's not as if you can do a full dump on the 1st of the month
> and then do daily incrementals for the month, and then start over
> again next month.

Well, you could do that, but yeah, certainly as the size of your data set increases this gets pretty painful. However, I thought you had your data organized into AFS volumes that effectively never changed after a certain point (that is, at Telmate; Timothy Balcer has posted here before). I may not be remembering that correctly; it's been a while. If that is correct, though, then you don't need to worry about incrementals and such for a large amount of data; you just need to back it up once and then never touch it again (except perhaps to verify that it indeed never changes).

> Does anyone Out There have a similar problem, and if so, what strategy
> did you use?

Others can share their own experiences, but I'll at least mention some of the options.

Teradactyl's TiBS: <http://www.teradactyl.com/backup-solutions/backup-platforms/openafs-backup.html>. IIRC Teradactyl likes to trumpet their "synthetic" full dumps, a feature where they use existing incremental and full dumps to generate a new "full" dump every so often. This addresses what you were talking about before, because it avoids needing to retain e.g. hundreds of daily incrementals, but also avoids needing to periodically dump all data at once.

Stephen Joyce's BackupAFS: <http://user.physics.unc.edu/~stephen/BackupAFS/>. I'm not sure how many sites use this, but from what I remember, at least Stephen Joyce says it works well :)

The backup system that comes with OpenAFS (I sometimes refer to this as the "native" OpenAFS backup system).
This is a bit wonky to set up and use, but it does still work, and some places use it successfully. Information about it can be found in chapter 6 of the admin guide, "Configuring the AFS Backup System": <http://docs.openafs.org/AdminGuide/HDRWQ248.html>. That documentation is very old, but this backup system has been almost unchanged for probably over a decade, so it's still probably accurate. If the book you mentioned talks about a backup system without giving it a specific name, I assume it's talking about this one.

And finally, the option that I think quite a lot of sites use is to just develop your own scripts that run "vos dump" on everything and store the dump blobs somewhere. But you need at least a bit of knowledge about AFS in order to do that, especially if you want to handle incremental dumps and such.

There are/were also ways to integrate AFS into some other backup tools, like TSM, AMANDA, Bacula, and some other commercial ones. I would not recommend any of the integration tools except for the TSM ones; but I assume you wouldn't want to run TSM just for backing up AFS.

It could also help when thinking about this to nail down a few more of your requirements (whether or not you discuss them here, at least for yourself). You've described a bit about the data you're backing up (except for how it's organized within AFS, which you may not know), but I don't think you've mentioned your restore requirements. For example: do you want end-users to be able to restore data themselves, or is restoring an administrative operation? Do you need to restore data based on /afs file path, or is restoring entire AFS volumes okay? (I'm not aware of any publicly-available backup solution that works per-file... except maybe the TSM integration? And the '1.' approach listed way up top.) Do you have any existing backup framework that you might just want AFS to integrate into?
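To make the roll-your-own "vos dump" option above a bit more concrete, here is a minimal sketch. Everything in it is illustrative: the paths are hypothetical, and the human-readable `vos listvldb` / `vos examine` output formats it parses are assumptions based on typical OpenAFS releases, so check them against your own installation before relying on anything like this.

```python
#!/usr/bin/env python
# Rough sketch of the "scripts that run vos dump" approach: list every
# volume in the VLDB, skip volumes whose "Last Update" time has not
# moved since the previous dump, and take a full dump of the rest.
# The vos output formats parsed here are ASSUMED, not guaranteed.
import os
import subprocess

def vldb_volume_names(listvldb_output):
    """Volume names in `vos listvldb` output (assumed format) sit
    flush-left on their own lines; detail lines are indented. Header
    and summary lines are filtered out by their first word."""
    names = []
    for line in listvldb_output.splitlines():
        if not line or line[0].isspace():
            continue                      # blank or indented detail line
        word = line.split()[0]
        if word in ("VLDB", "Total"):
            continue                      # "VLDB entries...", "Total entries: N"
        names.append(word)
    return names

def last_update(examine_output):
    """Return the 'Last Update' timestamp string from `vos examine`
    output, or None if no such line is found."""
    for line in examine_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Last Update"):
            return stripped[len("Last Update"):].strip()
    return None

def dump_cell(dump_dir, state):
    """`state` maps volume name -> the Last Update string seen at the
    time of its most recent dump; pass {} on the first run."""
    vols = vldb_volume_names(
        subprocess.check_output(["vos", "listvldb"]).decode())
    for vol in vols:
        stamp = last_update(
            subprocess.check_output(["vos", "examine", vol]).decode())
        if state.get(vol) == stamp:
            continue                      # unchanged since last dump
        path = os.path.join(dump_dir, vol + ".dump")
        # "-time 0" requests a full dump; a real incremental scheme
        # would pass the time of the previous dump instead of 0.
        subprocess.check_call(
            ["vos", "dump", "-id", vol, "-time", "0", "-file", path])
        state[vol] = stamp
    return state
```

As a side benefit, the same Last Update comparison doubles as the "verify that it indeed never changes" check for supposedly-frozen volumes: if the timestamp ever moves on a volume you believed was static, that's worth investigating.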
I'm also not sure if you have an idea of whether you want to be backing up to tape, disk, or some other media; or maybe you're not sure and are asking for advice on that as well? :)

-- 
Andrew Deason
[email protected]

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
