Re: Problems with making hardlink-based backups
On Tue, Aug 18, 2009 at 5:11 PM, Andrew Sackville-West wrote:
> I've had rdiff-backup fail because of mis-matched versions. Again, not
> to belabor the obvious, but do you have compatible versions of
> rdiff-backup on each machine? If you have compatible (i.e., the same)
> versions on both ends and still have problems, then perhaps you should
> file a bug report.

I'm well aware of those problems, so I don't even bother to use
rdiff-backup over the network. That mode is next to useless unless you
can guarantee the same versions on both ends, and tbh, it sucks
compared to rsync for network transfers.

How I use rdiff-backup is like this:

1) Make a temporary snapshot copy of the rdiff-backup repo (minus the
rdiff-backup-data directory), using hardlinks.

2) rsync from the source server over to the temporary copy (this
should be safe, since rsync doesn't overwrite files in place unless
you tell it to).

3) Run rdiff-backup to push the latest temporary copy onto the
rdiff-backup history.

The above works fairly well for me, although rdiff-backup sometimes
gets confused about the hardlinks.

About filing a bug report: I'm not sure how much that would help,
since the mailing list wasn't very informative. I get the impression
that the main developer has either abandoned the project, or is taking
a break for a few months (as evidenced by the recent bug tracker
activity, and the lack of useful replies on the mailing list).

David.
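For concreteness, the three-step cycle above might look like the
following minimal sketch. The paths, the source host, and the rsync
module are placeholders, not David's actual scripts:

#!/bin/bash
# Sketch of the snapshot -> rsync -> rdiff-backup cycle described above.
REPO=/backups/host            # rdiff-backup repository (placeholder)
TMP=/backups/host.tmp         # temporary hardlink snapshot (placeholder)

# 1) hardlink copy of the repo's current mirror, minus the metadata dir
mkdir -p "$TMP"
cp -al "$REPO/." "$TMP/"
rm -rf "$TMP/rdiff-backup-data"

# 2) refresh the copy from the source server; by default rsync writes
#    changed files to a new inode and renames them into place, so the
#    repo's hardlinked copies stay intact (as long as --inplace is off)
rsync -aH --delete sourcehost:/data/ "$TMP/"

# 3) push the refreshed copy onto the local rdiff-backup history
rdiff-backup "$TMP" "$REPO"
rm -rf "$TMP"                 # discard the temporary snapshot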
Re: Problems with making hardlink-based backups
On Tue, Aug 18, 2009 at 03:11:47PM +0200, David wrote:
[...]
> Basically, rdiff-backup was perfect for a while. But then we upgraded
> the server to Lenny. And then it stopped working T_T. I think that
> rdiff-backup's author must have changed something, which now causes
> huge ram usage for large file lists, or other per-file data of some
> kind. imo that's unnecessary (it could just use something like a set
> of Python iterators in a clever way, or work with incremental file
> lists like rsync), but I didn't get any useful replies on their
> mailing list when I mentioned my problem and gave a few ideas.

I've had rdiff-backup fail because of mis-matched versions. Again, not
to belabor the obvious, but do you have compatible versions of
rdiff-backup on each machine? If you have compatible (i.e., the same)
versions on both ends and still have problems, then perhaps you should
file a bug report.

A
Re: Problems with making hardlink-based backups
On Mon, Aug 17, 2009 at 6:26 PM, Andrew Sackville-West wrote:
> Here's another question: what is stored in all these millions of
> files?
[...]

Basically, the lion's share would be tons of user-generated files -
for example, huge numbers of image files (and thumbnails) that get
stored in directory structures on one of the file servers. Other
examples would be extensive music & sound libraries, several
debian/ubuntu/etc mirrors, and so on.

About tarring before backing up: yeah, that's possible too (for some
types of data/directory layouts). But then something on the file
server side needs to check whether the tars are still up to date. And
those tars will take up a lot of precious hard drive space on the file
server :-(. Unless you mean removing the original data... which is
problematic in a few ways. And of course, storing different versions
of those tars (eg: users move files around at the source) is also
problematic. Basically... as you say, it would be like the tail
wagging the dog. Things would get a lot more complicated & fragile,
and in exchange I'd get a lot of other, more serious backup problems,
which are harder to work around than the current issues.

About moving to a database: well, the filesystem is already a database
:-). And then trying to keep backups of that (multi-TB) database
itself is a major problem. Not to mention, users and software would
now have to go through some other software to get to their files...
don't want to go there... my head hurts ^^;

The file servers themselves do have a large number of files... that
isn't really the problem. The problem is actually in the backup
software, which has issues trying to handle history for those backups
(either using massive amounts of memory/cpu, or creating massive
numbers of hardlinks, and so on).

Basically, rdiff-backup was perfect for a while. But then we upgraded
the server to Lenny. And then it stopped working T_T. I think that
rdiff-backup's author must have changed something, which now causes
huge ram usage for large file lists, or other per-file data of some
kind. imo that's unnecessary (it could just use something like a set
of Python iterators in a clever way, or work with incremental file
lists like rsync), but I didn't get any useful replies on their
mailing list when I mentioned my problem and gave a few ideas.

So for now: a combination of ugly hacks, with hardlink-type pruning
for history snapshots, and blindly deleting older backup generations
to get space back when needed. At least until I find a better
solution.

Anyway, thanks for your ideas :-)

David.
Re: Problems with making hardlink-based backups
On Mon, Aug 17, 2009 at 10:59:20AM +0200, David wrote:
> Thanks for the replies.
[...]
> Basically, the problem isn't that I don't know how to use rsync, cp,
> etc to make the backups, manage generations, etc... the problem is an
> incredibly large filesystem (as in number of hardlinks and, to a
> smaller extent, actual directories), resulting from the hardlink
> snapshot-based approach (as opposed to something like rdiff-backup,
> which only stores the differences between the generations).
[...]

ah. well, that is a problem, isn't it. I can see why you'd like to
stick with a diff-based backup then.

Is there some way you can control the number of files by tarring up
sections of the filesystem prior to backup? If you have a lot of
high-churn files, then you'll likely be duplicating them anyway, so
tarring up the whole lot might make sense. Then you back up the
tarballs instead.

Here's another question: what is stored in all these millions of
files? And what is their purpose? Is it a case of using a filesystem
when a database might be a better option? Perhaps the whole problem
you're facing on the backend could be better solved by looking at the
front end. Of course, you'll want to avoid the tail wagging the dog...

just a couple of thoughts. good luck.

A
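One hedged way to implement the tarring idea - re-creating a section's
tarball only when something inside it has changed - could look like
this sketch. GNU tar and find are assumed, and all paths are
illustrative, not from the thread:

#!/bin/sh
# Sketch: re-tar a section only when it is newer than its tarball.
SRC=/srv/files          # the directory tree being sectioned (placeholder)
OUT=/srv/tarballs       # where per-section tarballs live (placeholder)

for dir in "$SRC"/*/; do
    name=$(basename "$dir")
    tarball="$OUT/$name.tar.gz"
    # re-create the tarball if it is missing, or if any file or
    # directory in the section has a newer mtime than the tarball
    if [ ! -e "$tarball" ] || \
       [ -n "$(find "$dir" -newer "$tarball" -print -quit)" ]; then
        tar -czf "$tarball.tmp" -C "$SRC" "$name" && \
            mv "$tarball.tmp" "$tarball"
    fi
done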
Re: Problems with making hardlink-based backups
Err... another post on backuppc, sorry.

I think that backuppc is actually going to have the same problem (with
massive filesystems causing du, locate, etc to become next to unusable
for the backup storage directories). The reason for this, from the
docs:

"Therefore, every file in the pool will have at least 2 hard links
(one for the pool file and one for the backup file below
__TOPDIR__/pc). Identical files from different backups or PCs will all
be linked to the same file. When old backups are deleted, some files
in the pool might only have one link. BackupPC_nightly checks the
entire pool and removes all files that have only a single link,
thereby recovering the storage for that file."

That is, there are actually hardlinks for every file, for every
server, for every backup generation. du and locate are still going to
have a bazillion files to walk, even if they are stored in a nice pool
system.

BackupPC has some nice features, but it's not going to fix my problem
:-(. Ideally I would have kept using rdiff-backup, but for now I'm
going to go with hardlink snapshots and pruning (with restore details
kept in text files).

Is my use case really that unusual? (Wanting to run 'du' and 'locate'
on a backup server which holds many generations of data from other
servers that themselves contain a huge number of files.)

Going to ask about this general problem over at the backuppc mailing
list, maybe people there have more ideas :-)

David.
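For illustration only: the nightly cleanup described in that quote
amounts to deleting pool files whose link count has dropped to one. A
rough shell equivalent (not BackupPC's actual code; __TOPDIR__ is the
docs' placeholder for the real storage path) would be:

# Delete pool files with only a single remaining hard link, i.e. files
# no longer referenced by any backup below __TOPDIR__/pc (what
# BackupPC_nightly is described as doing above).
find __TOPDIR__/pool -type f -links 1 -delete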
Re: Problems with making hardlink-based backups
Sorry for spamming the list... I think I didn't read the docs
carefully enough before posting the above.

It seems that backuppc actually does keep recent snapshots that aren't
in the pool... so scripts, admins, etc can get to them easily without
going through backuppc scripts. It looks like backuppc actually
maintains a hardlinked version of the backed-up server, outside the
pool. Specifically, see the section about "__TOPDIR__/pool" in the
docs:

http://backuppc.sourceforge.net/faq/BackupPC.html

I should read the docs more carefully :-)

David.
Re: Problems with making hardlink-based backups
On Sat, Aug 15, 2009 at 4:35 AM, Rob Owens wrote:
> You might want to check out BackupPC. It uses hardlinks and
> compression. It's got a web-based GUI which makes it pretty easy to
> find statistics on disk space used, on a per-server basis.

I've been researching backuppc, and it seems like it wants to store
everything in a pool, including the latest backup. Is there a way to
keep the latest backup outside the pool area?

Reason being, that while the pool is very space-efficient, the layout
is somewhat opaque, and afaict it's not very straightforward to get to
the actual backed-up files (for scripts, admin users, etc, logged into
the backup server).

Places where I'm foreseeing problems:

1) Offsite backups. My current scripts use rsync to update the latest
snapshots (for each user, server, etc) over to a set of external
drives. With backuppc, I'd probably have to find the correct backuppc
script incantation (or hack something together) to restore the latest
backup to a temporary location on the backup server, before copying it
over to the external drive (a rough sketch of this follows at the end
of this post). Problems:

a. Complicated.
b. Going to be slow (slower than if there were an existing directory).
c. Going to use up a lot of extra hard drive space on the backup
server, to store the restored snapshot (for, eg, backed-up file
servers). Unless I work out something ugly whereby uncompressed
backuppc hardlinks are linked to a new structure... (this is
incredibly ugly).
d. Inefficient - if only a few files have changed on a huge backed-up
filesystem, you still need to restore the entire snapshot out of the
backuppc pool.

2) Admin-friendly. It's simpler for admins to browse through files in
a directory structure on the backup server, on a command line (or with
winscp or samba or whatever), rather than having to go through a web
frontend. 99% of the time they're looking for stuff from the latest
snapshot, so it's acceptable for them (or myself) to have to run
special commands to get to the older versions. But the latest snapshot
I do actually want present on the hard drive (rather than hidden away
in a pool).

3) Utility-friendly. With a directory structure, I can run du and
determine which files are huge, or use other unixy things. Without it,
I (and scripts, admins, etc) have to go through the backuppc-approved
channels... unnecessary complication imo.

---

I guess one way to do this is to use the regular rsync-based backup
methods to make/update the latest snapshot, and then back that up with
backuppc. But that has the following disadvantages:

1) Lots more disk usage. Backuppc would be making an independent copy
of all the data. It won't be, eg, making hardlinks against the latest
snapshot, or reverse incrementals, or something like that.

2) Redundant and complicated. Backuppc is meant to be a "one stop",
automated thing. If I'm already handling scheduling and the actual
transports, etc, from my scripts, then it's redundant. All that it's
being used for is its pooled approach, which still has the above
problems.

---

Basically... what I would require from backuppc is a way to tell it to
preserve a local copy of the latest snapshots (in easy-to-find
locations on the backup server, so admins or scripts can use them
directly), and to only move older versions to the pool... while at the
same time taking advantage of the latest snapshot to conserve backup
server hard drive space (reverse incrementals, hardlinks to it, etc).

Does anyone who is familiar with backuppc know if the above is
possible? (Although I kind of doubt it at this point.
My use cases seem to break the backuppc design ^^; )

I should probably post about this to the backuppc mailing lists too...
their users would have a lot more relevant experience.

In the meanwhile, I'll probably continue to use a pruned-hardlinks
approach.

David.
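For reference, the offsite pass described in point 1 might look
roughly like this sketch, using BackupPC's documented
BackupPC_tarCreate tool. The host name, share name, and paths are
assumptions, and the exact flags should be checked against your
version's documentation:

# Materialize the most recent backup of a host (-n -1 = latest), then
# rsync it to the external drive. All names here are hypothetical.
mkdir -p /tmp/restore/fileserver1
BackupPC_tarCreate -h fileserver1 -n -1 -s /srv . \
    | tar -xf - -C /tmp/restore/fileserver1
rsync -aH --delete /tmp/restore/fileserver1/ /mnt/external/fileserver1/
rm -rf /tmp/restore/fileserver1   # reclaim the temporary space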
Re: Problems with making hardlink-based backups
Thanks for the replies.

On Fri, Aug 14, 2009 at 5:05 PM, Andrew Sackville-West wrote:
>> du worked pretty well with rdiff-backup, but is very problematic with
>> a large number of hardlink-based snapshots, which each have a complete
>> "copy" of a massive filesystem (rather than just info on which files
>> changed).
>
> but they're not copies, they're hardlinks. I guess I don't understand
> the problem.
[...]

I understand that (actually it's the whole point of using a
hardlink-based snapshot system :-)). What I meant by "massive
filesystem" was the huge number of additional file entries that gets
created for every snapshot. That causes major problems for utilities
like du and locate that need to walk the entire filesystem. That's why
I've been writing scripts to "prune" files, so that there are fewer
filesystem entries for those tools to walk.

> If you are using hardlinks, and nice discrete directories for each
> machine, then a machine that has infrequent changes will not use a lot
> of space because the files don't change.
[...]

Thanks, I also understand that. My problem, however, is that all of
the backups (for all the servers) are on a single LVM partition. When
the LVM is full, I need to run a tool like 'du' to check where space
can be reclaimed. That's no longer working nicely (it takes days to
finish, and makes huge, multi-GB output files). From experience I know
which servers are likely to have the most disk usage "churn", and I've
been removing their oldest entries recently to recover space, but I'd
like to be able to run 'du' effectively, rather than relying on
hunches.

> you should be able to look at the difference between disk usage over
> different time periods and figure out your "burn rate".
[...]

Thanks for those ideas too (I had also considered this). However, this
still doesn't let me use tools like du and locate nicely, due to the
huge number of filesystem entries (and I really want to be able to use
those tools, or at least du, to check where hard drive space is
actually being used). Again, that's why I'm having to consider
pruning-type approaches (which seem like an awful hack, but I'm not
sure of a better method at this time).

> I suspect I'm telling you stuff you already know and I apologize if I
> appear condescending. The odds are you probably know more about
> backups than I do. hth.

Nah, it's fine :-). Better too many ideas than assuming I'm aware of
all the possible options and leaving out something that might have
been useful.

On Fri, Aug 14, 2009 at 5:49 PM, Alan Chandler wrote:
> Andrew Sackville-West wrote:
> I'm not sure I understood what you are after either. Admittedly on a
> rather small home server, I use the cp -alf command to keep only
> changed files for a long time

Thanks for those cron and script entries. I guess I could use
something like that too (have X dailies, Y weeklies, Z monthlies,
etc), and it would save more hard drive space. But actually, managing
generations of backups to conserve hard drive space (and still have
some really old backups) isn't really the problem. The problem is:

1) Source servers (fileservers, etc) have millions+ files. There are a
couple of servers like this.

2) The hardlink snapshot process makes a duplicate of the filesystem
structure (for each of the above servers) each time, on the backup
server.
3) The backup server ends up with a vastly larger number of file
entries than any of the servers actually being backed up (roughly the
per-server file count multiplied by the number of snapshots kept).

The filesystem can support (3), but utilities like 'du' and 'updatedb'
become almost unusable. That's the main problem.

Basically, the problem isn't that I don't know how to use rsync, cp,
etc to make the backups, manage generations, etc... the problem is an
incredibly large filesystem (as in number of hardlinks and, to a
smaller extent, actual directories), resulting from the hardlink
snapshot-based approach (as opposed to something like rdiff-backup,
which only stores the differences between the generations).

On Sat, Aug 15, 2009 at 4:35 AM, Rob Owens wrote:
> You might want to check out BackupPC. It uses hardlinks and
> compression. It's got a web-based GUI which makes it pretty easy to
> find statistics on disk space used, on a per-server basis.

I've been researching the various other, more integrated backup
solutions (amanda, bacula, etc), but I have two main problems with
them:

1) They are too over-engineered/complex for my liking, and the docs
are hard to understand. I prefer simple command-line tools like rsync,
etc, which I can script. I also don't really want to have to install
their special backup-tool-specific services everywhere on the network
if I can avoid it.

2) I can't find information on how most of them actually store their
backed-up data. So they could very well have either the same problem,
or other issues that I'd be unable to work around if I wanted to use
that tool.

Thanks for your backuppc suggestion. I have h
Re: Problems with making hardlink-based backups
On Thu, Aug 13, 2009 at 09:20:17AM +0200, David wrote:
> Hi list.
>
> Until recently I was using rdiff-backup for backing up our servers.
> But after a certain point, the rdiff-backup process would take days,
> use up huge amounts of CPU and RAM, and the debug logs were very
> opaque. And the rdiff-backup mailing lists weren't very helpful
> either.
>
> So, for servers where that's a problem, I changed over to a hardlink
> snapshot-based approach.
>
> That's been working fine for the past few weeks. But now, the backup
> server is running out of hard drive space and we need to check which
> backup histories are taking up the most space, so we can prune the
> oldest week or two from them.

You might want to check out BackupPC. It uses hardlinks and
compression. It's got a web-based GUI which makes it pretty easy to
find statistics on disk space used, on a per-server basis.

-Rob
Re: Problems with making hardlink-based backups
Andrew Sackville-West wrote:
> On Fri, Aug 14, 2009 at 08:43:32AM +0200, David wrote:
>> Thanks for your suggestion, and I have heard of rsnapshot.
>>
>> Although, actually removing older snapshot directories isn't really
>> the problem.
>>
>> The problem is, if you have a large number of such backups (perhaps
>> one per server), then finding out where hard drive space is actually
>> being used is problematic (when your backup server starts running
>> low on disk space).
>
> keep each server's backup in a distinctly separate location. That
> should make it clear which machines are burning up space.
>
>> du worked pretty well with rdiff-backup, but is very problematic with
>> a large number of hardlink-based snapshots, which each have a
>> complete "copy" of a massive filesystem (rather than just info on
>> which files changed).
>
> but they're not copies, they're hardlinks. I guess I don't understand
> the problem.
>
> In a scheme like that used by rsnapshot, a file is only *copied*
> once. If it remains unchanged then the subsequent backup directories
> only carry a hardlink to the file. When older backups are deleted,
> the hardlinks keep the file around, but no extra room is used. There
> are only *pointers* to the file lying around. Then when the file
> changes, a new copy will be made and subsequent backups will hardlink
> to the new file. Now you'll be using the space of two files with
> different sets of hardlinks pointing to them. (I'm sure you know
> this, just making sure we are on common ground).

I'm not sure I understood what you are after either. Admittedly on a
rather small home server, I use the cp -alf command to keep only
changed files for a long time.

This is my cron.daily backup - I have cron.weekly and cron.monthly
scripts similar to this:

if [ -d $ARCH/daily.6 ] ; then
    if [ ! -d $ARCH/weekly.1 ] ; then mkdir -p $ARCH/weekly.1 ; fi
    # Now merge in stuff here with what might already be there, using hard links
    cp -alf $ARCH/daily.6/* $ARCH/weekly.1
    # Finally lose the rest
    rm -rf $ARCH/daily.6
fi

# Shift along snapshots
if [ -d $ARCH/daily.5 ] ; then mv $ARCH/daily.5 $ARCH/daily.6 ; fi
if [ -d $ARCH/daily.4 ] ; then mv $ARCH/daily.4 $ARCH/daily.5 ; fi
if [ -d $ARCH/daily.3 ] ; then mv $ARCH/daily.3 $ARCH/daily.4 ; fi
if [ -d $ARCH/daily.2 ] ; then mv $ARCH/daily.2 $ARCH/daily.3 ; fi
if [ -d $ARCH/daily.1 ] ; then mv $ARCH/daily.1 $ARCH/daily.2 ; fi
if [ -d $ARCH/snap ] ; then mv $ARCH/snap $ARCH/daily.1 ; fi

# Collect new snapshot archive stuff, doing the daily backup on the way
mkdir -p $ARCH/snap

This leads to daily backups for a week, weekly backups for a month,
and monthly backups until I archive them into a long-term store (write
a DVD - although, hearing stories about issues with even these, it
might be easier to just leave them on disk).

CDARCH=/bak/archive/CDarch-`date +%Y`
if [ -d $ARCH/monthly.6 ] ; then
    if [ ! -d $CDARCH ] ; then mkdir -p $CDARCH ; fi
    cp -alf $ARCH/monthly.6/* $CDARCH
    rm -rf $ARCH/monthly.6
fi

The backup process uses something like the following to keep an
initial backup and save any changed files into this long-term storage.
This is just one part of the backup - other machines and other file
systems use a similar mechanism with just the parameters changed.

rsync -aHqz --delete --backup --backup-dir=$ARCH/snap/freeswitch/ $MACH::freeswitch/ /bak/freeswitch/

--
Alan Chandler
http://www.chandlerfamily.org.uk
Re: Problems with making hardlink-based backups
On Fri, Aug 14, 2009 at 08:43:32AM +0200, David wrote:
> Thanks for your suggestion, and I have heard of rsnapshot.
>
> Although, actually removing older snapshot directories isn't really
> the problem.
>
> The problem is, if you have a large number of such backups (perhaps
> one per server), then finding out where hard drive space is actually
> being used is problematic (when your backup server starts running low
> on disk space).

keep each server's backup in a distinctly separate location. That
should make it clear which machines are burning up space.

> du worked pretty well with rdiff-backup, but is very problematic with
> a large number of hardlink-based snapshots, which each have a complete
> "copy" of a massive filesystem (rather than just info on which files
> changed).

but they're not copies, they're hardlinks. I guess I don't understand
the problem.

In a scheme like that used by rsnapshot, a file is only *copied* once.
If it remains unchanged then the subsequent backup directories only
carry a hardlink to the file. When older backups are deleted, the
hardlinks keep the file around, but no extra room is used. There are
only *pointers* to the file lying around. Then when the file changes,
a new copy will be made and subsequent backups will hardlink to the
new file. Now you'll be using the space of two files with different
sets of hardlinks pointing to them. (I'm sure you know this, just
making sure we are on common ground).

> I guess I could do something like removing the oldest snapshot
> directories from *all* the backups, until there is enough free space.
> But that's kind of wasteful. Like, if I have one server that didn't
> change much over 2 years, then I can only keep eg the last 2-3 weeks
> of backups, because there is another server that has a huge amount of
> file changes in the same period. And not being able to use "du" is
> kind of annoying (actually, "locate" is also having major problems, so
> I disabled it on the backup server).

If you are using hardlinks, and nice discrete directories for each
machine, then a machine that has infrequent changes will not use a lot
of space because the files don't change. Other than the minimal space
used by the hardlinks themselves, you could save a *lot* of "backups"
of an unchanged file and use the same space as the one file, because
there is only one actual copy of the file.

That said, the more often you back up rapidly changing data, the
bigger the backup gets, because you store complete copies for each
change. You have to balance the needs of each machine (and probably
have a different scheme for each machine). How important is it to have
access to a specific change in a file? And for how long do you need
access to that specific change? These sorts of questions should help
with these decisions.

> That's why I started working on a set of pruning/unpruning scripts,
> which basically "move" redundant info (the vast majority) over into
> compressed files (with the ability to move it out again later). Kind
> of like moving the snapshot-based approach closer to how rdiff-backup
> works (but without chewing up huge amounts of ram and being hard to
> diagnose). That way admins can in theory more easily check where
> space is being used (but at the cost of not having quick access to
> earlier complete server snapshots).

you should be able to look at the difference between disk usage over
different time periods and figure out your "burn rate".
And using a hardlink approach, you can easily archive older backups
and then remove them without laborious pruning. This is because if you
delete a file that has multiple hard links to it, the file will still
exist until *all* the hardlinks are gone. So to remove a snapshot from
last week that contains files that haven't changed, you just remove
it. The files that you still need will still be there, because you're
hardlinked to them.

> But I assume there must be better existing ways of handling this kind
> of problem, since backups aren't exactly something new.

I suspect I'm telling you stuff you already know and I apologize if I
appear condescending. The odds are you probably know more about
backups than I do. hth.

A

> On Thu, Aug 13, 2009 at 5:48 PM, Andrew Sackville-West wrote:
> > On Thu, Aug 13, 2009 at 09:20:17AM +0200, David wrote:
> >> Hi list.
> > [...]
> >> 3) Existing tools for managing hardlink-based snapshot directories
> >> etc.
> >
> > maybe rsnapshot is what you're after. It does hardlinked snapshots
> > with automagical deletion of older backups and configurable frequency
> > etc. I quite like it, though I'm not using it for high-volume stuff.
> >
> > One little caveat that always seems to get me: the daily won't run
> > until you've completed enough hourlies, the weekly won't run until
> > you've completed a week's worth of dailies, etc. Very disconcerting
> > the first few days of use.
> >
> > A
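The link-count behaviour described above is easy to verify at a shell
prompt. A tiny demo (paths are throwaway examples); note also that du
counts each inode only once per invocation, which is why one top-level
directory per server makes a 'du -sh' per directory a quick way to see
which machine is burning the space:

# Demonstrate that a hardlinked "copy" costs no extra data space, and
# that the data survives until the last link is gone.
mkdir -p snap.1 snap.2
echo "unchanged contents" > snap.1/file
ln snap.1/file snap.2/file        # second snapshot: a link, not a copy

stat -c '%h links, inode %i: %n' snap.1/file snap.2/file
                                  # both show 2 links and the same inode

rm -rf snap.1                     # expire the older snapshot
cat snap.2/file                   # still readable via the other link
du -sh snap.2                     # the file's space is still in use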
Re: Problems with making hardlink-based backups
Btw, sorry for top-posting. I don't use mailing lists very often and
forgot about the convention.
Re: Problems with making hardlink-based backups
Thanks for your suggestion, and I have heard of rsnapshot.

Although, actually removing older snapshot directories isn't really
the problem.

The problem is, if you have a large number of such backups (perhaps
one per server), then finding out where hard drive space is actually
being used is problematic (when your backup server starts running low
on disk space).

du worked pretty well with rdiff-backup, but is very problematic with
a large number of hardlink-based snapshots, which each have a complete
"copy" of a massive filesystem (rather than just info on which files
changed).

I guess I could do something like removing the oldest snapshot
directories from *all* the backups, until there is enough free space.
But that's kind of wasteful. Like, if I have one server that didn't
change much over 2 years, then I can only keep eg the last 2-3 weeks
of backups, because there is another server that has a huge amount of
file changes in the same period. And not being able to use "du" is
kind of annoying (actually, "locate" is also having major problems, so
I disabled it on the backup server).

That's why I started working on a set of pruning/unpruning scripts,
which basically "move" redundant info (the vast majority) over into
compressed files (with the ability to move it out again later). Kind
of like moving the snapshot-based approach closer to how rdiff-backup
works (but without chewing up huge amounts of ram and being hard to
diagnose). That way admins can in theory more easily check where space
is being used (but at the cost of not having quick access to earlier
complete server snapshots).

But I assume there must be better existing ways of handling this kind
of problem, since backups aren't exactly something new.

On Thu, Aug 13, 2009 at 5:48 PM, Andrew Sackville-West wrote:
> On Thu, Aug 13, 2009 at 09:20:17AM +0200, David wrote:
>> Hi list.
> [...]
>> 3) Existing tools for managing hardlink-based snapshot directories
>> etc.
>
> maybe rsnapshot is what you're after. It does hardlinked snapshots
> with automagical deletion of older backups and configurable frequency
> etc. I quite like it, though I'm not using it for high-volume stuff.
>
> One little caveat that always seems to get me: the daily won't run
> until you've completed enough hourlies, the weekly won't run until
> you've completed a week's worth of dailies, etc. Very disconcerting
> the first few days of use.
>
> A
Re: Problems with making hardlink-based backups
On Thu, Aug 13, 2009 at 09:20:17AM +0200, David wrote:
> Hi list.
[...]
> 3) Existing tools for managing hardlink-based snapshot directories
> etc.

maybe rsnapshot is what you're after. It does hardlinked snapshots
with automagical deletion of older backups and configurable frequency
etc. I quite like it, though I'm not using it for high-volume stuff.

One little caveat that always seems to get me: the daily won't run
until you've completed enough hourlies, the weekly won't run until
you've completed a week's worth of dailies, etc. Very disconcerting
the first few days of use.

A
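For anyone who hasn't seen rsnapshot, a minimal configuration sketch
follows. The directive names match the rsnapshot of that era (where
retention levels were called 'interval'); fields must be TAB-separated,
and all paths and counts here are illustrative:

# /etc/rsnapshot.conf (fragment) - fields below are TAB-separated
snapshot_root   /backups/rsnapshot/

# keep 6 hourlies, 7 dailies, 4 weeklies; each level rotates into the next
interval        hourly  6
interval        daily   7
interval        weekly  4

# one backup point per machine, each under its own subdirectory
backup          root@fileserver1:/srv/  fileserver1/
backup          /etc/                   localhost/

Each level is then driven from cron ('rsnapshot hourly', 'rsnapshot
daily', ...), which is where the caveat above bites: a level only
rotates once enough runs of the level below it have completed.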
Problems with making hardlink-based backups
Hi list.

Until recently I was using rdiff-backup for backing up our servers.
But after a certain point, the rdiff-backup process would take days,
use up huge amounts of CPU and RAM, and the debug logs were very
opaque. And the rdiff-backup mailing lists weren't very helpful
either.

So, for servers where that's a problem, I changed over to a hardlink
snapshot-based approach.

That's been working fine for the past few weeks. But now, the backup
server is running out of hard drive space and we need to check which
backup histories are taking up the most space, so we can prune the
oldest week or two from them.

The problem now is that 'du' takes days to run, and generates 4-5 GB
text files. This is of course caused by du having to process dozens of
additional snapshot directories, many of them with a large number of
files.

What I've been doing is writing helper scripts to prune the earlier
directories. Something like this:

1) Compare files under snapshot1 & snapshot2. If any files under
snapshot1 are hardlinks to the same files under snapshot2, then remove
them from snapshot1, and add an entry to a text file (for possible
later regeneration).

2) Remove empty directories (and add their details to a text file).

3) 7zip-compress the text files containing recovery info.

4) Possibly later (before empty directories): remove hardlinks and
symlinks from snapshot dirs, and add them to compressed text files.

There are also scripts to reverse the above operations. Of course, it
takes a while to get a complete snapshot of a given server from a few
weeks back, but at least it's possible.

I'm thinking that this is a lot of work, and there must be better ways
of handling this kind of problem. I don't like reinventing the wheel.

Given the above, are there any suggestions? eg:

1) Another tool similar to rdiff-backup, which has
easier-to-understand logs.

2) A quicker way of running du for my use case (a huge number of
hardlinks in directories).

3) Existing tools for managing hardlink-based snapshot directories.

etc.

Thanks in advance.

David.
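A minimal sketch of pruning steps 1-3 above, assuming two adjacent
snapshot directories, a log directory, and file names without embedded
newlines. All paths are placeholders; these are not the actual helper
scripts from the post:

#!/bin/bash
# Sketch of pruning steps 1-3. OLD/NEW/LOG are placeholder paths.
OLD=/backups/host/snapshot1    # older snapshot (to be pruned)
NEW=/backups/host/snapshot2    # newer snapshot (kept intact)
LOG=/backups/host/prune-info

mkdir -p "$LOG"

# 1) drop files in OLD that are the same inode as in NEW, recording
#    their paths so the snapshot can be regenerated later
( cd "$OLD" && find . -type f -print ) | while IFS= read -r f; do
    if [ "$OLD/$f" -ef "$NEW/$f" ]; then    # -ef: same device + inode
        printf '%s\n' "$f" >> "$LOG/pruned-files.txt"
        rm "$OLD/$f"
    fi
done

# 2) remove now-empty directories, keeping a record of them too
( cd "$OLD" && find . -depth -type d -empty -print ) \
    >> "$LOG/pruned-dirs.txt"
find "$OLD" -depth -type d -empty -delete

# 3) compress the recovery info
7z a "$LOG/prune-info.7z" "$LOG"/pruned-*.txt && rm "$LOG"/pruned-*.txt

Unpruning is then the reverse: recreate the recorded directories, and
hardlink each recorded path from snapshot2 back into snapshot1.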