Hi there. On massive filesystems (e.g. with millions of files), rdiff-backup seems to use a large amount of RAM, and the amount used keeps growing as the scan proceeds.
Here is my setup: rdiff-backup version 1.2.8-1 (the version that ships with Debian Lenny). I set it up like this:

# Create 2 duplicate directories of an existing huge one:
rsync --progress --numeric-ids --delete --link-dest=/some/huge/filesystem --exclude=/rdiff-backup-data -azH /some/huge/filesystem/ /dest/dir/source
rsync --progress --numeric-ids --delete --link-dest=/some/huge/filesystem --exclude=/rdiff-backup-data -azH /some/huge/filesystem/ /dest/dir/dest

The above creates a hardlink snapshot copy of an existing huge filesystem in 2 different directories. This is mainly for testing purposes (and it is also how my backup scripts work internally, to conserve backup server hard drive space).

# Run rdiff-backup:
rdiff-backup -v9 --preserve-numerical-ids --no-compare-inode --force /dest/dir/source/ /dest/dir/dest/

The output is as expected:

[....]
Thu Aug 20 11:13:55 2009 Backup: must_escape_dos_devices = 0
Thu Aug 20 11:13:55 2009 Starting mirror new to files
Thu Aug 20 11:13:55 2009 Processing changed file .

(Although I would like -v9 to show more detail, such as which files are being compared.)

While this is running, top reports that rdiff-backup is using an ever-increasing percentage of memory. Eventually rdiff-backup causes a lot of swapping, which slows things down enormously and causes other problems on the server, and rdiff-backup never finishes (4 days later...), so the other backups never run either. This makes rdiff-backup unsuitable for backing up our servers with larger filesystems :-(.

I'm experimenting with other backup tools, but I'd ideally like to keep using rdiff-backup if this particular memory-usage problem were fixed. I'm even tempted to maintain my own version of rdiff-backup just to work around this issue :-( (rdiff-backup's Python logic looks really complicated).

Is it not possible for rdiff-backup to use an algorithm closer to rsync's, such as an incremental file list, instead of loading a huge number of per-file details into memory? (A rough sketch of what I mean is appended at the end of this message.)

I suspect I may be causing this problem myself (with my hardlink-based copies), but in theory rdiff-backup should be able to handle this in a memory-efficient way, and I need that kind of hardlink logic to conserve hard drive space on the backup server.

I see that Debian's version of rdiff-backup is a bit behind the development version on the rdiff-backup site, but looking at the changelog, there doesn't seem to be anything related to this in there.

Any suggestions? Do other people have this problem? Should I file a bug for this?

Thanks,
David.

_______________________________________________
rdiff-backup-users mailing list at [email protected]
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
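Rough sketch of the "incremental file list" idea mentioned above. This is illustrative Python only, not rdiff-backup's actual code; the record layout and function names are made up. It contrasts an eager scan that keeps every file's metadata in one big list with an incremental scan that yields one entry at a time, which is roughly how rsync keeps its memory use flat:

#!/usr/bin/env python
# Illustrative only -- NOT rdiff-backup code. Contrasts keeping every file's
# metadata in one big in-memory list with streaming entries one at a time.

import os
import sys
from collections import namedtuple

# Hypothetical per-file record a backup tool might track.
Entry = namedtuple("Entry", "path size mtime mode")

def load_all_entries(root):
    """Eager scan: one Entry per file is appended to a list, so memory use
    grows with the number of files (millions of files -> lots of RAM)."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            entries.append(Entry(full, st.st_size, st.st_mtime, st.st_mode))
    return entries

def iter_entries(root):
    """Incremental scan: yield one Entry at a time (a generator), roughly the
    way rsync's incremental file list works, so memory stays about constant."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                  # deterministic, sorted traversal
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            yield Entry(full, st.st_size, st.st_mtime, st.st_mode)

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    # The streaming version can walk an arbitrarily large tree while holding
    # only the current Entry in memory.
    total = 0
    for entry in iter_entries(root):
        total += entry.size
    print("scanned %s, total bytes: %d" % (root, total))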
