Hi there. On massive filesystems (e.g. with millions of files), rdiff-backup seems to use a large amount of RAM, and the amount used keeps growing as the scan proceeds.
Here is my setup: rdiff-backup version 1.2.8-1 (the version that ships with Debian Lenny). I set it up like this:

# Create 2 duplicate directories of an existing huge one:
rsync --progress --numeric-ids --delete --link-dest=/some/huge/filesystem --exclude=/rdiff-backup-data -azH /some/huge/filesystem/ /dest/dir/source
rsync --progress --numeric-ids --delete --link-dest=/some/huge/filesystem --exclude=/rdiff-backup-data -azH /some/huge/filesystem/ /dest/dir/dest

The above creates a hardlink snapshot copy of an existing huge filesystem in 2 different directories. This is mainly for testing purposes (and it is also how my backup scripts work internally, to conserve backup server hard drive space).

# Run rdiff-backup:
rdiff-backup -v9 --preserve-numerical-ids --no-compare-inode --force /dest/dir/source/ /dest/dir/dest/

The output is as expected:

[....]
Thu Aug 20 11:13:55 2009 Backup: must_escape_dos_devices = 0
Thu Aug 20 11:13:55 2009 Starting mirror new to files
Thu Aug 20 11:13:55 2009 Processing changed file .

(Although I would like -v9 to show more detail, such as which files are being compared.)

While this is running, top reports that rdiff-backup is using an ever-increasing percentage of memory. Eventually rdiff-backup causes a lot of swapping, which slows things down enormously and causes other problems on the server, and rdiff-backup never finishes (4 days later...), so the other backups never run either. This makes rdiff-backup unsuitable for backing up our servers with larger filesystems :-(.

I'm experimenting with other backup tools, but I'd ideally like to keep using rdiff-backup if this particular memory-usage problem were fixed. I'm even tempted to maintain my own version of rdiff-backup just to work around this issue :-( (rdiff-backup's Python logic looks really complicated).

Is it not possible for rdiff-backup to use an algorithm closer to rsync's, such as an incremental file list, instead of loading a huge number of per-file details into memory? (A rough sketch of what I mean is appended at the end of this message.)

I suspect I may be causing this problem myself (with my hardlink-based copies), but in theory rdiff-backup should be able to handle this in a memory-efficient way, and I need that kind of hardlink logic to conserve hard drive space on the backup server.

I see that Debian's version of rdiff-backup is a bit behind the development version on the rdiff-backup site, but looking at the changelog, there doesn't seem to be anything related to this in there.

Any suggestions? Do other people have this problem? Should I file a bug for this?

Thanks,
David.

_______________________________________________
rdiff-backup-users mailing list at [email protected]
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
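Rough sketch of the "incremental file list" idea mentioned above. This is illustrative Python only, not rdiff-backup's actual code; the record layout and function names are made up. It contrasts an eager scan that keeps every file's metadata in one big list with an incremental scan that yields one entry at a time, which is roughly how rsync keeps its memory use flat:

#!/usr/bin/env python
# Illustrative only -- NOT rdiff-backup code. Contrasts keeping every file's
# metadata in one big in-memory list with streaming entries one at a time.

import os
import sys
from collections import namedtuple

# Hypothetical per-file record a backup tool might track.
Entry = namedtuple("Entry", "path size mtime mode")

def load_all_entries(root):
    """Eager scan: one Entry per file is appended to a list, so memory use
    grows with the number of files (millions of files -> lots of RAM)."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            entries.append(Entry(full, st.st_size, st.st_mtime, st.st_mode))
    return entries

def iter_entries(root):
    """Incremental scan: yield one Entry at a time (a generator), roughly the
    way rsync's incremental file list works, so memory stays about constant."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                  # deterministic, sorted traversal
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            yield Entry(full, st.st_size, st.st_mtime, st.st_mode)

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    # The streaming version can walk an arbitrarily large tree while holding
    # only the current Entry in memory.
    total = 0
    for entry in iter_entries(root):
        total += entry.size
    print("scanned %s, total bytes: %d" % (root, total))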
