Hi,

By default, rdiff-backup saves user/group names so that they can be restored on a different server even if the uid/gid differ there. You could try --preserve-numerical-ids, but I'm not sure it will help: I think rdiff-backup still tries to save both and merely favors one or the other.

So, if that doesn't help, we could think about a caching mechanism, but not any time soon. Please create an issue as an RFE for this.
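
To make the caching idea concrete, here is a rough sketch of what memoized id-to-name lookups could look like in Python. It is not rdiff-backup's actual code: the function names uid_to_name and gid_to_name are hypothetical, and it only assumes the standard pwd, grp and functools modules.

```python
import functools
import grp
import pwd


# Hypothetical helpers, not part of rdiff-backup: each distinct id is
# resolved through the name service (NIS, SSSD, ...) at most once per process.
@functools.lru_cache(maxsize=None)
def uid_to_name(uid):
    """Return the user name for a numeric uid, or None if it is unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # Unknown ids are cached as None too, so they are not retried.
        return None


@functools.lru_cache(maxsize=None)
def gid_to_name(gid):
    """Return the group name for a numeric gid, or None if it is unknown."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return None
```

Such a cache only pays off where the same uid or gid recurs. With one owner per top-level directory, the first lookup for each user would still hit the name service.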

KR, Eric

On 11/07/2025 12:15, Ty Boyack wrote:
Hi,

Like so many others I want to express my thanks for all the recent development (and previous development) that has gone into this great piece of software!

I have noticed a slowdown that could be from a change in our system or from code changes in rdiff-backup, but either way I am wondering if performance could be improved.

Our situation is that we have a volume of user network home directories, which has over 50,000 top level directories in it (and millions of files below that). Each directory is owned by a different user. I've been backing it up for years with rdiff-backup without any problems.

As I am migrating to new storage servers, the "list increments" command is painfully slow, taking hours to complete. Tracing the system calls, I see that it is doing a newfstatat() call on every one of those 50k top level directories. I don't know if this was done in previous versions or if this is new. Our previous storage systems used NIS to look up usernames/uids (which was very fast), while the new one uses Active Directory/SSSD. The calls to SSSD are what is causing the slowness, taking around half a second per call. (I'll happily entertain thoughts that this is a problem and too slow, but for now I need to accept it as a given.)

My question for the rdiff-backup developers is: why do we need to stat all these top level directories to get the list of increments at all? Shouldn't that information all be in the rdiff-backup-data folder? If we do need to do some form of stat on these top directories, can it be done in such a way as to work with numeric uid/gid info rather than initiating a call to name services? If we did not have to hit Active Directory for each of those folders, the speed would be drastically improved in this use case.

I have only spent a little time looking into this issue so I might not be seeing everything correctly, but I'd love to hear thoughts about this.

Thanks,

-Ty




