Thanks for that thought, Eric! Just for clarity, backups work great, are not unnecessarily slow, and I am happy with the way the user id mappings all work. It is only the "list increments" option that is prompting the question.

I think the --preserve-numerical-ids option is not available during a "list increments" operation. My question is really about what happens during a list increments run. I would expect it to look into the rdiff-backup-data directory, collect the increment directories or files, and return the list of increments. Instead it stats all of the top-level folders, which is what takes so much time. Why does it need to stat all the top-level directories? What is gained by statting all of those rather than just pulling data from the rdiff-backup-data folder? In other words, I don't want to request a complex caching mechanism if this is just a step that could be skipped altogether.
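
As a rough illustration of the shortcut I have in mind, here is a minimal Python sketch that collects session timestamps by reading file names inside rdiff-backup-data only, without touching the mirrored top-level directories. The file-name pattern (session_statistics.<timestamp>.data) and the helper name list_session_timestamps are assumptions on my part, and this is not necessarily what rdiff-backup does or should do internally.

```python
import os
import re

# ASSUMPTION: one file per backup session, named
# session_statistics.<timestamp>.data, inside rdiff-backup-data.
_SESSION_RE = re.compile(r"^session_statistics\.(?P<stamp>.+)\.data$")


def list_session_timestamps(repo_root):
    """Return the timestamps found in rdiff-backup-data, sorted as strings.

    Hypothetical helper: it only lists file names in the data directory
    and never stats the backed-up top-level directories.
    """
    data_dir = os.path.join(repo_root, "rdiff-backup-data")
    stamps = []
    for name in os.listdir(data_dir):
        match = _SESSION_RE.match(name)
        if match:
            stamps.append(match.group("stamp"))
    return sorted(stamps)
```

The sort is a plain string sort, so it is only chronological if all timestamps share the same format and timezone offset.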

Thanks for the help and thoughts,

-Ty



On 7/12/25 1:36 AM, ewl+rdiffbac...@lavar.de wrote:

Hi,

rdiff-backup saves by default user/group names so that they can be
restored on a different server even if they have different uid/gid.
You could try with --preserve-numerical-ids but I'm not sure this will
help because I think rdiff-backup will still try to save both, and just
favor one or the other.

So, if that doesn't help, we could think about a caching mechanism but
not any time soon. I'd ask you to create an issue as RFE for this.

KR, Eric

On 11/07/2025 12:15, Ty Boyack wrote:
Hi,

Like so many others I want to express my thanks for all the recent
development (and previous development) that has gone into this great
piece of software!

I have noticed a slowdown that could be from a change in our system or
from code changes in rdiff-backup, but either way I am wondering if
performance could be improved.

Our situation is that we have a volume of user network home directories,
which has over 50,000 top level directories in it (and millions of files
below that). Each directory is owned by a different user. I've been
backing it up for years with rdiff-backup without any problems.

As I am migrating to new storage servers, the "list increments" command
is painfully slow, taking hours to complete. Tracing the system calls I
see that it is doing a newfstatat() call on every one of those 50k top
level directories. I don't know if this was done in previous versions or
if this is new. Our previous storage systems used NIS to look up
usernames/uids (which was very fast), while the new one uses Active
Directory/SSSD. The calls to SSSD are what is causing the slowness,
taking around half a second per call. (I'll happily entertain thoughts
that this is a problem and too slow, but for now I need to accept it as
a given.)
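
For scale: at roughly half a second per lookup, 50,000 directories come to about 25,000 seconds, or close to seven hours. A minimal Python sketch for checking where the time goes on a given host is below. It times the stat itself separately from the uid-to-name lookup that goes through name services. The function name time_owner_lookups and the limit parameter are only illustrative, and the sketch does not reproduce what rdiff-backup does internally.

```python
import os
import pwd
import time


def time_owner_lookups(top, limit=100):
    """Stat up to `limit` entries under `top` and time the owner lookups.

    Returns (entries_checked, seconds_in_stat, seconds_in_name_lookup).
    Hypothetical helper for measurement only.
    """
    stat_secs = 0.0
    lookup_secs = 0.0
    count = 0
    with os.scandir(top) as entries:
        for entry in entries:
            if count >= limit:
                break
            t0 = time.perf_counter()
            st = entry.stat(follow_symlinks=False)  # numeric uid/gid only
            t1 = time.perf_counter()
            try:
                pwd.getpwuid(st.st_uid)  # resolved via NSS (e.g. SSSD)
            except KeyError:
                pass  # uid has no matching name
            t2 = time.perf_counter()
            stat_secs += t1 - t0
            lookup_secs += t2 - t1
            count += 1
    return count, stat_secs, lookup_secs
```

Running it as `time_owner_lookups("/path/to/homes", limit=200)` on the old and the new server should show whether the name lookup, rather than the stat, accounts for the difference.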

My question for the rdiff-backup developers is: why do we need to
stat all these top level directories to get the list of increments at
all? Shouldn't that information all be in the rdiff-backup-data folder?
If we do need to do some form of stat on these top directories, can it
be done in such a way as to work with numeric uid/gid info rather than
initiating a call to name services? If we did not have to hit Active
Directory for each of those folders the speed would be drastically
improved in this use case.

I have only spent a little time looking into this issue so I might not
be seeing everything correctly, but I'd love to hear thoughts about this.

Thanks,

-Ty






--
-==============================================================-
  Ty Boyack
  NREL IT Engineer

  Please put all IT help requests through the ticketing system at:
  https://services.warnercnr.colostate.edu/
  or email to:
  wcnr_it_supp...@colostate.edu
-==============================================================-

