Hi Eric,

Thanks very much for the follow-up and for keeping the ideas flowing!

The command I am using is just:
rdiff-backup list increments /path/to/backup/
so I'm not trying to get the size.

I like your thought though, so I just tried both:
rdiff-backup list increments --no-size /path/to/backup/
and
rdiff-backup list increments --size /path/to/backup/
The result: --no-size behaved the same as using no size-related option at all, while adding --size was slower (but of course gives more info).

I can see the issue most clearly with a command like:
strace -e newfstatat -f rdiff-backup list increments --no-size /path/to/backup/ 2>&1 | egrep '/path/to/backup'

(The pipe to egrep is just to clean up the output and not show all the newfstatats to system files.)

Rdiff-backup appears to do:
1) a few newfstatat calls to files within the rdiff-backup-data folder, then
2) a newfstatat of each top level folder in the backup repo, and then
3) newfstatat calls on files within the rdiff-backup-data folder like all of the increments.date.dir and current_mirror files.

So if my backup repository looks like:
/path/to/backup/directoryA
/path/to/backup/directoryB
/path/to/backup/directoryC
/path/to/backup/rdiff-backup-data

I'd see groups of newfstatat calls in this order:
/path/to/backup/
/path/to/backup/rdiff-backup-data
/path/to/backup/directoryA *
/path/to/backup/directoryB *
/path/to/backup/directoryC *
/path/to/backup/rdiff-backup-data *
/path/to/backup/rdiff-backup-data/increments.date.dir...
/path/to/backup/rdiff-backup-data/increments.date.dir...
/path/to/backup/rdiff-backup-data/increments.date.dir...
/path/to/backup/rdiff-backup-data/current_mirror

The slow behavior is coming from the time it spends outside of the rdiff-backup-data folder -- the ones marked with an asterisk.
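
To put numbers on the split described above, the traced paths can be tallied with a small script. This is only a rough Python sketch and not part of rdiff-backup: the function name classify_stat_paths and the category labels are made up for illustration, and it assumes strace prints each call in the form newfstatat(AT_FDCWD, "<path>", ...).

```python
import re
from collections import Counter

# Assumed strace line shape (illustrative), optionally prefixed by "[pid N] ":
#   newfstatat(AT_FDCWD, "/path/to/backup/directoryA", {...}, 0) = 0
STAT_RE = re.compile(r'newfstatat\([^,]+, "([^"]*)"')


def classify_stat_paths(lines, repo):
    """Count newfstatat calls inside vs. outside <repo>/rdiff-backup-data.

    Hypothetical helper: 'lines' is any iterable of strace output lines,
    'repo' is the backup repository path.
    """
    repo = repo.rstrip("/")
    data_dir = repo + "/rdiff-backup-data"
    counts = Counter()
    for line in lines:
        match = STAT_RE.search(line)
        if not match:
            continue  # not a newfstatat line with a quoted path
        path = match.group(1).rstrip("/")
        if path == data_dir or path.startswith(data_dir + "/"):
            counts["data"] += 1
        elif path == repo:
            counts["repo_root"] += 1
        elif path.startswith(repo + "/"):
            counts["outside_data"] += 1  # the calls marked with an asterisk
        # anything else (system files and so on) is ignored
    return counts
```

Feeding it the saved strace output and the repository path gives one count per group; a large "outside_data" count relative to "data" would match the pattern shown in the listing.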

I am running rdiff-backup 2.2.6 on Rocky Linux 10. Sorry I didn't mention that before!

Again, thanks for any thoughts on this.

-Ty


On 7/17/25 11:33 PM, ewl+rdiffbac...@lavar.de wrote:
** Caution: EXTERNAL Sender **

Hi,

Sorry, I misread you.

Do you use the list increments with size? It takes the size directly
from the file system, as it isn't stored anywhere, and can take quite a
long time. If not, I'd have to dig deeper into the code.

KR, Eric

On 12/07/2025 09:53, Ty Boyack wrote:
Thanks for that thought, Eric! Just for clarity, backups work great, are
not unnecessarily slow, and I am happy with the way the user id mappings
all work. It is just the "list increments" option that is causing the
question.

I think --preserve-numerical-ids is not available during a "list
increments" operation. My question is really what happens during a list
increments run. I would think it would look into the rdiff-backup-data
directory, get the list of increment directories or files, and return
the list of increments. But instead it stats all of the top level
folders, which is what takes so much time. Why does it need to stat all
the top level directories? What is gained by statting all those rather
than just pulling data from the rdiff-backup-data folder? In other words,
I don't want to request a complex caching mechanism if it is just a step
that could be skipped altogether.

Thanks for the help and thoughts,

-Ty



On 7/12/25 1:36 AM, ewl+rdiffbac...@lavar.de wrote:

Hi,

rdiff-backup saves by default user/group names so that they can be
restored on a different server even if they have different uid/gid.
You could try with --preserve-numerical-ids but I'm not sure this will
help because I think rdiff-backup will still try to save both, just
favor one or the other.

So, if that doesn't help, we could think about a caching mechanism but
not any time soon. I'd ask you to create an issue as RFE for this.

KR, Eric

On 11/07/2025 12:15, Ty Boyack wrote:
Hi,

Like so many others I want to express my thanks for all the recent
development (and previous development) that has gone into this great
piece of software!

I have noticed a slowdown that could be from a change in our system or
from code changes in rdiff-backup, but either way I am wondering if
performance could be improved.

Our situation is that we have a volume of user network home directories,
which has over 50,000 top level directories in it (and millions of files
below that). Each directory is owned by a different user. I've been
backing it up for years with rdiff-backup without any problems.

As I am migrating to new storage servers, the "list increments" command
is painfully slow, taking hours to complete. Tracing the system calls I
see that it is doing a newfstatat() call on every one of those 50k top
level directories. I don't know if this was done in previous versions or
if this is new. Our previous storage systems used NIS to look up
usernames/uids (which was very fast), while the new one uses Active
Directory/SSSD. The calls to SSSD are what is causing the slowness,
taking around half a second per call. (I'll happily entertain thoughts
that this is a problem and too slow, but for now I need to accept it as
a given.)
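
As a back-of-the-envelope check, the two figures above (about 50,000 top level directories and roughly half a second per lookup) already account for a run time measured in hours. The short Python sketch below just does that multiplication; it assumes one sequential lookup per directory with no caching, which is a simplification, and the function name is made up for illustration.

```python
def estimated_lookup_seconds(num_dirs, seconds_per_lookup):
    """Total wall time if every directory costs one sequential lookup.

    Simplifying assumption: no caching and no parallelism.
    """
    return num_dirs * seconds_per_lookup


# Figures taken from the description above: ~50,000 dirs, ~0.5 s per lookup.
total = estimated_lookup_seconds(50_000, 0.5)
print(f"{total:.0f} s, about {total / 3600:.1f} hours")
# prints: 25000 s, about 6.9 hours
```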

My question for the rdiff-backup developers is: why do we need to stat
all these top level directories to get the list of increments at all?
Shouldn't that information all be in the rdiff-backup-data folder?
If we do need to do some form of stat on these top directories, can it
be done in such a way as to work with numeric uid/gid info rather than
initiating a call to name services? If we did not have to hit Active
Directory for each of those folders the speed would be drastically
improved in this use case.
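
To illustrate the distinction between numeric ids and name lookups in general terms: a stat-type call returns only numeric uid/gid values, and turning a uid into a user name is a separate step that goes through the system's name services. The Python sketch below shows the two steps side by side. It is a generic illustration with made-up function names, not a description of what rdiff-backup does internally; whether the name lookup can be skipped in the "list increments" path is for the developers to confirm.

```python
import os
import pwd


def numeric_owner(path):
    """Return (uid, gid) straight from lstat; no name service is consulted."""
    st = os.lstat(path)
    return st.st_uid, st.st_gid


def named_owner(path):
    """Resolve the owning uid to a user name.

    pwd.getpwuid() is the step that can reach the configured name
    services (for example NIS or SSSD), so this is where per-directory
    latency would show up.
    """
    uid, _gid = numeric_owner(path)
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None  # uid has no entry in the user database
```

Timing the two functions over a handful of the slow directories would show whether the delay sits in the stat itself or in the name resolution.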

I have only spent a little time looking into this issue so I might not
be seeing everything correctly, but I'd love to hear thoughts about
this.

Thanks,

-Ty

--
-==============================================================-
  Ty Boyack
  NREL IT Engineer

  Please put all IT help requests through the ticketing system at:
  https://services.warnercnr.colostate.edu/
  or email to:
  wcnr_it_supp...@colostate.edu
-==============================================================-

