On 17/02/2024 03:14, Robert Nichols wrote:
On 2/16/24 08:44, Dominic Raferd wrote:
Until then, I am interested in your parallel processing approach.
Presumably you start 8 parallel rdiff-backup verify sessions for
datetime points -1 to -8 (and then, when they are all complete, -9 to
-16, -17 to -24...)? And you run 8 in parallel because your CPU has 8
cores?
I have 16 cores, actually, but by experiment I found that 8 parallel
threads seem to be the sweet spot. I don't know how much of that is
unique to my system and the nature of my backups. I did have to add a
pre-scan of the file_statistics metadata files to look for increment
sizes of 1GB or greater, and limit the number of parallel checks to 1
if any are found. All it takes is one huge ISO file in the increments
to gobble up cache and make the parallel checks really slow. I haven't
spent much time trying to tune that adjustment, and all the
experimenting was done back when I had just 32GB of RAM.
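
For anyone wanting to try the same pre-scan, here is a minimal sketch in
Python. It is not the script described above. It assumes each data line in a
file_statistics file ends with the four fields Changed, SourceSize,
MirrorSize and IncrementSize, with "NA" for missing values, and that the
files are newline-separated rather than null-separated. The function names
and the 2**30 reading of "1GB" are made up for illustration.

```python
import gzip
from typing import Iterable, Iterator

# Assumption: "1GB" is taken as 2**30 bytes; the original threshold is not stated exactly.
GIB = 1024 ** 3


def has_large_increment(lines: Iterable[str], threshold: int = GIB) -> bool:
    """Return True if any entry reports an increment size >= threshold.

    Assumed line layout: <filename> Changed SourceSize MirrorSize IncrementSize
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        # Split from the right because file names may contain spaces.
        fields = line.rsplit(" ", 4)
        if len(fields) != 5:
            continue  # not a line in the assumed layout
        increment_size = fields[4]
        # "NA" and other non-numeric values are ignored.
        if increment_size.isdigit() and int(increment_size) >= threshold:
            return True
    return False


def read_statistics(path: str) -> Iterator[str]:
    """Yield text lines from a statistics file, gzip-compressed or plain."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle


def choose_parallelism(stat_files: Iterable[Iterable[str]], normal: int = 8) -> int:
    """Drop to a single worker if any statistics file shows a huge increment."""
    if any(has_large_increment(lines) for lines in stat_files):
        return 1
    return normal
```

In practice you would pass read_statistics(path) for each file_statistics
file under rdiff-backup-data to choose_parallelism() and use the result as
the number of parallel checks.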
I let the parallel threads run independently, without waiting for
anything in the others. Effectively, I run the threads with the level
sequences:
{-1..-99..8}
{-2..-99..8}
{-3..-99..8}
...
{-8..-99..8}
and then just wait for everything to complete.
The code is really nothing like that, but that is the effect. You
might expect the threads to get badly out of sync, but because of the
effects of I/O caching, whichever threads finish a step first find
themselves slowed down by I/O waits more than do the threads that
advance to a new step later. The threads tend to stay quite
beautifully in sync. Again, that's on my system with my backups. YMMV.
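
The striding described above can be sketched in Python as follows. This is
only an illustration of the effect, not the actual code: each of 8 workers
takes every 8th level and runs through its own list without waiting for the
others. Levels are written as positive "backups ago" counts, so 1 stands for
-1. The verify callable is a hypothetical placeholder for whatever runs a
single verification and reports success.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def level_sequence(worker: int, workers: int, oldest: int) -> List[int]:
    """Levels for one worker: worker, worker + workers, ... up to oldest."""
    return list(range(worker, oldest + 1, workers))


def verify_all(
    verify: Callable[[int], bool], workers: int = 8, oldest: int = 99
) -> Dict[int, bool]:
    """Run every level through verify(), one independent stride per worker."""

    def run(worker: int) -> List[Tuple[int, bool]]:
        # Each worker walks its own sequence and never waits on the others.
        return [(level, verify(level)) for level in level_sequence(worker, workers, oldest)]

    results: Dict[int, bool] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Collecting from pool.map() is the "wait for everything to complete" step.
        for chunk in pool.map(run, range(1, workers + 1)):
            results.update(chunk)
    return results
```

A real verify() would presumably shell out to rdiff-backup for "N backups
ago" (a time spec such as 3B); the exact command line depends on the
rdiff-backup version, so it is left out here. Threads are enough for that
case because the heavy work happens in the child processes.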
Very interesting. A while ago I set my timedicer-verify script to run
verifications in parallel but it seemed to make everything slower not
faster, admittedly when running on a much less powerful (and virtual)
machine than yours, so I stripped out all that code. But I should look
at it again (I guess I must have backups!)...