Re: rsync: use xxhash

Stuart Henderson Tue, 06 Jun 2023 02:30:59 -0700

On 2023/06/06 09:06, Jan Stary wrote:
> On Jun 06 06:41:39, [email protected] wrote:
> > > Thank you for enabling this. I am testing on current/amd64,
> > > rsyncing a 4G dir of video files, about 150-250 MB each.
> > >
> > > I am touching the files before every run,
> > > otherwise rsync just finishes almost instantly,
> > > based on the mtime (right?).
> > 
> > Right.
> > 
> > > Is that a scenario where faster checksums are supposed
> > > to make things faster, matching blocks in large files?
> > 
> > Using the option -c seems rather appropriate to make sure that all
> > files get checksummed, even though touching them might be sufficient
> > in most cases.
> 
> Thanks. Testing again and leaving the network out of it with
> $ time rsync --verbose -ac /path/dir/ /other/disk/dir/  
> 
> before:
> 
>     1m19.74s real     0m13.31s user     0m17.57s system
>     1m19.64s real     0m13.82s user     0m18.36s system
>     1m19.51s real     0m14.12s user     0m18.31s system
> 
> after:
> 
>     1m09.00s real     0m01.06s user     0m14.97s system
>     1m09.04s real     0m00.99s user     0m14.70s system
>     1m09.01s real     0m01.01s user     0m15.25s system
> 
> That's about 13% time saving.
> 
> Jan
>


The difference made by changing the hash algorithm is more obvious with
a set of files small enough to fit in cache, so that hashing rather than
disk I/O accounts for the bigger part of the run time.

I get this with files which are identical on both sides and already in
cache:

$ hyperfine -L rsync std,xx '/tmp/rsync-{rsync} -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/'
Benchmark 1: /tmp/rsync-std -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/
  Time (mean ± σ):      6.622 s ±  0.032 s    [User: 3.867 s, System: 1.899 s]
  Range (min … max):    6.569 s …  6.679 s    10 runs

Benchmark 2: /tmp/rsync-xx -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/
  Time (mean ± σ):      2.937 s ±  0.097 s    [User: 0.227 s, System: 1.829 s]
  Range (min … max):    2.839 s …  3.189 s    10 runs

Summary
  '/tmp/rsync-xx -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/' ran
    2.26 ± 0.08 times faster than '/tmp/rsync-std -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/'
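
As a rough check of where the saving comes from, the mean times above
can be plugged into a few lines of Python. The numbers are copied from
the hyperfine output; the dictionary layout and function name are only
illustrative.

```python
# Mean times in seconds, copied from the hyperfine output above.
std = {"wall": 6.622, "user": 3.867, "system": 1.899}
xx = {"wall": 2.937, "user": 0.227, "system": 1.829}


def saved(kind):
    """Seconds saved by rsync-xx relative to rsync-std for one timer."""
    return std[kind] - xx[kind]


wall_saved = saved("wall")
user_saved = saved("user")

# Share of the wall-clock saving that is matched by lower user CPU time.
user_share = user_saved / wall_saved

print(f"wall saved: {wall_saved:.3f} s")
print(f"user saved: {user_saved:.3f} s")
print(f"user share of wall saving: {user_share:.0%}")
# Computed from the rounded means, so it can differ in the last digit
# from the ratio hyperfine prints in its summary.
print(f"speedup from rounded means: {std['wall'] / xx['wall']:.2f}x")
```

Nearly all of the wall-clock saving is matched by the drop in user CPU
time, while system time barely moves, which fits the hash being the
only thing that changed.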

This is obviously artificial but not totally unrealistic (say you're
fetching a popular set of files from a mirror, they're likely to be in
cache at least on the mirror side, and for the mirror operator even a
smaller saving is helpful when it's multiplied across more users).

Additionally when the files *do* differ, the hashes are run again on
blocks in the file to locate the differences, as well as the initial
check on the whole file contents. I don't have a good way to test this
but I assume this will result in a bigger improvement in those cases.
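
To illustrate why differing files mean more hashing, here is a minimal
sketch, not rsync's actual algorithm. It assumes fixed, aligned blocks
and uses MD5 from the Python standard library as a stand-in hash; real
rsync also uses a rolling weak checksum so it can match blocks at
arbitrary offsets. The function names and the tiny block size are
hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, for illustration only; rsync chooses its own size


def whole_digest(data: bytes) -> bytes:
    """Hash of the complete file contents (the initial check)."""
    return hashlib.md5(data).digest()


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size block of the data separately."""
    return [
        hashlib.md5(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def differing_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return indexes of blocks in `new` that do not match `old`.

    The per-block hashes are only computed when the whole-file hashes
    differ, so changed files are hashed more than once.
    """
    if whole_digest(old) == whole_digest(new):
        return []
    old_blocks = block_digests(old, block_size)
    new_blocks = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_blocks)
        if i >= len(old_blocks) or digest != old_blocks[i]
    ]


old = b"aaaabbbbccccdddd"
new = b"aaaabbXbccccdddd"  # one byte changed, inside the second block
print(differing_blocks(old, new))
```

In this simplified model a changed file is read through the hash
function at least twice, once whole and once per block, so a faster hash
should help more there than in the identical-files case.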
