On 2023/06/06 09:06, Jan Stary wrote:
> On Jun 06 06:41:39, [email protected] wrote:
> > > Thank you for enabling this. I am testing on current/amd64,
> > > rsyncing a 4G dir of video files, about 150-250 MB each.
> > >
> > > I am touching the files before every run,
> > > otherwise rsync just finishes almost instantly,
> > > based on the mtime (right?).
> >
> > Right.
> >
> > > Is that a scenario where faster checksums are supposed
> > > to make things faster, matching blocks in large files?
> >
> > Using the option -c seems rather appropriate to make sure that all
> > files get checksummed, even though touching them might be sufficient
> > in most cases.
>
> Thanks. Testing again and leaving the network out of it with
> $ time rsync --verbose -ac /path/dir/ /other/disk/dir/
>
> before:
>
> 1m19.74s real 0m13.31s user 0m17.57s system
> 1m19.64s real 0m13.82s user 0m18.36s system
> 1m19.51s real 0m14.12s user 0m18.31s system
>
> after:
>
> 1m09.00s real 0m01.06s user 0m14.97s system
> 1m09.04s real 0m00.99s user 0m14.70s system
> 1m09.01s real 0m01.01s user 0m15.25s system
>
> That's about a 13% time saving.
>
> Jan
>
The time difference made by changing the hash algorithm is more obvious
with a set of files small enough to fit in cache, so that hashing (rather
than disk I/O) accounts for the bigger part of the run time.
Here is a run using files which are identical on both sides and already in
cache:

$ hyperfine -L rsync std,xx '/tmp/rsync-{rsync} -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/'

Benchmark 1: /tmp/rsync-std -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/
  Time (mean ± σ):     6.622 s ± 0.032 s    [User: 3.867 s, System: 1.899 s]
  Range (min … max):   6.569 s … 6.679 s    10 runs

Benchmark 2: /tmp/rsync-xx -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/
  Time (mean ± σ):     2.937 s ± 0.097 s    [User: 0.227 s, System: 1.829 s]
  Range (min … max):   2.839 s … 3.189 s    10 runs

Summary
  '/tmp/rsync-xx -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/' ran
    2.26 ± 0.08 times faster than '/tmp/rsync-std -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/'

This is obviously artificial but not totally unrealistic (say you're
fetching a popular set of files from a mirror, they're likely to be in
cache at least on the mirror side, and for the mirror operator even a
smaller saving is helpful when it's multiplied across more users).
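
To put the two sets of numbers side by side, here is a small Python sketch
that computes the relative savings from the timings quoted in this thread.
The names (cold, cached, mean, saving) are made up for illustration; only
the figures come from the runs above.

```python
# Timings in seconds, copied from the runs quoted in this thread.
# "cold" is the 4G disk-to-disk test, "cached" is the hyperfine test.
cold = {
    "before": {"real": [79.74, 79.64, 79.51], "user": [13.31, 13.82, 14.12]},
    "after": {"real": [69.00, 69.04, 69.01], "user": [1.06, 0.99, 1.01]},
}
cached = {
    "std": {"real": 6.622, "user": 3.867},
    "xx": {"real": 2.937, "user": 0.227},
}


def mean(values):
    """Arithmetic mean of a list of timings."""
    return sum(values) / len(values)


def saving(before, after):
    """Relative time saving in percent, going from `before` to `after`."""
    return (before - after) / before * 100.0


cold_real = saving(mean(cold["before"]["real"]), mean(cold["after"]["real"]))
cold_user = saving(mean(cold["before"]["user"]), mean(cold["after"]["user"]))
cached_real = saving(cached["std"]["real"], cached["xx"]["real"])
cached_user = saving(cached["std"]["user"], cached["xx"]["user"])

print(f"disk-bound run: real {cold_real:.1f}%, user {cold_user:.1f}%")
print(f"cached run:     real {cached_real:.1f}%, user {cached_user:.1f}%")
```

User time drops by more than 90% in both tests; the wall-clock saving is
much larger in the cached run only because the disk-bound run spends most
of its time waiting on I/O rather than hashing.
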
Additionally, when the files *do* differ, the hashes are run again on
blocks within the file to locate the differences, on top of the initial
check of the whole file contents. I don't have a good way to test this,
but I assume it will result in a bigger improvement in those cases.
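
As a rough illustration of why differing files mean more hashing, here is
a simplified Python sketch: one whole-file digest first, then per-block
digests only if that differs. This is not rsync's actual algorithm (rsync
pairs a rolling weak checksum with a strong hash so it can match blocks at
arbitrary offsets); the sketch compares aligned blocks only. BLOCK_SIZE,
digest, block_digests and differing_blocks are hypothetical names, and
blake2b from hashlib stands in for whichever hash rsync was built with.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, for illustration only


def digest(data: bytes) -> bytes:
    """Stand-in hash (blake2b); not the hash rsync itself uses."""
    return hashlib.blake2b(data, digest_size=16).digest()


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each aligned block of `data` separately."""
    return [digest(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def differing_blocks(old: bytes, new: bytes,
                     block_size: int = BLOCK_SIZE) -> list:
    """Return the indices of aligned blocks that differ.

    Identical files cost one whole-file hash per side. Differing files
    cost that plus a second pass hashing every block.
    """
    if digest(old) == digest(new):
        return []
    a = block_digests(old, block_size)
    b = block_digests(new, block_size)
    longest = max(len(a), len(b))
    return [i for i in range(longest)
            if i >= len(a) or i >= len(b) or a[i] != b[i]]


old = bytes(10000)
new = bytearray(old)
new[5000] = 1  # flip one byte inside the second block
print(differing_blocks(old, bytes(new)))
```

In this toy version a differing file is hashed twice in full (once whole,
once per block), which is the extra work a cheaper hash would speed up.
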