Re: rsync: use xxhash
On 2023/06/06 09:06, Jan Stary wrote:
> On Jun 06 06:41:39, icepic...@gmail.com wrote:
> > > Thank you for enabling this. I am testing on current/amd64,
> > > rsyncing a 4G dir of video files, about 150-250 MB each.
> > >
> > > I am touching the files before every run,
> > > otherwise rsync just finishes almost instantly,
> > > based on the mtime (right?).
> >
> > Right.
> >
> > > Is that a scenario where faster checksums are supposed
> > > to make things faster, matching blocks in large files?
> >
> > Using the option -c seems rather appropriate to make sure that all
> > files get checksummed, even though touching them might be sufficient
> > in most cases.
>
> Thanks. Testing again and leaving the network out of it with
>
> $ time rsync --verbose -ac /path/dir/ /other/disk/dir/
>
> before:
>
>     1m19.74s real     0m13.31s user     0m17.57s system
>     1m19.64s real     0m13.82s user     0m18.36s system
>     1m19.51s real     0m14.12s user     0m18.31s system
>
> after:
>
>     1m09.00s real     0m01.06s user     0m14.97s system
>     1m09.04s real     0m00.99s user     0m14.70s system
>     1m09.01s real     0m01.01s user     0m15.25s system
>
> That's about a 13% time saving.
>
> Jan

The time difference made by changing the hash algorithm is more obvious
if we have a set of files small enough to fit in cache (so the change in
hash accounts for the bigger part of the time difference).

I get this using files which are identical on both sides and already
in cache:

$ hyperfine -L rsync std,xx '/tmp/rsync-{rsync} -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/'
Benchmark 1: /tmp/rsync-std -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/
  Time (mean ± σ):      6.622 s ±  0.032 s    [User: 3.867 s, System: 1.899 s]
  Range (min … max):    6.569 s …  6.679 s    10 runs

Benchmark 2: /tmp/rsync-xx -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/
  Time (mean ± σ):      2.937 s ±  0.097 s    [User: 0.227 s, System: 1.829 s]
  Range (min … max):    2.839 s …  3.189 s    10 runs

Summary
  '/tmp/rsync-xx -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/' ran
    2.26 ± 0.08 times faster than '/tmp/rsync-std -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/'

This is obviously artificial but not totally unrealistic (say you're
fetching a popular set of files from a mirror, they're likely to be
in cache at least on the mirror side, and for the mirror operator even
a smaller saving is helpful when it's multiplied across more users).

Additionally, when the files *do* differ, the hashes are run again on
blocks in the file to locate the differences, on top of the initial
check on the whole file contents. I don't have a good way to test this
but I assume this will result in a bigger improvement in those cases.
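
One possible way to exercise the differing-files case is to keep two
copies of a large file, overwrite one block of the receiving copy before
each run, and force the delta algorithm with --no-whole-file (rsync
otherwise copies whole files when both paths are local). This is only a
rough sketch and was not run in this thread: the work directory, the
sizes and the function names prepare, dirty_block and run_case are made
up for illustration, the rsync is assumed to accept
--checksum-choice=xxh64, and "time" is assumed to be the shell keyword
as in ksh.

```shell
#!/bin/sh
# Sketch: time the delta-transfer path (files that differ) per checksum.
# Hypothetical paths and helper names; adjust to taste.
work=${TMPDIR:-/tmp}/rsync-delta-test

# Create a source file of $1 MB and an identical copy on the receiving side.
prepare() {
    rm -rf "$work" && mkdir -p "$work/src" "$work/dst" &&
    dd if=/dev/urandom of="$work/src/big" bs=1048576 count="$1" 2>/dev/null &&
    cp "$work/src/big" "$work/dst/big"
}

# Overwrite 1 MB of the copy at offset $1 MB so the two files differ.
# conv=notrunc keeps the rest of the file in place.
dirty_block() {
    dd if=/dev/urandom of="$work/dst/big" bs=1048576 count=1 \
        seek="$1" conv=notrunc 2>/dev/null
}

# Dirty the copy, then run one transfer with checksum algorithm $1.
# --no-whole-file makes rsync use block matching for a local copy.
run_case() {
    dirty_block 10
    time rsync -a --no-whole-file --checksum-choice="$1" --stats \
        "$work/src/" "$work/dst/"
}
```

For example: prepare 256; run_case md5; run_case xxh64. The "Matched
data" and "Literal data" lines in the --stats output show whether block
matching actually took place.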
Re: rsync: use xxhash
On Jun 06 06:41:39, icepic...@gmail.com wrote:
> > Thank you for enabling this. I am testing on current/amd64,
> > rsyncing a 4G dir of video files, about 150-250 MB each.
> >
> > I am touching the files before every run,
> > otherwise rsync just finishes almost instantly,
> > based on the mtime (right?).
>
> Right.
>
> > Is that a scenario where faster checksums are supposed
> > to make things faster, matching blocks in large files?
>
> Using the option -c seems rather appropriate to make sure that all
> files get checksummed, even though touching them might be sufficient
> in most cases.

Thanks. Testing again and leaving the network out of it with

$ time rsync --verbose -ac /path/dir/ /other/disk/dir/

before:

    1m19.74s real     0m13.31s user     0m17.57s system
    1m19.64s real     0m13.82s user     0m18.36s system
    1m19.51s real     0m14.12s user     0m18.31s system

after:

    1m09.00s real     0m01.06s user     0m14.97s system
    1m09.04s real     0m00.99s user     0m14.70s system
    1m09.01s real     0m01.01s user     0m15.25s system

That's about a 13% time saving.

Jan
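
The saving can be worked out directly from the wall-clock figures
above. A small sketch in sh and awk that averages the three runs on
each side and reports the relative difference; the helper names
to_seconds, mean_seconds and saving_pct are made up for illustration.

```shell
#!/bin/sh
# Convert a ksh-style elapsed time such as "1m19.74s" into seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Mean of any number of such times, in seconds.
mean_seconds() {
    for t in "$@"; do
        to_seconds "$t"
    done | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# Relative saving of "after" ($2) against "before" ($1), as a percentage.
saving_pct() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

before=$(mean_seconds 1m19.74s 1m19.64s 1m19.51s)
after=$(mean_seconds 1m09.00s 1m09.04s 1m09.01s)
echo "before ${before}s, after ${after}s, saving $(saving_pct "$before" "$after")%"
# prints: before 79.63s, after 69.02s, saving 13.3%
```

Nearly all of the difference is in the user column, which drops from
roughly 13-14 s to about 1 s, while system time changes much less.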
Re: rsync: use xxhash
On Tue, 6 June 2023 at 00:00, Jan Stary wrote:
> On Jun 05 12:37:10, s...@spacehopper.org wrote:
> > reminded by the dwz mail, rsync would also like to use xxhash if
> > available:
>
> Thank you for enabling this. I am testing on current/amd64,
> rsyncing a 4G dir of video files, about 150-250 MB each.
>
> I am touching the files before every run,
> otherwise rsync just finishes almost instantly,
> based on the mtime (right?).

Right.

> Is that a scenario where faster checksums are supposed
> to make things faster, matching blocks in large files?

Using the option -c seems rather appropriate to make sure that all
files get checksummed, even though touching them might be sufficient
in most cases.

-- 
May the most significant bit of your life be positive.
Re: rsync: use xxhash
On Jun 05 12:37:10, s...@spacehopper.org wrote:
> reminded by the dwz mail, rsync would also like to use xxhash if
> available:
>
> 'The xxHash library (https://cyan4973.github.io/xxHash/) provides
> extremely fast checksum functions that can make the "rsync algorithm"
> run much more quickly, especially when matching blocks in large files.
> Installing this development library adds xxhash checksums as the default
> checksum algorithm. You'll need at least v0.8.0 if you want rsync to
> include the full range of its checksum algorithms.'

Thank you for enabling this. I am testing on current/amd64,
rsyncing a 4G dir of video files, about 150-250 MB each.

I am touching the files before every run,
otherwise rsync just finishes almost instantly,
based on the mtime (right?).

$ touch /dload/Catastrophe/S*/*
$ time rsync -Hai4 --del /path/dir/ remote:/path/dir/

Is that a scenario where faster checksums are supposed
to make things faster, matching blocks in large files?

Before:

    3m07.07s real     0m20.55s user     0m08.15s system
    3m11.96s real     0m19.84s user     0m08.07s system
    3m06.73s real     0m19.96s user     0m07.89s system

After:

    3m06.68s real     0m19.88s user     0m07.99s system
    3m13.86s real     0m19.83s user     0m08.38s system
    3m06.63s real     0m20.67s user     0m08.02s system

Jan

> while xxHash does provide standard shared+static libraries, it is more
> commonly used as a "header-only library" (done here and also in dwz)
> so there's no additional run dependency in rsync for this.
>
> ok?
>
> Index: Makefile
> ===================================================================
> RCS file: /cvs/ports/net/rsync/Makefile,v
> retrieving revision 1.97
> diff -u -p -r1.97 Makefile
> --- Makefile	5 Jan 2023 21:59:21 -0000	1.97
> +++ Makefile	5 Jun 2023 11:36:48 -0000
> @@ -1,6 +1,7 @@
>  COMMENT =	mirroring/synchronization over low bandwidth links
>  
>  DISTNAME =	rsync-3.2.7
> +REVISION =	0
>  CATEGORIES =	net
>  HOMEPAGE =	https://rsync.samba.org/
>  
> @@ -19,12 +20,12 @@ MODULES =	lang/python
>  
>  MODPY_RUNDEP =	No
>  
> -BUILD_DEPENDS =	textproc/py-commonmark${MODPY_FLAVOR}
> +BUILD_DEPENDS =	textproc/py-commonmark${MODPY_FLAVOR} \
> +			sysutils/xxhash
>  
>  SEPARATE_BUILD =	Yes
>  CONFIGURE_STYLE =	gnu
>  CONFIGURE_ARGS =	--disable-lz4 \
> -			--disable-xxhash \
>  			--disable-zstd \
>  			--with-included-popt \
>  			--with-included-zlib \
> @@ -33,6 +34,8 @@ CONFIGURE_ARGS =	--disable-lz4 \
>  			--with-rsh=/usr/bin/ssh \
>  			--with-nobody-user=_rsync \
>  			--with-nobody-group=_rsync
> +CONFIGURE_ENV +=	CPPFLAGS="-I${LOCALBASE}/include -DXXH_INLINE_ALL=1" \
> +			ac_cv_search_XXH64_createState=""
>  
>  .include
>  
> @@ -41,8 +44,7 @@ CONFIGURE_ARGS +=	--enable-md5-asm
>  .endif
>  
>  .if ${FLAVOR:Miconv}
> -CONFIGURE_ENV +=	CPPFLAGS='-I${LOCALBASE}/include' \
> -			LDFLAGS='-L${LOCALBASE}/lib'
> +CONFIGURE_ENV +=	LDFLAGS='-L${LOCALBASE}/lib'
>  LIB_DEPENDS +=	converters/libiconv
>  WANTLIB +=	iconv
>  .endif
rsync: use xxhash
reminded by the dwz mail, rsync would also like to use xxhash if
available:

'The xxHash library (https://cyan4973.github.io/xxHash/) provides
extremely fast checksum functions that can make the "rsync algorithm"
run much more quickly, especially when matching blocks in large files.
Installing this development library adds xxhash checksums as the default
checksum algorithm. You'll need at least v0.8.0 if you want rsync to
include the full range of its checksum algorithms.'

while xxHash does provide standard shared+static libraries, it is more
commonly used as a "header-only library" (done here and also in dwz)
so there's no additional run dependency in rsync for this.

ok?

Index: Makefile
===================================================================
RCS file: /cvs/ports/net/rsync/Makefile,v
retrieving revision 1.97
diff -u -p -r1.97 Makefile
--- Makefile	5 Jan 2023 21:59:21 -0000	1.97
+++ Makefile	5 Jun 2023 11:36:48 -0000
@@ -1,6 +1,7 @@
 COMMENT =	mirroring/synchronization over low bandwidth links
 
 DISTNAME =	rsync-3.2.7
+REVISION =	0
 CATEGORIES =	net
 HOMEPAGE =	https://rsync.samba.org/
 
@@ -19,12 +20,12 @@ MODULES =	lang/python
 
 MODPY_RUNDEP =	No
 
-BUILD_DEPENDS =	textproc/py-commonmark${MODPY_FLAVOR}
+BUILD_DEPENDS =	textproc/py-commonmark${MODPY_FLAVOR} \
+			sysutils/xxhash
 
 SEPARATE_BUILD =	Yes
 CONFIGURE_STYLE =	gnu
 CONFIGURE_ARGS =	--disable-lz4 \
-			--disable-xxhash \
 			--disable-zstd \
 			--with-included-popt \
 			--with-included-zlib \
@@ -33,6 +34,8 @@ CONFIGURE_ARGS =	--disable-lz4 \
 			--with-rsh=/usr/bin/ssh \
 			--with-nobody-user=_rsync \
 			--with-nobody-group=_rsync
+CONFIGURE_ENV +=	CPPFLAGS="-I${LOCALBASE}/include -DXXH_INLINE_ALL=1" \
+			ac_cv_search_XXH64_createState=""
 
 .include
 
@@ -41,8 +44,7 @@ CONFIGURE_ARGS +=	--enable-md5-asm
 .endif
 
 .if ${FLAVOR:Miconv}
-CONFIGURE_ENV +=	CPPFLAGS='-I${LOCALBASE}/include' \
-			LDFLAGS='-L${LOCALBASE}/lib'
+CONFIGURE_ENV +=	LDFLAGS='-L${LOCALBASE}/lib'
 LIB_DEPENDS +=	converters/libiconv
 WANTLIB +=	iconv
 .endif
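
After rebuilding with a change like the one above, one way to check
that the package really picked up xxhash is to look at the checksum
list in the rsync version output. A minimal sketch, assuming the rsync
found in PATH is the rebuilt one and that its version text names the
algorithms as xxh64, xxh3 or xxh128; the helper name has_xxhash is
made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the version text on stdin mentions
# one of the xxhash variants (xxh64, xxh3, xxh128).
has_xxhash() {
    grep -Eq 'xxh(64|3|128)'
}

# Only query rsync if it is installed.
if command -v rsync >/dev/null 2>&1; then
    if rsync --version | has_xxhash; then
        echo "xxhash checksums available"
    else
        echo "no xxhash support in this rsync"
    fi
fi
```

If the list only shows md5 and md4, the build did not find the xxhash
header and configure fell back to the older checksums.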