Re: rsync: use xxhash

2023-06-06 Thread Stuart Henderson
On 2023/06/06 09:06, Jan Stary wrote:
> On Jun 06 06:41:39, icepic...@gmail.com wrote:
> > > Thank you for enabling this. I am testing on current/amd64,
> > > rsyncing a 4G dir of video files, about 150-250 MB each.
> > >
> > > I am touching the files before every run,
> > > otherwise rsync just finishes almost instantly,
> > > based on the mtime (right?).
> > 
> > Right.
> > 
> > > Is that a scenario where faster checksums are supposed
> > > to make things faster, matching blocks in large files?
> > 
> > Using the option -c seems rather appropriate to make sure that all
> > files get checksummed, even though touching them might be sufficient
> > in most cases.
> 
> Thanks. Testing again and leaving the network out of it with
> $ time rsync --verbose -ac /path/dir/ /other/disk/dir/  
> 
> before:
> 
> 1m19.74s real 0m13.31s user 0m17.57s system
> 1m19.64s real 0m13.82s user 0m18.36s system
> 1m19.51s real 0m14.12s user 0m18.31s system
> 
> after:
> 
> 1m09.00s real 0m01.06s user 0m14.97s system
> 1m09.04s real 0m00.99s user 0m14.70s system
> 1m09.01s real 0m01.01s user 0m15.25s system
> 
> That's about a 13% time saving.
> 
> Jan
> 

The time difference made by changing the hash algorithm is more obvious
with a set of files small enough to fit in cache (so that hashing
accounts for a bigger part of the total run time).

I get this using files which are identical on both sides and already in
cache:

$ hyperfine -L rsync std,xx '/tmp/rsync-{rsync} -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/'
Benchmark 1: /tmp/rsync-std -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/
  Time (mean ± σ):      6.622 s ±  0.032 s    [User: 3.867 s, System: 1.899 s]
  Range (min … max):    6.569 s …  6.679 s    10 runs

Benchmark 2: /tmp/rsync-xx -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/
  Time (mean ± σ):      2.937 s ±  0.097 s    [User: 0.227 s, System: 1.829 s]
  Range (min … max):    2.839 s …  3.189 s    10 runs

Summary
  '/tmp/rsync-xx -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/' ran
    2.26 ± 0.08 times faster than '/tmp/rsync-std -avc unifi* krita* go-openbsd* /home/sthen/tmp/x/'
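
The summary line can be cross-checked from the means and standard
deviations reported above. This is a minimal sketch, assuming the ± figure
comes from first-order error propagation on the ratio of the two means.
The function name `speedup` is made up for this example and is not part
of hyperfine.

```python
import math

def speedup(slow_mean, slow_sd, fast_mean, fast_sd):
    """Ratio of two mean run times with a propagated uncertainty."""
    ratio = slow_mean / fast_mean
    # Relative errors of a quotient add in quadrature (first-order approximation).
    rel_err = math.sqrt((slow_sd / slow_mean) ** 2 + (fast_sd / fast_mean) ** 2)
    return ratio, ratio * rel_err

# Wall-clock figures copied from the hyperfine output above.
ratio, err = speedup(6.622, 0.032, 2.937, 0.097)
print(f"wall clock: {ratio:.2f} +/- {err:.2f} times faster")

# User CPU time is where the hashing happens; system time barely moves.
user_ratio = 3.867 / 0.227
sys_ratio = 1.899 / 1.829
print(f"user time: {user_ratio:.1f}x, system time: {sys_ratio:.2f}x")
```

Because the inputs here are already rounded to three decimals, the ratio
comes out marginally below the 2.26 that hyperfine prints. The user-time
ratio is roughly 17, which is where the hash change shows up most clearly.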

This is obviously artificial but not totally unrealistic (say you're
fetching a popular set of files from a mirror, they're likely to be in
cache at least on the mirror side, and for the mirror operator even a
smaller saving is helpful when it's multiplied across more users).

Additionally, when the files *do* differ, the hashes are run again on
blocks within the file to locate the differences, on top of the initial
check on the whole file contents. I don't have a good way to test this,
but I assume it will result in a bigger improvement in those cases.
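
The block-level step described above can be illustrated with a
deliberately simplified sketch. It hashes fixed-size blocks at aligned
offsets and reports which ones differ. Real rsync also uses a rolling weak
checksum so that blocks can match at any offset; that part is omitted
here. `hashlib.md5` stands in for the strong checksum only because it is
in the Python standard library, and the names `block_digests` and
`differing_blocks` and the block size are assumptions made for this
example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow

def block_digests(data, block_size=BLOCK_SIZE):
    """Return one digest per fixed-size block of data."""
    return [
        hashlib.md5(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]

def differing_blocks(old, new, block_size=BLOCK_SIZE):
    """Return indexes of blocks in new that do not match old at the same offset."""
    # Whole-file check first: identical contents need no per-block work.
    if hashlib.md5(old).digest() == hashlib.md5(new).digest():
        return []
    old_d = block_digests(old, block_size)
    new_d = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_d)
        if i >= len(old_d) or digest != old_d[i]
    ]

old = b"aaaabbbbccccdddd"
new = b"aaaabbXbccccddddeeee"
print(differing_blocks(old, new))
```

In this sketch a differing file is hashed twice, once as a whole and once
block by block, which is consistent with the expectation that a cheaper
hash matters more in that case.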



Re: rsync: use xxhash

2023-06-06 Thread Jan Stary
On Jun 06 06:41:39, icepic...@gmail.com wrote:
> > Thank you for enabling this. I am testing on current/amd64,
> > rsyncing a 4G dir of video files, about 150-250 MB each.
> >
> > I am touching the files before every run,
> > otherwise rsync just finishes almost instantly,
> > based on the mtime (right?).
> 
> Right.
> 
> > Is that a scenario where faster checksums are supposed
> > to make things faster, matching blocks in large files?
> 
> Using the option -c seems rather appropriate to make sure that all
> files get checksummed, even though touching them might be sufficient
> in most cases.

Thanks. Testing again and leaving the network out of it with
$ time rsync --verbose -ac /path/dir/ /other/disk/dir/  

before:

1m19.74s real 0m13.31s user 0m17.57s system
1m19.64s real 0m13.82s user 0m18.36s system
1m19.51s real 0m14.12s user 0m18.31s system

after:

1m09.00s real 0m01.06s user 0m14.97s system
1m09.04s real 0m00.99s user 0m14.70s system
1m09.01s real 0m01.01s user 0m15.25s system

That's about a 13% time saving.
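
The saving can be recomputed from the six runs above. This is a small
sketch with the `time` output converted to seconds by hand; the variable
and function names are illustrative.

```python
from statistics import mean

# (real, user, system) in seconds, transcribed from the runs above.
before = [(79.74, 13.31, 17.57), (79.64, 13.82, 18.36), (79.51, 14.12, 18.31)]
after = [(69.00, 1.06, 14.97), (69.04, 0.99, 14.70), (69.01, 1.01, 15.25)]

def saving(before_runs, after_runs, column):
    """Relative reduction of the mean of one column, as a fraction."""
    b = mean(run[column] for run in before_runs)
    a = mean(run[column] for run in after_runs)
    return (b - a) / b

real_saving = saving(before, after, 0)
user_saving = saving(before, after, 1)
print(f"real: {real_saving:.1%}  user: {user_saving:.1%}")
```

The user-time column shows the change most clearly. The wall-clock figure
moves less, presumably because the run is dominated by reading and writing
the 4G of data.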

Jan



Re: rsync: use xxhash

2023-06-05 Thread Janne Johansson
On Tue, 6 June 2023 at 00:00, Jan Stary wrote:
> On Jun 05 12:37:10, s...@spacehopper.org wrote:
> > reminded by the dwz mail, rsync would also like to use xxhash if
> > available:
> Thank you for enabling this. I am testing on current/amd64,
> rsyncing a 4G dir of video files, about 150-250 MB each.
>
> I am touching the files before every run,
> otherwise rsync just finishes almost instantly,
> based on the mtime (right?).

Right.

> Is that a scenario where faster checksums are supposed
> to make things faster, matching blocks in large files?

Using the option -c seems rather appropriate to make sure that all
files get checksummed, even though touching them might be sufficient
in most cases.

-- 
May the most significant bit of your life be positive.



Re: rsync: use xxhash

2023-06-05 Thread Jan Stary
On Jun 05 12:37:10, s...@spacehopper.org wrote:
> reminded by the dwz mail, rsync would also like to use xxhash if
> available:
> 
> 'The xxHash library (https://cyan4973.github.io/xxHash/) provides
> extremely fast checksum functions that can make the "rsync algorithm"
> run much more quickly, especially when matching blocks in large files.
> Installing this development library adds xxhash checksums as the default
> checksum algorithm. You'll need at least v0.8.0 if you want rsync to
> include the full range of its checksum algorithms.'

Thank you for enabling this. I am testing on current/amd64,
rsyncing a 4G dir of video files, about 150-250 MB each.

I am touching the files before every run,
otherwise rsync just finishes almost instantly,
based on the mtime (right?).

$ touch /dload/Catastrophe/S*/*
$ time rsync -Hai4 --del /path/dir/ remote:/path/dir/

Is that a scenario where faster checksums are supposed
to make things faster, matching blocks in large files?

Before:

3m07.07s real 0m20.55s user 0m08.15s system
3m11.96s real 0m19.84s user 0m08.07s system
3m06.73s real 0m19.96s user 0m07.89s system

After:

3m06.68s real 0m19.88s user 0m07.99s system
3m13.86s real 0m19.83s user 0m08.38s system
3m06.63s real 0m20.67s user 0m08.02s system

Jan






rsync: use xxhash

2023-06-05 Thread Stuart Henderson
reminded by the dwz mail, rsync would also like to use xxhash if
available:

'The xxHash library (https://cyan4973.github.io/xxHash/) provides
extremely fast checksum functions that can make the "rsync algorithm"
run much more quickly, especially when matching blocks in large files.
Installing this development library adds xxhash checksums as the default
checksum algorithm. You'll need at least v0.8.0 if you want rsync to
include the full range of its checksum algorithms.'

while xxHash does provide standard shared+static libraries, it is more
commonly used as a "header-only library" (done here and also in dwz)
so there's no additional run dependency in rsync for this.

ok?

Index: Makefile
===================================================================
RCS file: /cvs/ports/net/rsync/Makefile,v
retrieving revision 1.97
diff -u -p -r1.97 Makefile
--- Makefile	5 Jan 2023 21:59:21 -	1.97
+++ Makefile	5 Jun 2023 11:36:48 -
@@ -1,6 +1,7 @@
 COMMENT =	mirroring/synchronization over low bandwidth links
 
 DISTNAME =	rsync-3.2.7
+REVISION =	0
 CATEGORIES =	net
 HOMEPAGE =	https://rsync.samba.org/
 
@@ -19,12 +20,12 @@ MODULES =	lang/python
 
 MODPY_RUNDEP =	No
 
-BUILD_DEPENDS =	textproc/py-commonmark${MODPY_FLAVOR}
+BUILD_DEPENDS =	textproc/py-commonmark${MODPY_FLAVOR} \
+		sysutils/xxhash
 
 SEPARATE_BUILD =	Yes
 CONFIGURE_STYLE =	gnu
 CONFIGURE_ARGS =	--disable-lz4 \
-			--disable-xxhash \
 			--disable-zstd \
 			--with-included-popt \
 			--with-included-zlib \
@@ -33,6 +34,8 @@ CONFIGURE_ARGS =	--disable-lz4 \
 			--with-rsh=/usr/bin/ssh \
 			--with-nobody-user=_rsync \
 			--with-nobody-group=_rsync
+CONFIGURE_ENV +=	CPPFLAGS="-I${LOCALBASE}/include -DXXH_INLINE_ALL=1" \
+			ac_cv_search_XXH64_createState=""
 
 .include 
 
@@ -41,8 +44,7 @@ CONFIGURE_ARGS +=	--enable-md5-asm
 .endif
 
 .if ${FLAVOR:Miconv}
-CONFIGURE_ENV +=	CPPFLAGS='-I${LOCALBASE}/include' \
-			LDFLAGS='-L${LOCALBASE}/lib'
+CONFIGURE_ENV +=	LDFLAGS='-L${LOCALBASE}/lib'
 LIB_DEPENDS +=	converters/libiconv
 WANTLIB +=	iconv
 .endif