Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
That is the goal, a drop-in optimization. I don't know if xxhash has the required properties to be able to replace the rolling checksum (bytes need to be able to easily be shifted on/off at boths ends, see match.c). However, as there's also talk of replacing the MD5 checksum with xxhash (again,

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Sebastian Andrzej Siewior via rsync
On 2020-05-18 17:55:58 [+0200], Jorrit Jongma via rsync wrote: > I don't disagree that MD5 could (or even should) be replaced so it is > no longer the bottleneck in several real-world cases (including mine). > > However this patch is not for MD5 performance, rather for the rolling > checksum

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Sebastian Andrzej Siewior via rsync
On 2020-05-18 17:06:51 [+0200], Jorrit Jongma via rsync wrote: > diff --git a/checksum.c b/checksum.c > index cd234038..4e696f3d 100644 > --- a/checksum.c > +++ b/checksum.c > @@ -99,6 +99,7 @@ int canonical_checksum(int csum_type) > return csum_type >= CSUM_MD4 ? 1 : 0; > } > > +#ifndef

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Sebastian Andrzej Siewior via rsync
On 2020-05-18 21:55:13 [+0200], Jorrit Jongma wrote: > What do you base this on? So my memory was wrong. SSE2 is supported by all x86-64bit CPUs. Sorry for that. > would imply that SSSE3 is enabled out of the box on builds on machines > that support it, this is not the case (it certainly isn't

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
Unfortunately we can't "always build" the SSSE3 code. It won't even build unless the "-mssse3" flag is presented to GCC. We don't want to build the entire project with this flag enabled, as it might trigger SSSE3 optimizations outside of our runtime decided code path that may break on CPUs that

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
What do you base this on? Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That

GPL violation by Synology

2020-05-18 Thread Jorrit Jongma via rsync
Synology ( https://www.synology.com/en-global ) is one of best selling brands of consumer/prosumer/SMB NASes, with revenue estimated to be in the $100M+ range. Several of their NAS backup options use rsync either explicitly or under the hood (NAS-to-Remote-NAS backup, Shared-Folder-Sync), and

[Bug 13082] [REQ] Hardware / SSE based MD5 operations

2020-05-18 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13082 --- Comment #1 from Ben RUBSON --- First info / first patch there : https://lists.samba.org/archive/rsync/2020-May/032175.html -- You are receiving this mail because: You are the QA Contact for the bug. -- Please use reply-all for most replies

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
I don't disagree that MD5 could (or even should) be replaced so it is no longer the bottleneck in several real-world cases (including mine). However this patch is not for MD5 performance, rather for the rolling checksum rsync uses to match blocks on existing files on both ends to reduce transfer

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
Well, don't get too excited, get_checksum1() (the function optimized here) is not the great performance limiter in this case, it's get_checksum2() and sum_update(), which will be using MD5. You can force using MD4, but on the slower CPU's I've tested in practice that is slower rather than faster,

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Ben RUBSON via rsync
Thank you Jorrit for your detailed answer. > On 18 May 2020, at 17:58, Jorrit Jongma via rsync > wrote: > > Well, don't get too excited, get_checksum1() (the function optimized > here) is not the great performance limiter in this case, it's > get_checksum2() and sum_update(), which will be

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
I think you're missing a point here. Two different checksum algorithms are used in concert, the Adler-based one and the MD5 one. I SSE-optimized the Adler-based one. The Adler-based hash is used to _find_ blocks that might have shifted, while the MD5 hash is a strong cryptographic hash used to

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Ben RUBSON via rsync
> On 18 May 2020, at 19:02, Jorrit Jongma wrote: > > I think you're missing a point here. Two different checksum algorithms > are used in concert, the Adler-based one and the MD5 one. I > SSE-optimized the Adler-based one. The Adler-based hash is used to > _find_ blocks that might have shifted,

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
This drop-in patch increases the performance of the get_checksum1() function on x86-64. On the target slow CPU performance of the function increased by nearly 50% in the x86-64 default SSE2 mode, and by nearly 100% if the compiler was told to enable SSSE3 support. The increase was over 200% on

[SCM] The rsync repository. - branch master updated

2020-05-18 Thread Rsync CVS commit messages
The branch, master has been updated via 3b36bde9 Add back a lost "*" and document the refusing of log-file* opts. from c3986d4c More manpage improvements for "refuse options". https://git.samba.org/?p=rsync.git;a=shortlog;h=master - Log

[SCM] The rsync repository. - branch master updated

2020-05-18 Thread Rsync CVS commit messages
The branch, master has been updated via be7af36c Tweak the accept/refuse strings a bit. from 3b36bde9 Add back a lost "*" and document the refusing of log-file* opts. https://git.samba.org/?p=rsync.git;a=shortlog;h=master - Log