Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
Unfortunately we can't "always build" the SSSE3 code. It won't even build unless the "-mssse3" flag is presented to GCC. We don't want to build the entire project with this flag enabled, as it might trigger SSSE3 optimizations outside of our runtime decided code path that may break on CPUs that

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Sebastian Andrzej Siewior via rsync
On 2020-05-18 21:55:13 [+0200], Jorrit Jongma wrote: > What do you base this on? So my memory was wrong. SSE2 is supported by all x86-64bit CPUs. Sorry for that. > would imply that SSSE3 is enabled out of the box on builds on machines > that support it, this is not the case (it certainly isn't

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
That is the goal, a drop-in optimization. I don't know if xxhash has the required properties to be able to replace the rolling checksum (bytes need to be able to easily be shifted on/off at boths ends, see match.c). However, as there's also talk of replacing the MD5 checksum with xxhash (again,

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
What do you base this on? Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Sebastian Andrzej Siewior via rsync
On 2020-05-18 17:55:58 [+0200], Jorrit Jongma via rsync wrote: > I don't disagree that MD5 could (or even should) be replaced so it is > no longer the bottleneck in several real-world cases (including mine). > > However this patch is not for MD5 performance, rather for the rolling > checksum

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Sebastian Andrzej Siewior via rsync
On 2020-05-18 17:06:51 [+0200], Jorrit Jongma via rsync wrote: > diff --git a/checksum.c b/checksum.c > index cd234038..4e696f3d 100644 > --- a/checksum.c > +++ b/checksum.c > @@ -99,6 +99,7 @@ int canonical_checksum(int csum_type) > return csum_type >= CSUM_MD4 ? 1 : 0; > } > > +#ifndef

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Ben RUBSON via rsync
> On 18 May 2020, at 19:02, Jorrit Jongma wrote: > > I think you're missing a point here. Two different checksum algorithms > are used in concert, the Adler-based one and the MD5 one. I > SSE-optimized the Adler-based one. The Adler-based hash is used to > _find_ blocks that might have shifted,

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
I think you're missing a point here. Two different checksum algorithms are used in concert, the Adler-based one and the MD5 one. I SSE-optimized the Adler-based one. The Adler-based hash is used to _find_ blocks that might have shifted, while the MD5 hash is a strong cryptographic hash used to

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Ben RUBSON via rsync
Thank you Jorrit for your detailed answer. > On 18 May 2020, at 17:58, Jorrit Jongma via rsync > wrote: > > Well, don't get too excited, get_checksum1() (the function optimized > here) is not the great performance limiter in this case, it's > get_checksum2() and sum_update(), which will be

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
Well, don't get too excited, get_checksum1() (the function optimized here) is not the great performance limiter in this case, it's get_checksum2() and sum_update(), which will be using MD5. You can force using MD4, but on the slower CPU's I've tested in practice that is slower rather than faster,

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
I don't disagree that MD5 could (or even should) be replaced so it is no longer the bottleneck in several real-world cases (including mine). However this patch is not for MD5 performance, rather for the rolling checksum rsync uses to match blocks on existing files on both ends to reduce transfer