Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
That is the goal, a drop-in optimization. I don't know if xxhash has the required properties to be able to replace the rolling checksum (bytes need to be easily shifted on/off at both ends, see match.c). However, as there's also talk of replacing the MD5 checksum with xxhash (again,
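The "rolling" property referred to above can be sketched as follows. This is a hypothetical, simplified Adler-style sum (rsync's real code lives in checksum.c and match.c, and also adds a character offset): s1 is the sum of the bytes in the window, s2 the running sum of s1 values, and sliding the window one byte is an O(1) update rather than a full recompute.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified rolling checksum state: s1 = sum of window bytes,
   s2 = sum of the running s1 values while scanning the window. */
typedef struct { uint32_t s1, s2; } rollsum;

/* Compute the checksum of a window from scratch. */
static rollsum sum_block(const uint8_t *buf, size_t len) {
    rollsum r = {0, 0};
    for (size_t i = 0; i < len; i++) {
        r.s1 += buf[i];
        r.s2 += r.s1;
    }
    return r;
}

/* Slide the window one byte: drop `out` at the front, append `in`
   at the back, without touching the bytes in between. */
static void roll(rollsum *r, size_t len, uint8_t out, uint8_t in) {
    r->s1 = r->s1 - out + in;
    r->s2 = r->s2 - (uint32_t)len * out + r->s1;
}
```

A rolled result matches a from-scratch recompute of the shifted window, which is exactly the property a replacement hash would need to preserve.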

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
Unfortunately we can't "always build" the SSSE3 code. It won't even build unless the "-mssse3" flag is passed to GCC. We don't want to build the entire project with this flag enabled, as it might trigger SSSE3 optimizations outside of our runtime-decided code path that may break on CPUs that
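One conventional way to confine the flag is to compile only the SIMD translation unit with -mssse3, keeping the rest of the tree at the baseline x86-64 (SSE2) target. A hypothetical Makefile fragment (the file name checksum_simd.c is illustrative, not rsync's actual layout):

```make
CFLAGS      = -O2
SIMD_CFLAGS = $(CFLAGS) -mssse3

# Only this object is built with SSSE3 enabled; the compiler is then
# free to emit SSSE3 instructions here and nowhere else.
checksum_simd.o: checksum_simd.c
	$(CC) $(SIMD_CFLAGS) -c $< -o $@
```

The SSSE3-built object must still only be entered via a runtime CPU check, since anything in that file may contain SSSE3 instructions.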

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
What do you base this on? Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html : "For the x86-32 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default." That

GPL violation by Synology

2020-05-18 Thread Jorrit Jongma via rsync
Synology ( https://www.synology.com/en-global ) is one of the best-selling brands of consumer/prosumer/SMB NASes, with revenue estimated to be in the $100M+ range. Several of their NAS backup options use rsync either explicitly or under the hood (NAS-to-Remote-NAS backup, Shared-Folder-Sync), and

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
ore: > https://lists.samba.org/archive/rsync/2019-October/031975.html > > Cheers, > Filipe > > On Mon, 18 May 2020 at 17:08, Jorrit Jongma via rsync > wrote: >> >> This drop-in patch increases the performance of the get_checksum1() >> function on x86-64. >

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
, so I am looking into this. On Mon, May 18, 2020 at 5:18 PM Ben RUBSON via rsync wrote: > > On 18 May 2020, at 17:06, Jorrit Jongma via rsync > wrote: > > This drop-in patch increases the performance of the get_checksum1() > function on x86-64. > > > As ref, rather

Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
: > > Thank you Jorrit for your detailed answer. > > > On 18 May 2020, at 17:58, Jorrit Jongma via rsync > > wrote: > > > > Well, don't get too excited, get_checksum1() (the function optimized > > here) is not the great performance limiter in this case, it's >

[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
This drop-in patch increases the performance of the get_checksum1() function on x86-64. On the target slow CPU, performance of the function increased by nearly 50% in the x86-64 default SSE2 mode, and by nearly 100% if the compiler was told to enable SSSE3 support. The increase was over 200% on

Re: [PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-19 Thread Jorrit Jongma via rsync
I've read up some more on the subject, and it seems the proper way to do this with GCC is g++ and target attributes. I've refactored the patch that way, and it indeed uses SSSE3 automatically on supporting CPUs, regardless of the build host, so this should be ideal both for home builders and
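The same effect can be sketched in plain C with GCC's per-function target attribute plus a runtime CPU check; this is a hedged sketch of the dispatch pattern, and the function names here are hypothetical, not rsync's actual symbols:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t (*checksum_fn)(const uint8_t *, size_t);

/* Baseline implementation, safe on any x86-64 CPU. */
static uint32_t checksum_scalar(const uint8_t *buf, size_t len) {
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += buf[i];
    return s;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* SSSE3 is enabled for this one function only, so the rest of the
   translation unit keeps the baseline x86-64 (SSE2) target. */
__attribute__((target("ssse3")))
static uint32_t checksum_ssse3(const uint8_t *buf, size_t len) {
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += buf[i];   /* with target("ssse3"), GCC may vectorize this */
    return s;
}
#endif

/* Choose the best implementation at runtime. */
static checksum_fn pick_checksum(void) {
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("ssse3"))
        return checksum_ssse3;
#endif
    return checksum_scalar;
}
```

Because the attribute is scoped to the function, a single binary works on both old and new CPUs regardless of the build host, which is the "ideal for home builders" property mentioned above.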

Re: checksum feature request

2020-05-23 Thread Jorrit Jongma via rsync
This is great! However, do you have access to a big-endian CPU? I'm not sure how relevant this still is, but I've read at some point that xxhash might have produced different (reversed?) hashes on CPUs of different endianness. It may be prudent to actually test if that is the case with this implementation
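The usual fix when a hash must match across CPUs is to read multi-byte words in a fixed byte order instead of through native loads. A minimal sketch of an endian-independent little-endian read (a generic technique, not code from xxhash or rsync):

```c
#include <stdint.h>

/* Assemble a 32-bit value byte-by-byte in little-endian order.
   This yields the same result on big- and little-endian hosts,
   unlike dereferencing the buffer as a uint32_t*. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

If the implementation under test uses such canonical reads throughout, its digests should already be endian-independent; a run on actual big-endian hardware (or qemu) would confirm it.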

Re: [PATCHv3] SSE2/SSSE3/AVX2 optimized version of get_checksum1() for x86-64

2020-05-22 Thread Jorrit Jongma via rsync
Here's the third (and final, barring bugs) version, which builds _on top_ of the patch already committed by Wayne. This version also adds AVX2 support and rearranges defines and filenames in a way that seems more logical and future-proof to me. Real-world tests show about an 8% network transfer

[PATCH] file_checksum() optimization

2020-05-24 Thread Jorrit Jongma via rsync
When a whole-file checksum is performed, hashing was done in 64-byte blocks, causing overhead and limiting performance. Testing showed the performance improvement to go up quickly going from 64 to 512 bytes, with diminishing returns above that; 4096 was where it seemed to plateau for me. Re-used
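The pattern being changed can be sketched as a chunked read loop feeding an incremental hash state in larger pieces. A toy multiplicative sum stands in for rsync's real MD5 update here, and the names are hypothetical:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Read size per fread() call; raising this from 64 toward 4096 cuts
   per-call overhead, which is the optimization described above. */
enum { CHUNK = 4096 };

/* Toy incremental hash update (stand-in for the MD5 update). */
static void toy_update(uint32_t *state, const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        *state = *state * 31u + buf[i];
}

/* Hash an entire stream by feeding it in CHUNK-sized blocks. */
static uint32_t checksum_stream(FILE *fp) {
    uint8_t buf[CHUNK];
    uint32_t state = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        toy_update(&state, buf, n);
    return state;
}
```

Only the loop's amortized overhead changes with CHUNK; the bytes delivered to the hash, and thus the digest, are identical.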

Re: [PATCH] file_checksum() optimization

2020-05-25 Thread Jorrit Jongma via rsync
No, this patch is for the whole-file checksum; the resulting checksum is the same regardless of the block size used when feeding the hash algorithm. On Mon, May 25, 2020 at 10:57 AM Pierre Bernhardt via rsync wrote: > Will this not to produce more false negative results? This mean if > comparing
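The invariance claimed above holds for any incremental hash: the state depends only on the byte stream, not on how the stream is sliced. A toy state machine (standing in for MD5) makes this concrete:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy incremental hash update (stand-in for the MD5 update). */
static void toy_update(uint32_t *state, const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        *state = *state * 31u + buf[i];
}

/* Feed `buf` to the hash in `slice`-byte pieces. */
static uint32_t hash_in_slices(const uint8_t *buf, size_t len, size_t slice) {
    uint32_t state = 0;
    for (size_t off = 0; off < len; off += slice) {
        size_t n = (len - off < slice) ? (len - off) : slice;
        toy_update(&state, buf + off, n);
    }
    return state;
}
```

Feeding the same data in 64-byte or 4096-byte slices produces the same result, so the block-size change cannot introduce false negatives.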

[PATCH] Optimized assembler version of md5_process() for x86-64

2020-05-22 Thread Jorrit Jongma via rsync
This patch introduces an optimized assembler version of md5_process(), the inner loop of MD5 checksumming. It affects the performance of all MD5 operations in rsync - including block matching and whole-file checksums. Performance gain is 5-10% depending on the specific CPU. Originally created by

Re: [PATCHv2] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-20 Thread Jorrit Jongma via rsync
I haven't found a way to control GCC's target selector at runtime (though in theory it could be possible), so switching between specific optimizations (SSE2 vs SSSE3) may prove difficult or require additional duplication of code. Bypassing the optimizations completely in an all-or-nothing way
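For completeness, GCC does offer an automatic alternative to hand-written dispatch: the target_clones attribute (GCC 6+, ELF/ifunc platforms) emits one body per listed target and resolves the best one at load time. A hedged sketch, with a hypothetical function name and a guard so the attribute is only used where it is known to work:

```c
#include <stdint.h>
#include <stddef.h>

/* The compiler generates both an SSSE3 and a default variant of this
   function and installs an ifunc resolver that picks one at load time. */
#if defined(__x86_64__) && defined(__GNUC__) && !defined(__clang__)
__attribute__((target_clones("ssse3", "default")))
#endif
static uint32_t sum_bytes(const uint8_t *buf, size_t len) {
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += buf[i];
    return s;
}
```

The trade-off is less control than explicit dispatch: the clone list is fixed at compile time, and the mechanism requires glibc-style ifunc support.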