That is the goal: a drop-in optimization.
I don't know whether xxhash has the properties required to replace the
rolling checksum (bytes need to be easily shifted on/off at both ends;
see match.c).
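The rolling property can be illustrated with a simplified sketch of the rsync-style checksum (the real get_checksum1()/match.c code also adds a CHAR_OFFSET to each byte; function names here are illustrative, not rsync's actual API):

```c
#include <assert.h>
#include <stdint.h>

/* Compute s1/s2 over buf[0..len) from scratch.
 * s1 = sum of bytes, s2 = sum of running prefix sums. */
static void checksum_full(const uint8_t *buf, int len,
                          uint32_t *s1, uint32_t *s2)
{
    uint32_t a = 0, b = 0;
    for (int i = 0; i < len; i++) {
        a += buf[i];
        b += a;            /* equals sum of (len - i) * buf[i] */
    }
    *s1 = a;
    *s2 = b;
}

/* Slide a window of size n one byte right: drop `out`, add `in`.
 * This O(1) update is the property a replacement hash would also need. */
static void checksum_roll(uint32_t *s1, uint32_t *s2, int n,
                          uint8_t out, uint8_t in)
{
    *s1 = *s1 - out + in;
    *s2 = *s2 - (uint32_t)n * out + *s1;
}
```

Rolling the checksum across a buffer one byte at a time produces the same s1/s2 as recomputing each window from scratch.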
However, as there's also talk of replacing the MD5 checksum with
xxhash (again,
Unfortunately we can't "always build" the SSSE3 code. It won't even
build unless the "-mssse3" flag is passed to GCC.
We don't want to build the entire project with this flag enabled, as
it might trigger SSSE3 optimizations outside of our runtime-decided
code path that may break on CPUs that
What do you base this on?
Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html :
"For the x86-32 compiler, you must use -march=cpu-type, -msse or
-msse2 switches to enable SSE extensions and make this option
effective. For the x86-64 compiler, these extensions are enabled by
default."
That
Synology ( https://www.synology.com/en-global ) is one of the
best-selling brands of consumer/prosumer/SMB NASes, with revenue
estimated to be in the $100M+ range.
Several of their NAS backup options use rsync either explicitly or
under the hood (NAS-to-Remote-NAS backup, Shared-Folder-Sync), and
more:
> https://lists.samba.org/archive/rsync/2019-October/031975.html
>
> Cheers,
> Filipe
>
> On Mon, 18 May 2020 at 17:08, Jorrit Jongma via rsync
> wrote:
>>
>> This drop-in patch increases the performance of the get_checksum1()
>> function on x86-64.
>>
, so I am looking into this.
On Mon, May 18, 2020 at 5:18 PM Ben RUBSON via rsync
wrote:
>
> On 18 May 2020, at 17:06, Jorrit Jongma via rsync
> wrote:
>
> This drop-in patch increases the performance of the get_checksum1()
> function on x86-64.
>
>
> As ref, rather
:
>
> Thank you Jorrit for your detailed answer.
>
> > On 18 May 2020, at 17:58, Jorrit Jongma via rsync
> > wrote:
> >
> > Well, don't get too excited, get_checksum1() (the function optimized
> > here) is not the great performance limiter in this case, it's
>
This drop-in patch increases the performance of the get_checksum1()
function on x86-64.
On the target slow CPU, performance of the function increased by
nearly 50% in the x86-64 default SSE2 mode, and by nearly 100% if the
compiler was told to enable SSSE3 support. The increase was over 200%
on
I've read up some more on the subject, and it seems the proper way to
do this with GCC is to use g++ and target attributes. I've refactored
the patch that way, and it indeed uses SSSE3 automatically on
supporting CPUs, regardless of the build host, so this should be ideal
both for home builders and
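For reference, GCC also offers the related `target_clones` attribute, which compiles a single function body once per listed target and picks the right clone automatically at load time (via ifunc) on supporting CPUs. A minimal sketch, guarded for non-x86/non-GCC builds (the function name is hypothetical, not rsync's actual code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified checksum body standing in for get_checksum1(). */
#if defined(__x86_64__) && defined(__GNUC__) && !defined(__clang__)
__attribute__((target_clones("ssse3", "default")))
#endif
uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    /* GCC compiles this body once per target; the SSSE3 clone may be
     * auto-vectorized, and the matching clone is chosen at load time. */
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += buf[i];
    return s;
}
```

Regardless of which clone runs, the result is identical; only the generated instructions differ.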
This is great! However, do you have access to a big-endian CPU? I'm
not sure how relevant this still is, but I've read at some point that
xxhash might have produced different (reversed?) hashes on
different-endian CPUs. It may be prudent to actually test whether that
is the case with this implementation
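One common way to sidestep such endianness issues is to serialize the hash value into a fixed byte order with shifts rather than memcpy, so the digest bytes are identical on big- and little-endian CPUs. A small sketch (not the actual xxhash code; the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Serialize a 64-bit hash value in a fixed little-endian byte order.
 * Using shifts (not memcpy) makes the output independent of host
 * endianness, so on-the-wire digests match across architectures. */
static void hash64_to_bytes(uint64_t v, uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}
```

Running this on a big-endian and a little-endian machine yields the same eight output bytes for the same input value.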
Here's the third (and final, barring bugs) version, which builds _on
top_ of the patch already committed by Wayne.
This version also adds AVX2 support and rearranges defines and
filenames in a way that seems more logical and future-proof to me.
Real-world tests show about an 8% network transfer
When a whole-file checksum is performed, hashing was done in 64-byte
blocks, causing overhead and limiting performance.
Testing showed the performance improvement rising quickly going from
64 to 512 bytes, with diminishing returns above that; 4096 bytes was
where it seemed to plateau for me. Re-used
No, this patch is for the whole-file checksum; the resulting checksum
is the same regardless of the block size used when feeding the hash
algorithm.
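This block-size invariance holds for any incremental hash API: the final digest depends only on the byte stream, not on how it is chunked into update calls. A toy sketch using a stand-in hash (FNV-1a here, not rsync's actual MD5 code; names are illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy incremental hash (FNV-1a) standing in for the whole-file hash. */
typedef struct { uint64_t h; } fnv1a_t;

static void fnv1a_init(fnv1a_t *st)
{
    st->h = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
}

static void fnv1a_update(fnv1a_t *st, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        st->h ^= p[i];
        st->h *= 0x100000001b3ULL;   /* FNV-1a 64-bit prime */
    }
}

static uint64_t fnv1a_final(const fnv1a_t *st)
{
    return st->h;
}
```

Feeding the same data in 64-byte blocks or in one 4096-byte call produces the same digest; only the per-call overhead differs, which is where the speedup comes from.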
On Mon, May 25, 2020 at 10:57 AM Pierre Bernhardt via rsync
wrote:
> Will this not produce more false-negative results? This means if
> comparing
This patch introduces an optimized assembler version of md5_process(),
the inner loop of MD5 checksumming. It affects the performance of all
MD5 operations in rsync - including block matching and whole-file
checksums.
Performance gain is 5-10% depending on the specific CPU.
Originally created by
I haven't found a way to control GCC's target selector at runtime
(though in theory it could be possible), so switching between specific
optimizations (SSE2 vs SSSE3) may prove difficult or require
additional duplication of code. Bypassing the optimizations completely
in an all-or-nothing way
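One manual approach GCC does provide, at the cost of the code duplication mentioned above, is a runtime check via `__builtin_cpu_supports()`, dispatching between per-function target builds. A hedged sketch, guarded for non-x86 builds (function names hypothetical, not rsync's actual code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain scalar fallback, built with the project's normal flags. */
static uint32_t sum_scalar(const uint8_t *buf, size_t len)
{
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += buf[i];
    return s;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* The target attribute enables SSSE3 for just this function, without
 * passing -mssse3 to the whole project. */
__attribute__((target("ssse3")))
static uint32_t sum_ssse3(const uint8_t *buf, size_t len)
{
    /* A real version would use SSSE3 intrinsics; the scalar body here
     * keeps the sketch short. */
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += buf[i];
    return s;
}
#endif

/* Runtime dispatch: pick the SSSE3 path only on CPUs that support it. */
uint32_t sum_dispatch(const uint8_t *buf, size_t len)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("ssse3"))
        return sum_ssse3(buf, len);
#endif
    return sum_scalar(buf, len);
}
```

Both paths return the same value; the check simply selects which compiled version runs.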