Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
Unfortunately we can't "always build" the SSSE3 code. It won't even
build unless the "-mssse3" flag is presented to GCC.

We don't want to build the entire project with this flag enabled, as
it might trigger SSSE3 optimizations outside of our runtime decided
code path that may break on CPUs that do not support it.

A suggestion found online was isolating the SSSE3 version in
"checksum_ssse3.c" and compiling only that file with "-mssse3", but
some searching around has led me to reports from developers who had
even that setup cause issues with code shared between SSE and non-SSE
objects. I think that risk is low for this case though, as we're just
doing some math and not passing anything but integers and pointers. I
wouldn't think twice about enabling it that way on one of my pet
projects, but a project as widespread as rsync should not have that in
its codebase unless we're _absolutely_ sure it doesn't cause problems
for _anybody_. A very small risk of issues times many millions of
users equals guaranteed failure.

But even if we use that method it requires modifications to the build
scripts (check for x86-64 and exclude otherwise, present file-specific
flags) that are beyond my experience with this build setup.

> My suggestion would be to have a get_checksum1_sse2() and
> get_checksum1_sse3() and always build them. The compiler should support
> it. Then on runtime you would check for sse3 and based on the result
> get_checksum1() would either invoke the _sse2() or sse3().
>
> Without auto detection it won't be utilized by distros. But yes, this
> could be improved afterwards.

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Sebastian Andrzej Siewior via rsync
On 2020-05-18 21:55:13 [+0200], Jorrit Jongma wrote:
> What do you base this on?

So my memory was wrong. SSE2 is supported by all x86-64bit CPUs. Sorry
for that.

> would imply that SSSE3 is enabled out of the box on builds on machines
> that support it, this is not the case (it certainly isn't on my Ubuntu
> box). It would be preferred to detect this at runtime but getting that
> to work on GCC is (apparently) a mess, and would probably require
> modifications to configure/Makefile/etc that I'm not comfortable
> doing, as my lack of expertise on those would probably lead me to
> break the build for somebody else. If someone knowledgable enough in
> that area wants to fix it, though...

My suggestion would be to have a get_checksum1_sse2() and
get_checksum1_sse3() and always build them. The compiler should support
it. Then on runtime you would check for sse3 and based on the result
get_checksum1() would either invoke the _sse2() or sse3().

Without auto detection it won't be utilized by distros. But yes, this
could be improved afterwards.

Sebastian

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


GPL violation by Synology

2020-05-18 Thread Jorrit Jongma via rsync
Synology ( https://www.synology.com/en-global ) is one of best selling
brands of consumer/prosumer/SMB NASes, with revenue estimated to be in
the $100M+ range.

Several of their NAS backup options use rsync either explicitly or
under the hood (NAS-to-Remote-NAS backup, Shared-Folder-Sync), and
they run the rsync daemon (under File Services). rsync is an integral
part of their solution.

They use a modified build of rsync (seemingly 3.0.9 + cherrypicked CVE
fixes) for which they do provide some sources (see
https://sourceforge.net/projects/dsgpl/files/Synology%20NAS%20GPL%20Source/24922branch/x64-source/
and search for "rsync"), and if you spend your afternoon tracking down
all the needed packages/downloads and reconcile the missing headers,
it builds "something".

However, this "something" is not the rsync they ship on the boxes or
provide in their updates. They add their own authentication code to
rsync as well as add several options. This is all fine by itself, but
you can't properly compile their code. Both the modifications to rsync
as well as their own library headers are littered with #ifdef's, but
they've scrubbed all of them and replaced them with names such as
MY_ABC_HERE and MY_DEF_HERE. As all defines are replaced by the same
values, you get nested #ifdef's referring to the same value, and your
options become to turn them all off (which doesn't work out), turn
them all on (which doesn't work out), or manually wrestle through
every line of code and try to figure out what's what.

This does not satisfy releasing the "complete corresponding source
code" as required per the GPL and this prevents users from keeping
rsync on their Synology NASes updated or to push their own
modifications.

License violation aside, Synology's rsync build is almost twice as slow
as a master branch build, I'm not the only one who would love to
update it. But replacing their build with a normal rsync build breaks
all manner of things. As Synology will soon release a new version of
their OS (DSM 7) which will leave many of their older NAS models
without further updates, getting them to release proper sources for
their GPL components is their users' only hope for updates.

I ask (urge, beg) anyone here who has authored any code on the rsync
master branch <= v3.0.9 or patches that fixed a CVE afterward (it is
hard to figure out which commits they copied, though) to complain to
Synology and demand a proper release of the rsync sources they use.

I've found that licence violations for Samba are being acted upon by
the Software Freedom Conservancy, but it was unclear to me whether
that includes rsync or not, hence my message here.

Thanks!

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
That is the goal, a drop-in optimization.

I don't know if xxhash has the required properties to be able to
replace the rolling checksum (bytes need to be able to easily be
shifted on/off at boths ends, see match.c).

However, as there's also talk of replacing the MD5 checksum with
xxhash (again, the rolling checksum isn't the MD5 checksum, they're
two different checksums used at different times for different
reasons), and that would lead to a much larger performance benefit
than replacing the rolling checksum, I suggest we keep xxhash over
there. xxhash isn't a cryptographic checksum, and if we replace MD5
with xxhash we still have two different checksums being verified when
blocks are replaced, which should increase their total strength
(decrease the odds of a collision).

You could argue xxhash is fast enough to replace both checksums even
if the rolling checksum has to be recalculated for every shifted byte,
but that would require extensive testing and some study into how to
refactor the double checksum into a single one while maintaining
backward compatibility (could be easy, could be hard).

> Still. You claim in your patch that
>
> | Benchmarks   C   SSE2SSSE3
> | - Intel i7-7700hq1850 MB/s   2550 MB/s   4050 MB/s
>
> while xxhash [0] claims on a Core i5-3340M @2.7GHz that:
>
> |VersionSpeed on 64-bit Speed on 32-bit
> |XXH64  13.8 GB/s   1.9 GB/s
>
> so using xxhash64 for that work would also boost !x86 platforms.
>
> However your patch has the benefit that no changes are required on the
> remote side. I like that.
>
> [0] https://github.com/Cyan4973/xxHash#benchmarks

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
What do you base this on?

Per https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html :

"For the x86-32 compiler, you must use -march=cpu-type, -msse or
-msse2 switches to enable SSE extensions and make this option
effective. For the x86-64 compiler, these extensions are enabled by
default."

That reads to me like we're fine for SSE2. As stated in my comments,
SSSE3 support must be manually enabled at build time. Your comment
would imply that SSSE3 is enabled out of the box on builds on machines
that support it, this is not the case (it certainly isn't on my Ubuntu
box). It would be preferred to detect this at runtime but getting that
to work on GCC is (apparently) a mess, and would probably require
modifications to configure/Makefile/etc that I'm not comfortable
doing, as my lack of expertise on those would probably lead me to
break the build for somebody else. If someone knowledgable enough in
that area wants to fix it, though...

The only reason there's an SSE2 backport (you'll find SSSE3 support on
most CPUs up to nearly a decade old) in the first place is because by
my understanding SSE2 is supported on all x86-64 CPUs out of the box.

> You can't replace the code like that with SSE2+. You need runtime
> detection for this. Otherwise it can't be enabled by distros becuase it
> would fail on CPUs without SSE2+. Only SSE is part of generic x86-64.

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Sebastian Andrzej Siewior via rsync
On 2020-05-18 17:55:58 [+0200], Jorrit Jongma via rsync wrote:
> I don't disagree that MD5 could (or even should) be replaced so it is
> no longer the bottleneck in several real-world cases (including mine).
> 
> However this patch is not for MD5 performance, rather for the rolling
> checksum rsync uses to match blocks on existing files on both ends to
> reduce transfer size.

Still. You claim in your patch that

| Benchmarks   C   SSE2SSSE3
| - Intel i7-7700hq1850 MB/s   2550 MB/s   4050 MB/s

while xxhash [0] claims on a Core i5-3340M @2.7GHz that:

|VersionSpeed on 64-bit Speed on 32-bit
|XXH64  13.8 GB/s   1.9 GB/s

so using xxhash64 for that work would also boost !x86 platforms.

However your patch has the benefit that no changes are required on the
remote side. I like that.

[0] https://github.com/Cyan4973/xxHash#benchmarks

Sebastian

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Sebastian Andrzej Siewior via rsync
On 2020-05-18 17:06:51 [+0200], Jorrit Jongma via rsync wrote:
> diff --git a/checksum.c b/checksum.c
> index cd234038..4e696f3d 100644
> --- a/checksum.c
> +++ b/checksum.c
> @@ -99,6 +99,7 @@ int canonical_checksum(int csum_type)
>   return csum_type >= CSUM_MD4 ? 1 : 0;
>  }
> 
> +#ifndef __SSE2__  // see checksum_sse2.c for SSE2/SSSE3 version
>  /*
>a simple 32 bit checksum that can be updated from either end
>(inspired by Mark Adler's Adler-32 checksum)
> @@ -119,6 +120,7 @@ uint32 get_checksum1(char *buf1, int32 len)
>   }
>   return (s1 & 0x) + (s2 << 16);
>  }
> +#endif

You can't replace the code like that with SSE2+. You need runtime
detection for this. Otherwise it can't be enabled by distros becuase it
would fail on CPUs without SSE2+. Only SSE is part of generic x86-64.

Sebastian

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Ben RUBSON via rsync
> On 18 May 2020, at 19:02, Jorrit Jongma  wrote:
> 
> I think you're missing a point here. Two different checksum algorithms
> are used in concert, the Adler-based one and the MD5 one. I
> SSE-optimized the Adler-based one. The Adler-based hash is used to
> _find_ blocks that might have shifted, while the MD5 hash is a strong
> cryptographic hash used to _verify_ blocks and files. You wouldn't
> want to replace the MD5 hash with the Adler-based hash, they are of a
> different class. If you'd replace the MD5 hash with a different one,
> you'd replace it with one of the SHA's or even xxHash.

Jorrit, I missed that point yes, sorry, thank you for clarifying again...

We would then also need a SSE-version of the MD5 algorithm to have a full 
hardware / SSE support.
But then, as you said before ; "single stream MD5 cannot be effectively 
optimized with SSE, at least I've not seen an SSE version faster than pure C".
So, finally, https://bugzilla.samba.org/show_bug.cgi?id=13082 may not be 
achievable easily, at least it would not improve performance...

Replacing MD5 with a different algorithm would impact both sender and receiver, 
but yes we may then use a faster (even perhaps hardware-backed) solution.

Ben


-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
I think you're missing a point here. Two different checksum algorithms
are used in concert, the Adler-based one and the MD5 one. I
SSE-optimized the Adler-based one. The Adler-based hash is used to
_find_ blocks that might have shifted, while the MD5 hash is a strong
cryptographic hash used to _verify_ blocks and files. You wouldn't
want to replace the MD5 hash with the Adler-based hash, they are of a
different class. If you'd replace the MD5 hash with a different one,
you'd replace it with one of the SHA's or even xxHash.

On Mon, May 18, 2020 at 6:21 PM Ben RUBSON via rsync
 wrote:
>
> Thank you Jorrit for your detailed answer.
>
> > On 18 May 2020, at 17:58, Jorrit Jongma via rsync  
> > wrote:
> >
> > Well, don't get too excited, get_checksum1() (the function optimized
> > here) is not the great performance limiter in this case, it's
> > get_checksum2() and sum_update(), which will be using MD5.
>
> Certainly that all other functions using MD5 could be updated to use your 
> SSE-optimized function.
> So that we have a full SSE MD5 support, wherever rsync is using it (basis 
> file checksum, rolling checksum etc...).
>
> I think one nice performance improvement could be when the receiver checksums 
> the (big/huge) basis file, because here the sender is then simply waiting...
>
> > Unfortunately, single stream MD5 cannot be effectively optimized with
> > SSE, at least I've not seen an SSE version faster than pure C
>
> I was about to tell you that we successfully implemented it into FreeBSD a 
> few years ago, but it's CRC32, not MD5...
> https://github.com/freebsd/freebsd/commit/c4b27423f57c30068aff3f234c912ae8d9ff1b6a
> https://github.com/freebsd/freebsd/commit/5a798b035b4858923878c014a5faa48b2f9aa6e7
> At least sounds like the algorithm author / inspiration, Mark Adler, is the 
> same :)
>
> Anyway, this is a first interesting SSE MD5 support.
> --
> Please use reply-all for most replies to avoid omitting the mailing list.
> To unsubscribe or change options: 
> https://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Ben RUBSON via rsync
Thank you Jorrit for your detailed answer.

> On 18 May 2020, at 17:58, Jorrit Jongma via rsync  
> wrote:
> 
> Well, don't get too excited, get_checksum1() (the function optimized
> here) is not the great performance limiter in this case, it's
> get_checksum2() and sum_update(), which will be using MD5.

Certainly that all other functions using MD5 could be updated to use your 
SSE-optimized function.
So that we have a full SSE MD5 support, wherever rsync is using it (basis file 
checksum, rolling checksum etc...).

I think one nice performance improvement could be when the receiver checksums 
the (big/huge) basis file, because here the sender is then simply waiting...

> Unfortunately, single stream MD5 cannot be effectively optimized with
> SSE, at least I've not seen an SSE version faster than pure C

I was about to tell you that we successfully implemented it into FreeBSD a few 
years ago, but it's CRC32, not MD5...
https://github.com/freebsd/freebsd/commit/c4b27423f57c30068aff3f234c912ae8d9ff1b6a
https://github.com/freebsd/freebsd/commit/5a798b035b4858923878c014a5faa48b2f9aa6e7
At least sounds like the algorithm author / inspiration, Mark Adler, is the 
same :)

Anyway, this is a first interesting SSE MD5 support.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
Well, don't get too excited, get_checksum1() (the function optimized
here) is not the great performance limiter in this case, it's
get_checksum2() and sum_update(), which will be using MD5. You can
force using MD4, but on the slower CPU's I've tested in practice that
is slower rather than faster, contrary to what would be expected.
While this patch will improve things a little, to improve things a lot
we need to tackle or replace MD5.

Unfortunately, single stream MD5 cannot be effectively optimized with
SSE, at least I've not seen an SSE version faster than pure C, and
I've looked into it. What we _can_ do is parallelize multiple streams
using SSE, which may double to triple the throughput at the same CPU
load under ideal circumstances. However, this cannot be applied to
rsync as-is as it doesn't process multiple files simultaneously (and
it is questionable if that is something we should even want). The
single-file stream could still be parallelized this way but it would
require a slight change in checksum generation that would in turn
require a protocol change - both ends need to support it. At that
point we might as well swap MD5 out completely, though I will still be
digging deeper into this case.

The good news is that this parallelization _is_ possible in a drop-in
fashion for the case where rsync is comparing the chunks on both ends,
the same case where the get_checksum1() patch shows its benefits. I
estimate performance improvements could reach about 30% for that
specific case (re-transferring large yet slightly modified files), but
that does nothing for the performance of whole file checksumming or
the transfer of new files. Depending on your use-case you may never or
rarely even see that performance improvement in action. It applies for
my use-case though, so I am looking into this.

On Mon, May 18, 2020 at 5:18 PM Ben RUBSON via rsync
 wrote:
>
> On 18 May 2020, at 17:06, Jorrit Jongma via rsync  
> wrote:
>
> This drop-in patch increases the performance of the get_checksum1()
> function on x86-64.
>
>
> As ref, rather related to this : 
> https://bugzilla.samba.org/show_bug.cgi?id=13082
>
> Thank you Jorrit !
> --
> Please use reply-all for most replies to avoid omitting the mailing list.
> To unsubscribe or change options: 
> https://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
I don't disagree that MD5 could (or even should) be replaced so it is
no longer the bottleneck in several real-world cases (including mine).

However this patch is not for MD5 performance, rather for the rolling
checksum rsync uses to match blocks on existing files on both ends to
reduce transfer size.

On Mon, May 18, 2020 at 5:44 PM Filipe Maia via rsync
 wrote:
>
> I think this is a great patch but, in my view, an even better way to tackle 
> the fundamental problem (the performance limitations) is to use a much faster 
> checksum like xxhash, as has been suggested before:
> https://lists.samba.org/archive/rsync/2019-October/031975.html
>
> Cheers,
> Filipe
>
> On Mon, 18 May 2020 at 17:08, Jorrit Jongma via rsync  
> wrote:
>>
>> This drop-in patch increases the performance of the get_checksum1()
>> function on x86-64.
>>
>> On the target slow CPU performance of the function increased by nearly
>> 50% in the x86-64 default SSE2 mode, and by nearly 100% if the
>> compiler was told to enable SSSE3 support. The increase was over 200%
>> on the fastest CPU tested in SSSE3 mode.
>>
>> Transfer time improvement with large files existing on both ends but
>> with some bits flipped was measured as 5-10%, with the target machine
>> being CPU limited (still so due to MD5).
>>
>> This same patch on (my) GitHub for easier reading:
>> https://github.com/Chainfire/rsync/commit/f5d0b32df869a23a74b8b8295e4983b0943866df
>>
>>
>> From f5d0b32df869a23a74b8b8295e4983b0943866df Mon Sep 17 00:00:00 2001
>> From: Jorrit Jongma 
>> Date: Mon, 18 May 2020 00:21:39 +0200
>> Subject: [PATCH 1/1] SSE2/SSSE3 optimized version of get_checksum1() for
>>  x86-64
>>
>> ---
>>  Makefile.in |   2 +-
>>  checksum.c  |   2 +
>>  checksum_sse2.c | 243 
>>  3 files changed, 246 insertions(+), 1 deletion(-)
>>  create mode 100644 checksum_sse2.c
>>
>> diff --git a/Makefile.in b/Makefile.in
>> index 59649562..e4202336 100644
>> --- a/Makefile.in
>> +++ b/Makefile.in
>> @@ -40,7 +40,7 @@ OBJS1=flist.o rsync.o generator.o receiver.o
>> cleanup.o sender.o exclude.o \
>>   util.o util2.o main.o checksum.o match.o syscall.o log.o backup.o delete.o
>>  OBJS2=options.o io.o compat.o hlink.o token.o uidlist.o socket.o 
>> hashtable.o \
>>   fileio.o batch.o clientname.o chmod.o acls.o xattrs.o
>> -OBJS3=progress.o pipe.o
>> +OBJS3=progress.o pipe.o checksum_sse2.o
>>  DAEMON_OBJ = params.o loadparm.o clientserver.o access.o connection.o
>> authenticate.o
>>  popt_OBJS=popt/findme.o  popt/popt.o  popt/poptconfig.o \
>>   popt/popthelp.o popt/poptparse.o
>> diff --git a/checksum.c b/checksum.c
>> index cd234038..4e696f3d 100644
>> --- a/checksum.c
>> +++ b/checksum.c
>> @@ -99,6 +99,7 @@ int canonical_checksum(int csum_type)
>>   return csum_type >= CSUM_MD4 ? 1 : 0;
>>  }
>>
>> +#ifndef __SSE2__  // see checksum_sse2.c for SSE2/SSSE3 version
>>  /*
>>a simple 32 bit checksum that can be updated from either end
>>(inspired by Mark Adler's Adler-32 checksum)
>> @@ -119,6 +120,7 @@ uint32 get_checksum1(char *buf1, int32 len)
>>   }
>>   return (s1 & 0x) + (s2 << 16);
>>  }
>> +#endif
>>
>>  void get_checksum2(char *buf, int32 len, char *sum)
>>  {
>> diff --git a/checksum_sse2.c b/checksum_sse2.c
>> new file mode 100644
>> index ..51662833
>> --- /dev/null
>> +++ b/checksum_sse2.c
>> @@ -0,0 +1,243 @@
>> +/*
>> + * SSE2/SSSE3-optimized routines to support checksumming of bytes.
>> + *
>> + * Copyright (C) 1996 Andrew Tridgell
>> + * Copyright (C) 1996 Paul Mackerras
>> + * Copyright (C) 2004-2020 Wayne Davison
>> + * Copyright (C) 2020 Jorrit Jongma
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 3 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, visit the http://fsf.org website.
>> + */
>> +/*
>> + * Optimization target for get_checksum1 was the Intel Atom D2700, the
>> + * slowest CPU in the test set and the most likely to be CPU limited during
>> + * transfers. The combination of intrinsics was chosen specifically for the
>> + * most gain on that CPU, other combinations were occasionally slightly
>> + * faster on the others.
>> + *
>> + * While on more modern CPUs transfers are less likely to be CPU limited,
>> + * lower CPU usage is always better. Improvements may still be seen when
>> + * matching chunks from NVMe storage even on newer CPUs.
>> + *
>> + * Benchmarks   C   SSE2SSSE3
>> + * 

[Bug 13082] [REQ] Hardware / SSE based MD5 operations

2020-05-18 Thread just subscribed for rsync-qa from bugzilla via rsync
https://bugzilla.samba.org/show_bug.cgi?id=13082

--- Comment #1 from Ben RUBSON  ---
First info / first patch there :
https://lists.samba.org/archive/rsync/2020-May/032175.html

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Jorrit Jongma via rsync
This drop-in patch increases the performance of the get_checksum1()
function on x86-64.

On the target slow CPU performance of the function increased by nearly
50% in the x86-64 default SSE2 mode, and by nearly 100% if the
compiler was told to enable SSSE3 support. The increase was over 200%
on the fastest CPU tested in SSSE3 mode.

Transfer time improvement with large files existing on both ends but
with some bits flipped was measured as 5-10%, with the target machine
being CPU limited (still so due to MD5).

This same patch on (my) GitHub for easier reading:
https://github.com/Chainfire/rsync/commit/f5d0b32df869a23a74b8b8295e4983b0943866df


>From f5d0b32df869a23a74b8b8295e4983b0943866df Mon Sep 17 00:00:00 2001
From: Jorrit Jongma 
Date: Mon, 18 May 2020 00:21:39 +0200
Subject: [PATCH 1/1] SSE2/SSSE3 optimized version of get_checksum1() for
 x86-64

---
 Makefile.in |   2 +-
 checksum.c  |   2 +
 checksum_sse2.c | 243 
 3 files changed, 246 insertions(+), 1 deletion(-)
 create mode 100644 checksum_sse2.c

diff --git a/Makefile.in b/Makefile.in
index 59649562..e4202336 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -40,7 +40,7 @@ OBJS1=flist.o rsync.o generator.o receiver.o
cleanup.o sender.o exclude.o \
  util.o util2.o main.o checksum.o match.o syscall.o log.o backup.o delete.o
 OBJS2=options.o io.o compat.o hlink.o token.o uidlist.o socket.o hashtable.o \
  fileio.o batch.o clientname.o chmod.o acls.o xattrs.o
-OBJS3=progress.o pipe.o
+OBJS3=progress.o pipe.o checksum_sse2.o
 DAEMON_OBJ = params.o loadparm.o clientserver.o access.o connection.o
authenticate.o
 popt_OBJS=popt/findme.o  popt/popt.o  popt/poptconfig.o \
  popt/popthelp.o popt/poptparse.o
diff --git a/checksum.c b/checksum.c
index cd234038..4e696f3d 100644
--- a/checksum.c
+++ b/checksum.c
@@ -99,6 +99,7 @@ int canonical_checksum(int csum_type)
  return csum_type >= CSUM_MD4 ? 1 : 0;
 }

+#ifndef __SSE2__  // see checksum_sse2.c for SSE2/SSSE3 version
 /*
   a simple 32 bit checksum that can be updated from either end
   (inspired by Mark Adler's Adler-32 checksum)
@@ -119,6 +120,7 @@ uint32 get_checksum1(char *buf1, int32 len)
  }
  return (s1 & 0x) + (s2 << 16);
 }
+#endif

 void get_checksum2(char *buf, int32 len, char *sum)
 {
diff --git a/checksum_sse2.c b/checksum_sse2.c
new file mode 100644
index ..51662833
--- /dev/null
+++ b/checksum_sse2.c
@@ -0,0 +1,243 @@
+/*
+ * SSE2/SSSE3-optimized routines to support checksumming of bytes.
+ *
+ * Copyright (C) 1996 Andrew Tridgell
+ * Copyright (C) 1996 Paul Mackerras
+ * Copyright (C) 2004-2020 Wayne Davison
+ * Copyright (C) 2020 Jorrit Jongma
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, visit the http://fsf.org website.
+ */
+/*
+ * Optimization target for get_checksum1 was the Intel Atom D2700, the
+ * slowest CPU in the test set and the most likely to be CPU limited during
+ * transfers. The combination of intrinsics was chosen specifically for the
+ * most gain on that CPU, other combinations were occasionally slightly
+ * faster on the others.
+ *
+ * While on more modern CPUs transfers are less likely to be CPU limited,
+ * lower CPU usage is always better. Improvements may still be seen when
+ * matching chunks from NVMe storage even on newer CPUs.
+ *
+ * Benchmarks   C   SSE2SSSE3
+ * - Intel Atom D2700   550 MB/s750 MB/s1000 MB/s
+ * - Intel i7-7700hq1850 MB/s   2550 MB/s   4050 MB/s
+ * - AMD ThreadRipper 2950x 2900 MB/s   5600 MB/s   8950 MB/s
+ *
+ * This optimization for get_checksum1 is intentionally limited to x86-64 as
+ * no 32-bit CPU was available for testing. As 32-bit CPUs only have half the
+ * available xmm registers, this optimized version may not be faster than the
+ * pure C version anyway.
+ *
+ * GCC automatically enables SSE2 support on x86-64 builds. The SSSE3 code
+ * path must be enabled manually: ./configure CFLAGS="-mssse3 -O2"
+ */
+
+#ifdef __x86_64__
+#ifdef __SSE2__
+
+#include "rsync.h"
+
+#ifdef __SSSE3__
+#include 
+#else
+#include 
+#endif
+
+/* Compatibility functions to let our SSSE3 algorithm run on SSE2 */
+
+static inline __m128i sse_load_si128(void const* buf) {
+#ifdef __SSSE3__
+return _mm_lddqu_si128(buf);  // same as loadu on all but the
oldest SSSE3 CPUs
+#else
+return _mm_loadu_si128(buf);
+#endif
+}
+
+#ifndef __SSSE3__

[SCM] The rsync repository. - branch master updated

2020-05-18 Thread Rsync CVS commit messages
The branch, master has been updated
   via  be7af36c Tweak the accept/refuse strings a bit.
  from  3b36bde9 Add back a lost "*" and document the refusing of log-file* 
opts.

https://git.samba.org/?p=rsync.git;a=shortlog;h=master


- Log -
commit be7af36c517757d7ff9562275ebfc04355613dff
Author: Wayne Davison 
Date:   Mon May 18 00:03:09 2020 -0700

Tweak the accept/refuse strings a bit.

---

Summary of changes:
 options.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)


Changeset truncated at 500 lines:

diff --git a/options.c b/options.c
index 32871fd5..740c1271 100644
--- a/options.c
+++ b/options.c
@@ -1161,10 +1161,10 @@ static void parse_one_refuse_match(int negated, const 
char *ref, const struct po
*shortName = op->shortName;
if ((op->longName && wildmatch(ref, op->longName))
 || (*shortName && wildmatch(ref, shortName))) {
-   if (*op->descrip == 'a' || *op->descrip == 'r')
-   op->descrip = negated ? "accepted" : "refused";
+   if (op->descrip[1] == '*')
+   op->descrip = negated ? "a*" : "r*";
else if (!is_wild)
-   op->descrip = negated ? "ACCEPTED" : "REFUSED";
+   op->descrip = negated ? "a=" : "r=";
found_match = 1;
if (!is_wild)
break;
@@ -1194,29 +1194,27 @@ static void set_refuse_options(void)
 
/* We abuse the descrip field in poptOption to make it easy to flag 
which options
 * are refused (since we don't use it otherwise).  Start by marking all 
options
-* as accepted except for some that are marked as ACCEPTED 
(non-wild-matched). */
+* as "a"ccepted with a few options also marked as non-wild. */
for (op = long_options; ; op++) {
const char *longName = op->longName ? op->longName : "";
if (!op->longName && !op->shortName) {
list_end = op;
break;
}
-   /* These options are protected from wild-card matching, but the 
user is free to
-* shoot themselves in the foot if they specify the option 
explicitly. */
-   if (op->shortName == 'e'
+   if (op->shortName == 'e' /* Required for compatibility flags */
 || op->shortName == '0' /* --from0 just modifies --files-from, 
so refuse that instead (or not) */
 || op->shortName == 's' /* --protect-args is always OK */
 || op->shortName == 'n' /* --dry-run is always OK */
-|| strcmp("server", longName) == 0
-|| strcmp("sender", longName) == 0
 || strcmp("iconv", longName) == 0
 || strcmp("no-iconv", longName) == 0
 || strcmp("checksum-seed", longName) == 0
 || strcmp("write-devices", longName) == 0 /* disable 
wild-match (it gets refused below) */
-|| strcmp("log-format", longName) == 0)
-   op->descrip = "ACCEPTED";
+|| strcmp("log-format", longName) == 0 /* aka out-format (NOT 
log-file-format) */
+|| strcmp("sender", longName) == 0
+|| strcmp("server", longName) == 0)
+   op->descrip = "a="; /* exact-match only */
else
-   op->descrip = "accepted";
+   op->descrip = "a*"; /* wild-card-able */
}
assert(list_end != NULL);
 
@@ -1249,7 +1247,7 @@ static void set_refuse_options(void)
 
/* Now we use the descrip values to actually mark the options for 
refusal. */
for (op = long_options; op != list_end; op++) {
-   int refused = *op->descrip == 'r' || *op->descrip == 'R';
+   int refused = op->descrip[0] == 'r';
op->descrip = NULL;
if (!refused)
continue;


-- 
The rsync repository.

___
rsync-cvs mailing list
rsync-cvs@lists.samba.org
https://lists.samba.org/mailman/listinfo/rsync-cvs


[SCM] The rsync repository. - branch master updated

2020-05-18 Thread Rsync CVS commit messages
The branch, master has been updated
   via  3b36bde9 Add back a lost "*" and document the refusing of log-file* 
opts.
  from  c3986d4c More manpage improvements for "refuse options".

https://git.samba.org/?p=rsync.git;a=shortlog;h=master


- Log -
commit 3b36bde9536614737bb73eaac5aa1e9c039472a3
Author: Wayne Davison 
Date:   Sun May 17 23:26:13 2020 -0700

Add back a lost "*" and document the refusing of log-file* opts.

---

Summary of changes:
 options.c  | 10 +-
 rsyncd.conf.yo |  3 ++-
 2 files changed, 7 insertions(+), 6 deletions(-)


Changeset truncated at 500 lines:

diff --git a/options.c b/options.c
index ebf31a60..32871fd5 100644
--- a/options.c
+++ b/options.c
@@ -1143,7 +1143,7 @@ void option_error(void)
 }
 
 
-static void set_one_refuse_option(int negated, const char *ref, const struct 
poptOption *list_end)
+static void parse_one_refuse_match(int negated, const char *ref, const struct 
poptOption *list_end)
 {
struct poptOption *op;
char shortName[2];
@@ -1221,7 +1221,7 @@ static void set_refuse_options(void)
assert(list_end != NULL);
 
if (am_daemon) /* Refused by default, but can be accepted via 
"!write-devices" */
-   set_one_refuse_option(0, "write-devices", list_end);
+   parse_one_refuse_match(0, "write-devices", list_end);
 
while (1) {
while (*ref == ' ') ref++;
@@ -1232,7 +1232,7 @@ static void set_refuse_options(void)
negated = *ref == '!';
if (negated && ref[1])
ref++;
-   set_one_refuse_option(negated, ref, list_end);
+   parse_one_refuse_match(negated, ref, list_end);
if (!cp)
break;
*cp = ' ';
@@ -1242,9 +1242,9 @@ static void set_refuse_options(void)
if (am_daemon) {
 #ifdef ICONV_OPTION
if (!*lp_charset(module_id))
-   set_one_refuse_option(0, "iconv", list_end);
+   parse_one_refuse_match(0, "iconv", list_end);
 #endif
-   set_one_refuse_option(0, "log-file", list_end);
+   parse_one_refuse_match(0, "log-file*", list_end);
}
 
/* Now we use the descrip values to actually mark the options for 
refusal. */
diff --git a/rsyncd.conf.yo b/rsyncd.conf.yo
index 15b1b44c..08d198a1 100644
--- a/rsyncd.conf.yo
+++ b/rsyncd.conf.yo
@@ -801,7 +801,8 @@ If you are un-refusing the compress option, you probably 
want to match
 "!compress*" so that you also accept the bf(--compress-level) option.
 
 Note that the "write-devices" option is refused by default, but can be
-explicitly accepted with "!write-devices".
+explicitly accepted with "!write-devices".  The options "log-file" and
+"log-file-format" are forcibly refused and cannot be accepted.
 
 Here are all the options that are not matched by wild-cards:
 


-- 
The rsync repository.

___
rsync-cvs mailing list
rsync-cvs@lists.samba.org
https://lists.samba.org/mailman/listinfo/rsync-cvs