Re: [RFC PATCH] Add SHA1 support

2020-03-17 Thread Dimitrios Apostolou via rsync

On Tuesday, March 17, 2020 9:17:09 PM CET, Sebastian Andrzej Siewior wrote:

On 2020-03-17 00:03:03 [+0100], Dimitrios Apostolou via rsync wrote:

On Thursday, February 20, 2020 10:34:53 PM CET, Sebastian Andrzej Siewior
via rsync wrote:


I'm still not sure if rsync requires a cryptographic hash _or_ if a
strong hash like xxHash64 would be just fine for the job.


I'm fairly sure the hash should *not* be easy to spoof, so I'd say a
cryptographic hash is needed.

As an example, if a file is replaced by a file of the same size and same
hash,
rsync (if -c is in use) will consider the file is the same, and avoid
copying it.


correct. The same goes for currently used md5 which has known collision
attacks. So if you intend to spoo it, you can manufacture the same hash
for two different files for both algorithms. 


This was not the case in 2008 when rsync 3.0.0 came out defaulting to MD5.
I still think you need a cryptographic hash, even though I am not sure
of how strict the requirement is. MD4 was replaced by MD5 in rsync, despite 
MD4

being 2x faster. I would guess it was replaced because of its weaknesses.


Dimitris


--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [RFC PATCH] Add SHA1 support

2020-03-17 Thread Sebastian Andrzej Siewior via rsync
On 2020-03-17 00:03:03 [+0100], Dimitrios Apostolou via rsync wrote:
> On Thursday, February 20, 2020 10:34:53 PM CET, Sebastian Andrzej Siewior
> via rsync wrote:
> > 
> > I'm still not sure if rsync requires a cryptographic hash _or_ if a
> > strong hash like xxHash64 would be just fine for the job.
> 
> I'm fairly sure the hash should *not* be easy to spoof, so I'd say a
> cryptographic hash is needed.
> 
> As an example, if a file is replaced by a file of the same size and same
> hash,
> rsync (if -c is in use) will consider the file is the same, and avoid
> copying it.

correct. The same goes for currently used md5 which has known collision
attacks. So if you intend to spoo it, you can manufacture the same hash
for two different files for both algorithms. The question is how likely
it is that this happens by chance. According to [0] xxhash64 scores a
solid 10. It is better than crc32 which has been used a lot as a
checksum for files.

[0] https://github.com/Cyan4973/xxHash

> Dimitris

Sebastian

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [RFC PATCH] Add SHA1 support

2020-03-16 Thread Dimitrios Apostolou via rsync
On Thursday, February 20, 2020 10:34:53 PM CET, Sebastian Andrzej Siewior 
via rsync wrote:


I'm still not sure if rsync requires a cryptographic hash _or_ if a
strong hash like xxHash64 would be just fine for the job.


I'm fairly sure the hash should *not* be easy to spoof, so I'd say a 
cryptographic hash is needed.


As an example, if a file is replaced by a file of the same size and same 
hash,
rsync (if -c is in use) will consider the file is the same, and avoid 
copying it.



Dimitris


--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [RFC PATCH] Add SHA1 support

2020-02-20 Thread Sebastian Andrzej Siewior via rsync
On 2020-02-20 20:06:39 [+0100], Markus Ueberall wrote:
> On 2020-02-09 23:19, Sebastian Andrzej Siewior wrote:
> > [...]
> > My primar motivation to use SHA1 for checksumming (by default) instead
> > of MD5 is not the additional security bits but performance. On a decent
> > x86 box the SHA1 performance is almost the same as MD5's but with
> > acceleration it outperforms MD5.
> > 
> > The other alternative would be to go for xxHash64 [0] which has the
> > superior performance but provides a non-cryptographic hash so I though
> > SHA1 would be better here.
> > [...]
> 
> With respect to *both* speed and security, wouldn't BLAKE3 be a better,
> modern alternative if we're looking at checksumming?
> It's "[r]eleased into the public domain with CC0 1.0. Alternatively, it is
> licensed under the Apache License 2.0".  And the performance (see the chart
> at https://github.com/BLAKE3-team/BLAKE3) is *impressive* ...

I'm still not sure if rsync requires a cryptographic hash _or_ if a
strong hash like xxHash64 would be just fine for the job.

> Kind regards, Markus

Sebastian

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [RFC PATCH] Add SHA1 support

2020-02-20 Thread Markus Ueberall via rsync

On 2020-02-09 23:19, Sebastian Andrzej Siewior wrote:

[...]
My primar motivation to use SHA1 for checksumming (by default) instead
of MD5 is not the additional security bits but performance. On a decent
x86 box the SHA1 performance is almost the same as MD5's but with
acceleration it outperforms MD5.

The other alternative would be to go for xxHash64 [0] which has the
superior performance but provides a non-cryptographic hash so I though
SHA1 would be better here.
[...]


With respect to *both* speed and security, wouldn't BLAKE3 be a better,
modern alternative if we're looking at checksumming?
It's "[r]eleased into the public domain with CC0 1.0. Alternatively, it 
is
licensed under the Apache License 2.0".  And the performance (see the 
chart

at https://github.com/BLAKE3-team/BLAKE3) is *impressive* ...

Kind regards, Markus

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


[RFC PATCH] Add SHA1 support

2020-02-09 Thread Sebastian Andrzej Siewior via rsync
From: Sebastian Andrzej Siewior 

This is a huge all-in-one patch and deserves a little cleanup and
splitting. However, I wanted to get it out here for some feedback.

My primar motivation to use SHA1 for checksumming (by default) instead
of MD5 is not the additional security bits but performance. On a decent
x86 box the SHA1 performance is almost the same as MD5's but with
acceleration it outperforms MD5.

The other alternative would be to go for xxHash64 [0] which has the
superior performance but provides a non-cryptographic hash so I though
SHA1 would be better here.

For linking against OpenSSL as of today the rsync license would need an
"OpenSSL exception" [1]. The master branch of OpenSSL is licensed under
the Apache License 2.0 so we could wait until 3.0 is released and use
the C version of the algorithm in the meantime.

Here are numbers from a ryzen test box:
small file:
|$ dd if=/dev/zero of=/dev/shm/out bs=1073741824 count=1
|1+0 records in
|1+0 records out
|1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,503252 s, 2,1 GB/s

Old hash:
|$ time ./rsync -c /dev/shm/out --checksum-choice=md4
|-rw-r--r--  1,073,741,824 2020/02/08 16:34:42 out
|
|real0m1,064s
|user0m0,984s
|sys 0m0,080s

MD5 from openssl (should match built-in speed):
|$ time ./rsync -c /dev/shm/out --checksum-choice=md5
|-rw-r--r--  1,073,741,824 2020/02/08 16:34:42 out
|
|real0m1,433s
|user0m1,293s
|sys 0m0,140s

SHA1 from openssl:
|$ time ./rsync -c /dev/shm/out --checksum-choice=sha1
|-rw-r--r--  1,073,741,824 2020/02/08 16:34:42 out
|
|real0m0,619s
|user0m0,524s
|sys 0m0,096s

SHA1 from the built-in code:
|time ./rsync -c /dev/shm/out --checksum-choice=sha1
|-rw-r--r--  1,073,741,824 2020/02/08 16:34:42 out
|
|real0m1,561s
|user0m1,465s
|sys 0m0,096s


[1] 
https://opensource.stackexchange.com/questions/2233/gpl-v3-with-openssl-exception
[0] https://github.com/Cyan4973/xxHash

Signed-off-by: Sebastian Andrzej Siewior 
---
 Makefile.in   |   4 +-
 checksum.c| 144 +
 configure.ac  |   5 +
 lib/md32_common.h | 258 +
 lib/md5.c |  15 +-
 lib/mdigest.h |  77 -
 lib/sha1.c|  19 +++
 lib/sha1.h|  20 +++
 lib/sha_local.h   | 401 ++
 main.c|   2 +
 rsync.h   |   2 +-
 11 files changed, 902 insertions(+), 45 deletions(-)
 create mode 100644 lib/md32_common.h
 create mode 100644 lib/sha1.c
 create mode 100644 lib/sha1.h
 create mode 100644 lib/sha_local.h

diff --git a/Makefile.in b/Makefile.in
index 9bb977eb6b0a8..a390afe4ed829 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -32,7 +32,7 @@ VERSION=@RSYNC_VERSION@
 GENFILES=configure.sh aclocal.m4 config.h.in proto.h proto.h-tstamp rsync.1 
rsyncd.conf.5
 HEADERS=byteorder.h config.h errcode.h proto.h rsync.h ifuncs.h itypes.h 
inums.h \
lib/pool_alloc.h
-LIBOBJ=lib/wildmatch.o lib/compat.o lib/snprintf.o lib/mdfour.o lib/md5.o \
+LIBOBJ=lib/wildmatch.o lib/compat.o lib/snprintf.o lib/mdfour.o lib/md5.o 
lib/sha1.o \
lib/permstring.o lib/pool_alloc.o lib/sysacls.o lib/sysxattrs.o 
@LIBOBJS@
 zlib_OBJS=zlib/deflate.o zlib/inffast.o zlib/inflate.o zlib/inftrees.o \
zlib/trees.o zlib/zutil.o zlib/adler32.o zlib/compress.o zlib/crc32.o
diff --git a/checksum.c b/checksum.c
index 3295252ba0120..77c36b59c93ec 100644
--- a/checksum.c
+++ b/checksum.c
@@ -32,6 +32,7 @@ extern char *checksum_choice;
 #define CSUM_MD4_OLD 3
 #define CSUM_MD4 4
 #define CSUM_MD5 5
+#define CSUM_SHA1 6
 
 int xfersum_type = 0; /* used for the file transfer checksums */
 int checksum_type = 0; /* used for the pre-transfer (--checksum) checksums */
@@ -54,6 +55,8 @@ int parse_csum_name(const char *name, int len)
len = strlen(name);
 
if (!name || (len == 4 && strncasecmp(name, "auto", 4) == 0)) {
+   if (protocol_version >= 31)
+   return CSUM_SHA1;
if (protocol_version >= 30)
return CSUM_MD5;
if (protocol_version >= 27)
@@ -68,6 +71,8 @@ int parse_csum_name(const char *name, int len)
return CSUM_MD5;
if (len == 4 && strncasecmp(name, "none", 4) == 0)
return CSUM_NONE;
+   if (len == 4 && strncasecmp(name, "sha1", 4) == 0)
+   return CSUM_SHA1;
 
rprintf(FERROR, "unknown checksum name: %s\n", name);
exit_cleanup(RERR_UNSUPPORTED);
@@ -88,6 +93,8 @@ int csum_len_for_type(int cst, BOOL flist_csum)
return MD4_DIGEST_LEN;
  case CSUM_MD5:
return MD5_DIGEST_LEN;
+ case CSUM_SHA1:
+   return SHA1_DIGEST_LEN;
  default: /* paranoia to prevent missing case values */
exit_cleanup(RERR_UNSUPPORTED);
}
@@ -121,30 +128,48 @@ uint32 get_checksum1(char *buf1, int32 len)
 return (s1 & 0x) + (s2 << 16);
 }
 
+static void