Re: Rsync 3.2.3pre1 released

2020-07-29 Thread Ben RUBSON via rsync
> On 28 Jul 2020, at 20:53, Nelson H. F. Beebe via rsync wrote:
> 
> To my surprise, ALL of the builds failed, and examination of the build
> logs showed they were all due to missing libraries or header files,
> notably for one or more of lz4, openssl, xxhash, and zstd.  Once I
> installed those packages, I got successful builds.

This is mainly to "force" you to use these libraries, which bring a very nice
speedup / CPU saving.
It is also to make sure that package maintainers on the various systems use
them, so that these improvements are quickly adopted.

> I believe that it would be much better to simply disable the code that
> needs the missing library or its header files, and keep on running the
> configure script, with a prominent final report, something like:
> 
>   WARNING: configuration is complete, but some features are
>   missing because libraries and/or header files were not
>   found, or were too old, for these packages:
> 
>   lz4 xxhash

The risk is that some (many?) would not care at all and would continue their
builds without the benefit of these libraries.
The worst risk is that packages get distributed without these features
enabled.

> FreeBSD 11.4 and 12:
>   pkg install cmark liblz4 openssl py37-cmarkgfm py37-xxhash zstd
> FreeBSD 13:
>   pkg install cmark cmarkgfm liblz4 openssl py37-xxhash zstd

Not sure you need py37-xxhash; xxhash should be enough.

Ben


-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Ben RUBSON via rsync
> On 18 May 2020, at 19:02, Jorrit Jongma  wrote:
> 
> I think you're missing a point here. Two different checksum algorithms
> are used in concert, the Adler-based one and the MD5 one. I
> SSE-optimized the Adler-based one. The Adler-based hash is used to
> _find_ blocks that might have shifted, while the MD5 hash is a strong
> cryptographic hash used to _verify_ blocks and files. You wouldn't
> want to replace the MD5 hash with the Adler-based hash, they are of a
> different class. If you'd replace the MD5 hash with a different one,
> you'd replace it with one of the SHA's or even xxHash.

Jorrit, I missed that point yes, sorry, thank you for clarifying again...

We would then also need an SSE version of the MD5 algorithm to have full
hardware / SSE support.
But then, as you said before: "single stream MD5 cannot be effectively
optimized with SSE, at least I've not seen an SSE version faster than pure C".
So, in the end, https://bugzilla.samba.org/show_bug.cgi?id=13082 may not be
easily achievable, or at least it would not improve performance...

Replacing MD5 with a different algorithm would impact both sender and receiver,
but yes, we could then use a faster (perhaps even hardware-backed) solution.

Ben
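
As a rough illustration of the two classes of checksum discussed in this
message, the sketch below pairs a simplified Adler-style rolling sum (cheap,
used to find candidate blocks) with MD5 from Python's hashlib (strong, used to
verify them). The names weak_checksum, roll and find_block are made up for
this note, and the arithmetic is a simplification, not rsync's actual
get_checksum1() code.

```python
import hashlib
from typing import Tuple

MOD = 1 << 16  # keep each half of the weak sum in 16 bits


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Adler-style sums: a is the byte sum, b is the sum of the running sums."""
    a = b = 0
    for byte in block:
        a = (a + byte) % MOD
        b = (b + a) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> Tuple[int, int]:
    """Slide the window one byte to the right without rescanning the block."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def find_block(data: bytes, block: bytes) -> int:
    """Return the offset of block inside data, or -1 if it is absent."""
    size = len(block)
    if size == 0 or size > len(data):
        return -1
    target_weak = weak_checksum(block)
    target_strong = hashlib.md5(block).digest()
    a, b = weak_checksum(data[:size])
    offset = 0
    while True:
        # Cheap test first; only a weak match pays for the MD5 confirmation.
        if (a, b) == target_weak:
            if hashlib.md5(data[offset:offset + size]).digest() == target_strong:
                return offset
        if offset + size >= len(data):
            return -1
        a, b = roll(a, b, data[offset], data[offset + size], size)
        offset += 1
```

The weak sum is updated in constant time per byte of shift, which is why it
can be tried at every offset; MD5 is only computed when the weak sum already
matches.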




Re: [PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

2020-05-18 Thread Ben RUBSON via rsync
Thank you Jorrit for your detailed answer.

> On 18 May 2020, at 17:58, Jorrit Jongma via rsync wrote:
> 
> Well, don't get too excited, get_checksum1() (the function optimized
> here) is not the great performance limiter in this case, it's
> get_checksum2() and sum_update(), which will be using MD5.

Certainly all the other functions using MD5 could be updated to use your
SSE-optimized function,
so that we have full SSE MD5 support wherever rsync uses it (basis file
checksum, rolling checksum, etc.).

I think one nice performance improvement would be when the receiver checksums
the (big/huge) basis file, because the sender is then simply waiting...

> Unfortunately, single stream MD5 cannot be effectively optimized with
> SSE, at least I've not seen an SSE version faster than pure C

I was about to tell you that we successfully implemented it in FreeBSD a few
years ago, but that was CRC32, not MD5...
https://github.com/freebsd/freebsd/commit/c4b27423f57c30068aff3f234c912ae8d9ff1b6a
https://github.com/freebsd/freebsd/commit/5a798b035b4858923878c014a5faa48b2f9aa6e7
At least it sounds like the algorithm's author / inspiration, Mark Adler, is
the same :)

Anyway, this is a first interesting SSE MD5 support.


Re: Enabling easier contributions to rsync

2020-04-26 Thread Ben RUBSON via rsync
> On 26 Apr 2020, at 20:37, Filipe Maia via rsync  wrote:
> 
> Hi,
> 
> Are there plans to soon move to github or some other place where people can 
> easily contribute to rsync, making the software discussion more lively and 
> productive? 
> I've seen several useful patches being submitted (e.g. faster checksums) 
> which many others could really use...

+1 for GitHub, would really be convenient !

Wayne, time to switch ? :-)

Ben




Re: --link-dest. Time to 'building file list' incrementing

2019-01-08 Thread Ben RUBSON via rsync
Hi,

As you are on Cygwin, you should consider the notexec & noacl mount options:
https://cygwin.com/cygwin-ug-net/using.html#mount-table

They impact stat() performance.

Ben
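
One possible diagnostic for the question quoted below is to time the stat()
calls on their own, outside rsync. The sketch walks a tree, calls lstat() on
every entry and reports how long that took. The function name
time_stat_calls is made up for this note, and the sketch only approximates
the file-list scan; it is not how rsync itself walks the tree.

```python
import os
import time
from typing import Tuple


def time_stat_calls(root: str) -> Tuple[int, float]:
    """Walk root, lstat() every entry, return (entries, elapsed_seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            count += 1
    return count, time.perf_counter() - start
```

Running it on a representative subdirectory before and after changing the
mount options gives a per-entry cost that can be compared directly.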

> On 8 Jan 2019, at 10:56, John Simpson via rsync  wrote:
> 
> Any ideas anyone?
> 
> I still need at least a weekly backup of all data.
> 
> The current workaround is just for the most active directories.
> 
> Are there any diagnostics I can do which might shed some light on this?
> 
> Thanks
> 
> John
> 
>> On 4 Jan 2019 09:53, John Simpson via rsync  wrote:
>> 
>> Kevin
>> 
>> The link-dest parameter is a single directory (the previous day's 
>> directory), the destination is today's directory.
>> 
>> I haven't tried deleting a backup,  there's no particular need in space 
>> terms,  at the current rate there's enough space for several years of daily 
>> backups.
>> 
>> I've reverted to daily backups on a small subset of the total; the full 
>> backup now takes around 30 hours.  Clearly not practical.
>> 
>> As the small subset takes only a few minutes to complete I can't yet see if 
>> this time is incrementing too.
>> 
>> John
>> 
>>> On 3 Jan 2019 17:06, Kevin Korb via rsync wrote:
>>> 
>>> It does normally take some time to analyze large trees of files.  It has 
>>> to call stat() on each file to get the size and timestamp. 
>>> 
>>> However, 15 hours seems a bit excessive even though I have never tried 
>>> to do this on Windows or a NAS system.  Just to be clear, is your 
>>> --link-dest parameter a single directory or are you trying to tell it to 
>>> use all of the previous backups? 
>>> 
>>> Also, have you deleted a backup yet?  In my experience that takes a lot 
>>> longer than running one so if you need 15 hours to run a backup I would 
>>> expect deleting one to take about a week. 
>>> 
>>> On 1/3/19 4:23 AM, John Simpson via rsync wrote: 
 
 
 I've been running rsync as a cygwin task on Windows Server 2008 for about 
 two months now. I'm using the --link-dest option to do a daily 'snapshot' 
 of the contents of a server containing about 10TB of data, about 13 
 million files, to a Linux based NAS server. Things started out great but I 
 soon noticed that the time taken to complete was slowly increasing. It 
 started at around three hours, but is now around fifteen. 
 
 The command is as follows... 
 
 rsync -rlptDhPR \ 
  --password-file=password \ 
  --chmod=Du=rwx,Dgo=rx,Fu=rw,Fgo=r \ 
  --stats \ 
  --delete \ 
  --log-file=logfilename \ 
  --link-dest=linkdestpath \ 
  sourceDirectory \ 
  rsync@192.168.1.2::destinationDirectory 
 
 I'm not using the full -a option as the differences between the Windows 
 and Linux ownership stuff messed things up. 
 
 The first log file looked like this... 
 
 2018/10/01 23:00:14 [2164] building file list 
 ...transfer file list here 
 2018/10/02 02:11:30 [2164] Number of files: 13,759,998 (reg: 12,260,176, 
 dir: 1,499,821, link: 1) 
 2018/10/02 02:11:30 [2164] Number of created files: 302 (reg: 291, dir: 
 11) 
 2018/10/02 02:11:30 [2164] Number of regular files transferred: 310 
 2018/10/02 02:11:30 [2164] Total file size: 10.40T bytes 
 2018/10/02 02:11:30 [2164] Total transferred file size: 664.31K bytes 
 2018/10/02 02:11:30 [2164] Literal data: 277.91K bytes 
 2018/10/02 02:11:30 [2164] Matched data: 386.40K bytes 
 2018/10/02 02:11:30 [2164] File list size: 10.42M 
 2018/10/02 02:11:30 [2164] File list generation time: 0.154 seconds 
 2018/10/02 02:11:30 [2164] File list transfer time: 0.000 seconds 
 2018/10/02 02:11:30 [2164] Total bytes sent: 235.68M 
 2018/10/02 02:11:30 [2164] Total bytes received: 7.51M 
 2018/10/02 02:11:30 [2164] sent 235.68M bytes  received 7.51M bytes  
 21.17K bytes/sec 
 2018/10/02 02:11:30 [2164] total size is 10.40T  speedup is 42,753.79 
 
 the most recent looks like this... 
 
 2018/11/24 23:00:15 [2924] building file list 
 2018/11/24 23:00:17 [2924] cd..t.. /cygdrive/ 
 2018/11/25 13:21:16 [2924] Number of files: 13,776,423 (reg: 12,274,642, 
 dir: 1,501,780, link: 1) 
 2018/11/25 13:21:16 [2924] Number of created files: 0 
 2018/11/25 13:21:16 [2924] Number of regular files transferred: 0 
 2018/11/25 13:21:16 [2924] Total file size: 10.49T bytes 
 2018/11/25 13:21:16 [2924] Total transferred file size: 0 bytes 
 2018/11/25 13:21:16 [2924] Literal data: 0 bytes 
 2018/11/25 13:21:16 [2924] Matched data: 0 bytes 
 2018/11/25 13:21:16 [2924] File list size: 10.35M 
 2018/11/25 13:21:16 [2924] File list generation time: 0.316 seconds 
 2018/11/25 13:21:16 [2924] File list transfer time: 0.000 seconds 
 2018/11/25 13:21:16 [2924] Total bytes sent: 236.55M 
 2018/11/25 13:21:16 [2924] Total bytes 

Re: rsync of a reflink from OCFS2

2018-03-14 Thread Ben RUBSON via rsync
Yes, you're right: rsync would update only a few parts of the file, but
network usage would be even worse.

In the end, the only solution would be to have rsync running on the target
system.

Ben
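
The trade-off in this exchange can be put into rough numbers. The sketch
below is a deliberately crude model of the bytes crossing the network when
the destination is a network mount (CIFS here), following the description
quoted below: with the delta algorithm the basis file has to be read through
the mount, and without --inplace the whole file is written again anyway. The
function network_bytes and its parameters are made up for this note.

```python
def network_bytes(file_size: int, changed: int, delta: bool, inplace: bool = False) -> int:
    """Estimate bytes moved over the mount for one file (rough model only)."""
    if not delta:
        return file_size  # --whole-file: the file is written once
    read_basis = file_size  # the basis file is read through the mount
    written = changed if inplace else file_size
    return read_basis + written
```

For a 100-unit file with 5 changed units this gives 100 for --whole-file,
200 for --no-whole-file and 105 for --no-whole-file with --inplace, so the
delta algorithm never beats a plain copy in this setup.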

On 14 Mar, Kevin Korb via rsync wrote:

> --no-whole-file would only make it even worse.  It would have to read
> the remote file over the network in order to do the diff then it would
> write the whole file over the network anyway (--inplace would help a
> little).  Local copies force --whole-file for a good reason.
>
> On 03/14/2018 10:05 AM, Ben RUBSON via rsync wrote:
>> On 14 Mar 2018, Lentes, Bernd via rsync wrote:
>>> - On Mar 14, 2018, at 2:19 PM, Ben RUBSON ben.rub...@gmail.com wrote:
>>>> On 14 Mar 2018, Lentes, Bernd via rsync wrote:
>>>>> I would now expect a rsync from the snap would transfer just some
>>>>> megabytes to the file from the day before.
>>>>> But it doesn't:
>>>>>
>>>>> ha-idg-1:/cluster/guests/servers_alive # time rsync -av --stats
>>>>> sa.raw.snap /mnt/idg-2/SysAdmin_AG_Wurst/backup/cluster/test
>>>>
>>>> Hi Bernd,
>>>>
>>>> When doing rsync locally, diff alg is not involved, this is why file is
>>>> fully transferred.
>>>>
>>>> Ben
>>>
>>> Hi Ben,
>>>
>>> also when the target is a cifs share, it's still considered as local ?
>>
>> Yes as it's mounted locally.
>>
>>> Is there something i can do to get the diff algorithm used ?
>>
>> Perhaps --no-whole-file would do the trick ?
>>
>>> Copying via ssh to the cifs server is unfortunately not possible.
>>>
>>> Bernd



Re: rsync of a reflink from OCFS2

2018-03-14 Thread Ben RUBSON via rsync

On 14 Mar 2018, Lentes, Bernd via rsync wrote:

> - On Mar 14, 2018, at 2:19 PM, Ben RUBSON ben.rub...@gmail.com wrote:
>> On 14 Mar 2018, Lentes, Bernd via rsync wrote:
>>> I would now expect a rsync from the snap would transfer just some
>>> megabytes to the file from the day before.
>>> But it doesn't:
>>>
>>> ha-idg-1:/cluster/guests/servers_alive # time rsync -av --stats
>>> sa.raw.snap /mnt/idg-2/SysAdmin_AG_Wurst/backup/cluster/test
>>
>> Hi Bernd,
>>
>> When doing rsync locally, diff alg is not involved, this is why file is
>> fully transferred.
>>
>> Ben
>
> Hi Ben,
>
> also when the target is a cifs share, it's still considered as local ?

Yes as it's mounted locally.

> Is there something i can do to get the diff algorithm used ?

Perhaps --no-whole-file would do the trick ?

> Copying via ssh to the cifs server is unfortunately not possible.
>
> Bernd




Re: rsync of a reflink from OCFS2

2018-03-14 Thread Ben RUBSON via rsync

On 14 Mar 2018, Lentes, Bernd via rsync wrote:

> I would now expect a rsync from the snap would transfer just some
> megabytes to the file from the day before.
>
> But it doesn't:
>
> ha-idg-1:/cluster/guests/servers_alive # time rsync -av --stats
> sa.raw.snap /mnt/idg-2/SysAdmin_AG_Wurst/backup/cluster/test

Hi Bernd,

When doing rsync locally, diff alg is not involved, this is why file is
fully transferred.

Ben




Re: Rsync and client-side encryption of files

2018-01-10 Thread Ben RUBSON via rsync

On 11 Jan 2018 03:29, H via rsync wrote:

> Is anyone using client-side encryption of files before transferring them
> to a cloud server using Rsync? I am running CentOS.


A solution could be EncFS in reverse mode.

Ben



Re: rsync does hours of "fake-work" after failure

2017-10-06 Thread Ben RUBSON via rsync
> On 06 Oct 2017, at 13:18, Frank Steiner  wrote:
> 
> Ben RUBSON wrote
>> 
>> I encountered the same issue and proposed the following patch:
>> https://bugzilla.samba.org/show_bug.cgi?id=12525
> 
> Sorry, I really missed that as my search keywords were completely
> different :-)
> 
>> Perhaps you should give it a try ?
> 
> Yes, that works fine! Thanks a lot! I'll use my own patched rsync
> until these patches are accepted (given how old they are, is rsync
> unmaintained at the moment?).

You're welcome !

Wayne is committing regularly :
https://git.samba.org/rsync.git/?p=rsync.git;a=summary
I've written some patches and bug fixes I hope to see in 3.1.3 :)

Ben



Re: rsync does hours of "fake-work" after failure

2017-10-06 Thread Ben RUBSON via rsync
> On 06 Oct 2017, at 12:24, Frank Steiner via rsync wrote:
> 
> Hi,
> 
> I just stepped on a strange and very annoying bug in rsync-3.1.0 as
> shipped with SuSE Linux Enterprise 12, but verified the bug also
> with rsync-HEAD-20170123.
> 
> I tried to copy some of my movie collection to a usb disk that our
> TV could read, so it was formatted with vfat. I forgot that vfat can't
> handle files > 4 GB, and some of the movies were larger.
> 
> rsync worked for 3 hours copying hundreds of GB, but after it had
> finished the last file it complained
> 
> rsync: write failed on "/media/disk/some_movie.mpg": File too large (27)
> rsync error: error in file IO (code 11) at receiver.c(389) [receiver=3.1.0]
> 
> This file had been the third in the list of files to copy, and when
> I looked at the usb disk I saw that the two files before and 4 GB of
> some_movie.mpg had been copied. But the 400 GB of the remaining files
> had not! rsync had claimed to copy each of it, and as I use "-avP" 
> I had indeed been watching the progress. The speed and the MB/s
> were the usual values for copying to the USB disk.
> 
> So rsync doesn't stop and fail at the point it sees the first file
> too large for vfat, it just goes on and "fakes" the rest of the
> process :-) And because it took some hours, this was a real bad
> surprise at the end. 
> 
> Below is the output of a little test that can easily be reproduced.
> Is this a known bug? I couldn't find something similar in bugzilla 
> or the mailinglist archives.

Hi,

I encountered the same issue and proposed the following patch:
https://bugzilla.samba.org/show_bug.cgi?id=12525

Perhaps you should give it a try ?

Ben
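
Until a patched rsync is in place, one way to avoid the surprise described
above is to check the source tree for oversized files before starting the
copy. The sketch below assumes the target is FAT32, whose largest possible
file is 2**32 - 1 bytes; the function name files_too_large is made up for
this note and is not part of rsync.

```python
import os
from typing import List

FAT32_MAX_FILE_SIZE = 2**32 - 1  # largest file FAT32 can hold, in bytes


def files_too_large(root: str, limit: int = FAT32_MAX_FILE_SIZE) -> List[str]:
    """Return the sorted paths under root whose size exceeds limit."""
    too_large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > limit:
                too_large.append(path)
    return sorted(too_large)
```

An empty result means every file fits; anything listed would hit the
"File too large" error on the vfat disk.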



Re: [Bug 12819] [PATCH] sync() on receiving side for data consistency

2017-06-16 Thread Ben RUBSON via rsync

> On 15 Jun 2017, at 19:29, Karl O. Pinc via rsync wrote:
> 
> On Thu, 15 Jun 2017 13:23:44 +
> just subscribed for rsync-qa from bugzilla via rsync
>  wrote:
> 
>> https://bugzilla.samba.org/show_bug.cgi?id=12819
>> 
>> --- Comment #7 from Ben RUBSON  ---
> 
>> Note that my patch simply adds a sync() just after recv_files(), so
>> one sync() per connection, not per write operation.
> 
>> But we could make this a rsync option, so that one can enable /
>> disable it on its own.
> 
> I think the "right" rsync option to add (because rsync does
> not have enough options already ;-) is a --hook-post option.
> It would run something (a `sync` in your case) on the
> remote end after finishing.  There are clear security issues
> here.
> 
> Rather than having --hook-post and having to do something
> (a server side config option that says what --hook-post
> can do?) to address the security concerns it seems much
> simpler to improve the rsync documentation regarding running
> the rsync server side.

--daemon (if used) already has the "post-xfer exec" option, but as explained
in the bug report, it could be hard to use when the daemon is chrooted.

> I'm still using command="rsync --server --daemon ." in my
> ~/.ssh/authorized_keys file on the remote end.  It'd be simple 
> enough to add, say, a "sync" to the end of this to force a sync
> when rsync finishes.

It would however sync() even if the client only read files.
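
To make the difference concrete, here is a minimal sketch of the idea behind
the patch: flush once per transfer, and only when something was actually
written. It is Python rather than rsync's C, the function receive_files is
made up for this note, and os.sync() is only available on Unix-like systems.

```python
import os
from typing import Dict


def receive_files(files: Dict[str, bytes], dest_dir: str) -> int:
    """Write each file into dest_dir, then sync once if anything was written."""
    written = 0
    for name, data in files.items():
        with open(os.path.join(dest_dir, name), "wb") as fh:
            fh.write(data)
        written += 1
    if written:
        os.sync()  # one flush for the whole transfer, not one per write
    return written
```

A read-only session writes nothing and therefore never reaches the sync,
which a "sync" appended to the forced command cannot distinguish.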

> The problem is that the --server (and, especially,
> --daemon) documentation has gone away.  Or at least
> left the man page. (v3.1.1, Debian 8, Jessie)  Except
> for a hint that --server exists at the bottom.

Are you looking for `man rsyncd.conf` ?

> If the server side of rsync was better documented then
> perhaps a simple inetd rsync service (or --rsync-path
> or -e value, etc.) would be easy for the end-user to 
> cobble together to meet needs such as this.
> 
> Can somebody please explain --server?  (And --sender, I guess.)
> I might (possibly) be motivated to send in a man page patch.
> 
> Regards,
> 
> Karl 

Thank you for your feedback Karl !

Ben

