Re: rsync local performance

2009-11-13 Thread Greg Siekas
Has anyone compiled rsync with other newer compilers like Intel 11.1?  Does 
this break anything?  

My quick test shows rsync-3.1.0 performance jumps to ~120MB/sec.  

Greg


On Nov 13, 2009, at 10:44 AM, Greg Siekas wrote:

> Wayne,
> 
> Transferring an 8 GB file using rsync between a network-mounted (10GbE)
> filesystem and local disk.
> 
> rsync-2.6.9 - 88-95 MB/sec
> rsync-3.0.6 - 62-72 MB/sec
> rsync-3.1.0 - 86-90 MB/sec
> 
> Doing a cp of the file yields 140-160MB/sec.
> 
> It appears the IO code improvements in 3.1 have definitely made a difference
> over the 3.0 code base.
> 
> Greg
> 
> 
> -- 
> Please use reply-all for most replies to avoid omitting the mailing list.
> To unsubscribe or change options: 
> https://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html



Re: max file size

2009-11-13 Thread Heinz-Josef Claes
On Fri, 13 Nov 2009 13:33:08 -0500
Matt McCutchen  wrote:

> On Fri, 2009-11-13 at 12:36 +0100, Heinz-Josef Claes wrote:
> > On Fri, 13 Nov 2009 01:38:48 -0500
> > Matt McCutchen  wrote:
> > > On Mon, 2009-11-09 at 18:20 +0100, Heinz-Josef Claes wrote:
> > > > I want to check if the following is possible:
> > > > 
> > > > 1. transport a big block of data (several terabytes) physically from
> > > > location A to location B (very long distance) via tapes (or disks).
> > > > (Location A and B use different storage technologies.)
> > > > 
> > > > When the tapes arrive in location B, the block of data has changed in
> > > > location A (a program / OS is running and storing data in it).
> > > > 
> > > > 2. shutdown application / OS in location A, rsync the delta between
> > > > Location A and B online, then restart the system in location B.
> > > > 
> > > > (Perhaps step 2 has to be done multiple times.)
> > > 
> > > Since the source and destination versions are practically certain to
> > > differ, --checksum would serve no purpose.  See the man page description
> > > of --checksum.
> > 
> > I don't understand what you mean. Between steps 1 and 2, only a few
> > percent of the data will change, so the idea is to transfer the
> > differences only. Transferring the whole file online takes too long.
> > How to do this without checksums (either --checksum or --inplace)?
> 
> Did you read the description of --checksum as I suggested?  It is an
> alternative "quick check" for deciding whether a file needs to be
> transferred, which is not what you want.  You're talking about the
> delta-transfer algorithm, which is on by default for remote runs and is
> controlled by a separate option, --(no-)whole-file.
> 
You're right - sorry, a misunderstanding on my side.
--no-whole-file --out-format='%n%L (%b of %l)'
does the job.
Thanks, HJC
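
For reference, the options above can be assembled into a full command
line. This is only a sketch: the two paths are hypothetical placeholders,
and nothing is executed. In the --out-format string, %n is the file name,
%L is the link-target text (empty for regular files), %b is the number of
bytes actually transferred, and %l is the length of the file.

```python
import shlex

def build_rsync_command(src, dest):
    """Return an rsync argument list that forces the delta-transfer
    algorithm and reports bytes sent versus file length per file."""
    return [
        "rsync",
        "--no-whole-file",               # use delta transfer even for local runs
        "--out-format=%n%L (%b of %l)",  # name, link info, bytes sent, file length
        src,
        dest,
    ]

# Hypothetical source and destination, chosen only for illustration.
cmd = build_rsync_command("/mnt/a/image.bin", "/mnt/b/image.bin")
print(shlex.join(cmd))
```

Comparing %b with %l in the output gives a quick impression of how much
of each file the delta transfer actually had to send.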


Tutorial:How to install and configure cwRsync on a Windows Platform

2009-11-13 Thread Alan C. Bonnici
Hi,

I have posted a video tutorial at http://www.alanbonnici.com/videos/cwrsync.asp
on how to install and configure rsync on a Windows platform.

It is a work in progress. If you have any corrections or comments,
please email me.

Regards,
Alan Bonnici

rsync local performance

2009-11-13 Thread Greg Siekas
Wayne,

Transferring an 8 GB file using rsync between a network-mounted (10GbE)
filesystem and local disk.

rsync-2.6.9 - 88-95 MB/sec
rsync-3.0.6 - 62-72 MB/sec
rsync-3.1.0 - 86-90 MB/sec

Doing a cp of the file yields 140-160MB/sec.

It appears the IO code improvements in 3.1 have definitely made a difference
over the 3.0 code base.

Greg




Re: max file size

2009-11-13 Thread Matt McCutchen
On Fri, 2009-11-13 at 12:36 +0100, Heinz-Josef Claes wrote:
> On Fri, 13 Nov 2009 01:38:48 -0500
> Matt McCutchen  wrote:
> > On Mon, 2009-11-09 at 18:20 +0100, Heinz-Josef Claes wrote:
> > > I want to check if the following is possible:
> > > 
> > > 1. transport a big block of data (several terabytes) physically from
> > > location A to location B (very long distance) via tapes (or disks).
> > > (Location A and B use different storage technologies.)
> > > 
> > > When the tapes arrive in location B, the block of data has changed in
> > > location A (a program / OS is running and storing data in it).
> > > 
> > > 2. shutdown application / OS in location A, rsync the delta between
> > > Location A and B online, then restart the system in location B.
> > > 
> > > (Perhaps step 2 has to be done multiple times.)
> > 
> > Since the source and destination versions are practically certain to
> > differ, --checksum would serve no purpose.  See the man page description
> > of --checksum.
> 
> I don't understand what you mean. Between steps 1 and 2, only a few
> percent of the data will change, so the idea is to transfer the
> differences only. Transferring the whole file online takes too long.
> How to do this without checksums (either --checksum or --inplace)?

Did you read the description of --checksum as I suggested?  It is an
alternative "quick check" for deciding whether a file needs to be
transferred, which is not what you want.  You're talking about the
delta-transfer algorithm, which is on by default for remote runs and is
controlled by a separate option, --(no-)whole-file.
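
To make the distinction concrete, here is a toy model of the "does this
file need to be transferred at all?" decision. It is a simplified sketch,
not rsync's code: the function names are made up, and MD5 is used only as
a stand-in whole-file checksum. The point is that --checksum changes only
this yes/no decision; how the data is then sent (delta or whole file) is
a separate matter.

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=1 << 20):
    """Checksum a whole file; this costs a full read of the file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def needs_transfer(src, dst, use_checksum=False):
    """Toy version of the per-file skip decision."""
    if not os.path.exists(dst):
        return True
    src_stat, dst_stat = os.stat(src), os.stat(dst)
    if src_stat.st_size != dst_stat.st_size:
        return True                      # sizes differ: always transfer
    if use_checksum:
        return file_md5(src) != file_md5(dst)
    # Default quick check: same size, so compare modification times.
    return int(src_stat.st_mtime) != int(dst_stat.st_mtime)

# Two files with equal size and mtime but different content.
with tempfile.TemporaryDirectory() as tmp:
    a, b = os.path.join(tmp, "a"), os.path.join(tmp, "b")
    for path, data in ((a, b"x" * 1024), (b, b"y" * 1024)):
        with open(path, "wb") as handle:
            handle.write(data)
    os.utime(a, (1000, 1000))
    os.utime(b, (1000, 1000))
    quick = needs_transfer(a, b)
    full = needs_transfer(a, b, use_checksum=True)

print(quick, full)  # False True
```

In the scenario discussed here the two copies are known to differ, so the
default quick check will already select the file and the extra full read
buys nothing.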

-- 
Matt



Stability of rsync 3.1.0dev (Re: Does files-from work with --delete?)

2009-11-13 Thread Matt McCutchen
On Fri, 2009-11-13 at 08:38 -0800, Wayne Davison wrote:
> On Thu, Nov 12, 2009 at 3:56 PM, Philip Pokorny
>  wrote:
> > How confident are you in the current state of 3.1.0.pre and
> > the nightly snapshots?  Should I be concerned about running
> > this on production data?
> 
> Personally, I'm almost ready to start using it in production.  The
> 3.1.0dev code prior to the I/O changes was in production-ready shape,
> and the I/O overhaul has been testing well so far (I use it for all
> the personal rsyncing needs).  Any failure cases from the I/O code
> should result in the stopping of the transfer, not some kind of
> corruption, and I haven't found anything in quite a while.

3.1.0dev needs to win back my trust after a bad experience trying to do
a backup a few weeks ago.  First I got an out-of-space error that rsync
didn't report:

http://lists.samba.org/archive/rsync/2009-November/024135.html

And once I freed up disk space, I got an "unexpected tag 50" (I think),
which I held off on reporting pending a reproducible case.  I haven't
had a chance to try another backup with 3.1.0dev yet.

-- 
Matt



Re: Does files-from work with --delete?

2009-11-13 Thread Matt McCutchen
On Fri, 2009-11-13 at 08:43 -0800, Wayne Davison wrote:
> On Thu, Nov 12, 2009 at 6:07 PM, Matt McCutchen
>  wrote:
> > The best approach for now is probably to backport the
> > --delete-missing-args changes to 3.0.6.
>  
> In the future, I'd suggest starting with the head of the b3.0.x
> branch.  That currently gets you one extra commit, an xattr-related
> memory fix (2daed024b17a2cafb956e12581c25119d07a5950).

Sure, provided that you keep b3.0.x stable all the time rather than just
at releases.  I've redone the missing-args backport:

https://mattmccutchen.net/rsync/rsync.git/?a=shortlog;h=hacks/missing-args-b3.0.x

-- 
Matt



Re: --fuzzy search over to-be-deleted files to catch moved files and directories

2009-11-13 Thread H. Langos
Hi Matt,

Thank you very much for answering those questions and helping me to
understand rsync better!

On Thu, Nov 12, 2009 at 11:20:19PM -0500, Matt McCutchen wrote:
> Attempting to address each of your questions, here and then in your
> other message...
> 
> On Wed, 2009-11-11 at 12:17 +0100, H. Langos wrote: 
> > > It will find moved files that match exactly
> > > according to the "quick check" in effect (size + mtime or checksum). 
> > 
> > That is basename+size+mtime  or basename+checksum, right?
> 
> No, a basename match is not a requirement (hence the ability to detect
> renames), but it is a tie-breaker. 

Ahh, ok, so here size+mtime or checksum select the base file. 

And if that selection fails then "--fuzzy" search is applied but looks 
only in the /dst directory for a suitable candidate.

(Or is the temporal order reversed?)

> > How does "--detect-renamed" interact with "--fuzzy" and "--delete-after"? 
> 
> --detect-renamed and --fuzzy are two different means of finding basis
> files that overlap in some cases but do not really interact.
> --detect-renamed considers the whole destination using the quick check,
> while --fuzzy considers only the same destination subdir using
> size+mtime or otherwise name similarity.
> 
> --delete-before and --delete-during may reduce the effectiveness of
> --fuzzy, as stated in the man page description of --fuzzy, but they do
> not affect --detect-renamed since --detect-renamed actually works during
> the delete pass.
...
> > > It doesn't calculate name similarity like --fuzzy because that would
> > > be prohibitively expensive in the current implementation.
> > Only files of the same size should be
> > candidates to start with, right?
> 
> No, the name similarity calculation I'm talking about is the fallback to
> select a similar basis file when no available destination file passes
> the quick check, so it does not require a size match.

Hmm, ok so fuzzy also finds files that are slightly different and have their
name slightly changed.

This sounds like it would be a good idea to (have the option to) include
the delete candidates directory .~tmp~ (or whatever else "--detect-renamed"
uses) in the --fuzzy search.

The real-world applications are obvious. Apart from software packages
as described in https://bugzilla.samba.org/show_bug.cgi?id=3392#c7
(thanks for the link!), which is a special case using rsync-friendly
gzip/zlib compression, there is the large area of media files.

Example:
For my photo collections it would speed things up in the case where I
move pictures to a different directory, rename them from DSC_01234.JPG to
20091113-174354_dsc01234.jpg (extracted timestamp from exif data) and 
add author, license and some keywords to the exif tags.

This is not theory. In fact I do just those things with a script when
importing pictures from any of my cameras into the photo archive. I 
rename them as shown above and then I move them to a directory structure
made of /// . I don't change the exif tags yet, which 
I wanted to add in the future. 
But that would make the  size+mtime/checksum test fail. Using "--fuzzy" 
would help, but only if I'd do an rsync between the moving operation 
and the tag changing operation. 

No matter which operation I do first, doing both together would
mean a completely new transfer to my backup location. :-/
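
The renaming step in the example can be sketched as follows. This is a
guess at the scheme based only on the two file names given above, not the
actual import script; the function name is made up, and the timestamp is
passed in directly rather than read from the EXIF data.

```python
from datetime import datetime
from pathlib import PurePosixPath

def archive_name(original, taken):
    """Build an archive file name from a camera file name and the time
    the picture was taken (assumed to come from EXIF elsewhere)."""
    base = PurePosixPath(original).name.lower().replace("_", "")
    return f"{taken:%Y%m%d-%H%M%S}_{base}"

print(archive_name("DSC_01234.JPG", datetime(2009, 11, 13, 17, 43, 54)))
# 20091113-174354_dsc01234.jpg
```

After such a rename the new name no longer resembles the old one, so any
matching has to rely on content, size or mtime rather than on the name.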


Same thing goes for mp3 collections when you finally find the time 
to tag your new music and move it to the right directory in your 
collection.
 
> > Why would it be so expensive?
> 
> Wayne said so here:
> 
> https://bugzilla.samba.org/show_bug.cgi?id=3392#c11

Well, I think I'll have to wait then ... or refrain from doing move
and change operations at the same time. :-)

Thank you very much for your help!

cheers
-henrik
 


Re: Does files-from work with --delete?

2009-11-13 Thread Wayne Davison
On Thu, Nov 12, 2009 at 6:07 PM, Matt McCutchen wrote:

> The best approach for now is probably to backport the --delete-missing-args
> changes to 3.0.6.


In the future, I'd suggest starting with the head of the b3.0.x branch.
That currently gets you one extra commit, an xattr-related memory fix
(2daed024b17a2cafb956e12581c25119d07a5950).

..wayne..

Re: Does files-from work with --delete?

2009-11-13 Thread Wayne Davison
On Thu, Nov 12, 2009 at 3:56 PM, Philip Pokorny <
ppoko...@penguincomputing.com> wrote:

>  How confident are you in the current state of 3.1.0.pre and the nightly
> snapshots?  Should I be concerned about running this on production data?
>

Personally, I'm almost ready to start using it in production.  The 3.1.0dev
code prior to the I/O changes was in production-ready shape, and the I/O
overhaul has been testing well so far (I use it for all the personal
rsyncing needs).  Any failure cases from the I/O code should result in the
stopping of the transfer, not some kind of corruption, and I haven't found
anything in quite a while.

..wayne..

Re: max file size

2009-11-13 Thread Heinz-Josef Claes
On Fri, 13 Nov 2009 01:38:48 -0500
Matt McCutchen  wrote:

> On Mon, 2009-11-09 at 18:20 +0100, Heinz-Josef Claes wrote:
> > Am Montag, 9. November 2009 17:48:35 schrieb Matt McCutchen:
> > > On Mon, 2009-11-09 at 11:43 +0100, Heinz-Josef Claes wrote:
> > > > does anybody know what's the maximum file size (terabytes?) when using
> > > > rsync with options --checksum and / or --inplace?
> > > >
> > > > What file sizes have been tested in reality? Are there any experiences
> > > > using rsync (with --checksum and / or --inplace) for big files of
> > > > several / dozens of terabytes?
> > > 
> > > I don't believe rsync has a fixed maximum size other than "what can fit
> > > in 64 bits", but I can't speak to any reliability issues that might come
> > > up with extremely large files.
> > > 
> > I've read about a fix for overrun checksum buffers with more than some
> > hundred terabytes but that was just something undefined . . .
> 
> Indeed, I forgot about that.  The delta-transfer algorithm doesn't work
> for files longer than 2^31 blocks.  With the default maximum block size
> of 2^17, the limit is 2^48 bytes or 256 TB.  You could stretch the limit
> by fixing a larger block size with --block-size .  See:
> 
> https://bugzilla.samba.org/show_bug.cgi?id=5459#c2
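
The arithmetic behind that limit can be checked with a few lines. This
is only a worked calculation using the two figures quoted above (2^31
blocks, 2^17 bytes per block); the names are made up and nothing here
comes from rsync's source.

```python
MAX_BLOCKS = 2**31               # block-count limit quoted above
DEFAULT_MAX_BLOCK_SIZE = 2**17   # default maximum block size in bytes

def max_delta_file_size(block_size=DEFAULT_MAX_BLOCK_SIZE):
    """Largest file size (in bytes) coverable with the given block size."""
    return MAX_BLOCKS * block_size

limit = max_delta_file_size()
print(limit == 2**48)            # True
print(limit // 2**40, "TiB")     # 256 TiB
```

Each doubling of the block size doubles the limit, which is why forcing
a larger --block-size stretches it. Which block sizes a given rsync
version accepts is not covered by this sketch.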

Thanks for that information!

Have you (or anybody) ever done a test with big file sizes?

> 
> > > For what purpose are you considering --checksum?  In the case where the
> > > file's size hasn't changed (probably true for large image files), it
> > > will add an extra full read of the file on both sides before the
> > > transfer begins, which would be very expensive for multi-terabyte files.
> > 
> > I want to check if the following is possible:
> > 
> > 1. transport a big block of data (several terabytes) physically from
> > location A to location B (very long distance) via tapes (or disks).
> > (Location A and B use different storage technologies.)
> > 
> > When the tapes arrive in location B, the block of data has changed in
> > location A (a program / OS is running and storing data in it).
> > 
> > 2. shutdown application / OS in location A, rsync the delta between
> > Location A and B online, then restart the system in location B.
> > 
> > (Perhaps step 2 has to be done multiple times.)
> 
> Since the source and destination versions are practically certain to
> differ, --checksum would serve no purpose.  See the man page description
> of --checksum.
> 

I don't understand what you mean. Between steps 1 and 2, only a few percent of
the data will change, so the idea is to transfer the differences only.
Transferring the whole file online takes too long.
How to do this without checksums (either --checksum or --inplace)?

I'll probably be able to make a test with a file size of some terabytes in the 
next weeks, but that's not guaranteed.

Regards, HJC


DO NOT REPLY [Bug 5583] Don't write out an unchanged file if all the checksums matched

2009-11-13 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=5583


henrik-rs...@prak.org changed:

   What       |Removed     |Added
   -----------+------------+----------------------
   CC         |            |henrik-rs...@prak.org




--- Comment #4 from henrik-rs...@prak.org  2009-11-13 04:49 CST ---
Here's my "me too" comment on the issue (feel free to move it to a separate bug
depending on this one):

I have stumbled upon the same issue in connection with rsnapshot and rsync with
the "--detect-renamed" patch. 

Basically rsnapshot works like this: On the first run it creates a full copy
of a directory tree /src to /dst/0. Then the next time it rotates /dst/(x) to
/dst/(x+1) and creates a copy with just hard links from /dst/1 to /dst/0 and
then calls rsync to transfer the changes between /src and /dst/0, effectively
creating a differential backup at the granularity of files.

I applied the detect-renamed patch to avoid multiple copies of big files when
they are moved around in the directory tree. 

The patch works insofar as it finds the correct base files in /dst.
Then it uses the delta algorithm to make sure that no coincidental match of
filename, size and mtime results in a false positive.

Unfortunately usage of the delta algorithm creates a new copy of the file at
/dst even if the content is the same as the base file (instead of using a
hardlink to the base file).
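
The cost of that behaviour can be seen in a small experiment. This is a
sketch with made-up file names standing in for entries under /dst/1 and
/dst/0; it assumes a filesystem that supports hard links and does not
involve rsync or rsnapshot at all. A hard link shares the inode (and so
the disk blocks) with the base file, while a rewritten copy with
identical content gets an inode of its own.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the base file in the previous snapshot (/dst/1).
    base = os.path.join(tmp, "dst1_big.bin")
    with open(base, "wb") as handle:
        handle.write(b"\0" * 4096)

    # What the bug report asks for: a hard link in the new snapshot.
    linked = os.path.join(tmp, "dst0_linked.bin")
    os.link(base, linked)

    # What happens instead: identical content written out as a new file.
    copied = os.path.join(tmp, "dst0_copied.bin")
    shutil.copyfile(base, copied)

    link_shares_inode = os.stat(base).st_ino == os.stat(linked).st_ino
    copy_shares_inode = os.stat(base).st_ino == os.stat(copied).st_ino

print("hard link shares inode:", link_shares_inode)
print("copy shares inode:", copy_shares_inode)
```

With multi-gigabyte files and many rotated snapshots, every such
unnecessary copy costs the full file size again.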


-- 
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.


DO NOT REPLY [Bug 6881] --bwlimit option uses KiB/s, but is documented as (what amounts to) kB/s

2009-11-13 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=6881


way...@samba.org changed:

   What       |Removed     |Added
   -----------+------------+----------
   Status     |NEW         |RESOLVED
   Resolution |            |FIXED




--- Comment #2 from way...@samba.org  2009-11-13 02:10 CST ---
I've both improved the docs and improved the option to be able to accept the
same unit suffixes that are accepted by --max-size and --min-size.  This makes
it clearer what --bwlimit=1000 is doing, and allows someone to specify
--bwlimit=1000kb (aka --bwlimit=1mb) for a slightly lower transfer limit than
--bwlimit=1MiB.
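
The difference between those values can be worked out with a small
model of the suffix rules. This is a simplified sketch, not rsync's
parser: it covers only whole numbers and the K/M suffixes, it assumes
that "kb"/"mb" mean multiples of 1000 while "k", "kib", "m" and "mib"
mean multiples of 1024 (the convention documented for --max-size), and
it reports the nominal rate without any rounding rsync may apply.

```python
import re

# Bytes per unit. A bare number is taken as KiB/s, per the bug title.
_UNITS = {
    "": 1024,
    "k": 1024, "kib": 1024, "kb": 1000,
    "m": 1024**2, "mib": 1024**2, "mb": 1000**2,
}

def bwlimit_bytes_per_sec(value):
    """Nominal bytes per second for a --bwlimit style value."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", value)
    if match is None or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unsupported value: {value!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix.lower()]

for value in ("1000", "1000kb", "1mb", "1MiB"):
    print(value, bwlimit_bytes_per_sec(value))
# 1000 1024000
# 1000kb 1000000
# 1mb 1000000
# 1MiB 1048576
```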




Re: Daemon tests broken when running as root

2009-11-13 Thread Wayne Davison
On Thu, Nov 12, 2009 at 9:11 PM, Matt McCutchen wrote:

> I'm guessing the "uid = 0" and "gid = 0" in the test daemon
> configuration were disabled so that the daemon tests could run for
> unprivileged users.
>

Right.  I've checked in a fix that ensures that uid & gid are specified when
the test is run as root.
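
The idea can be sketched like this. It is an illustration only, not the
change that was checked in: the function name is made up, and the only
settings shown are the two quoted above.

```python
def uid_gid_settings(effective_uid):
    """Return the extra daemon config lines for the test setup.

    When the tests run as root, uid and gid are set explicitly;
    for unprivileged users no such lines are emitted.
    """
    if effective_uid == 0:
        return ["uid = 0", "gid = 0"]
    return []

print(uid_gid_settings(0))      # ['uid = 0', 'gid = 0']
print(uid_gid_settings(1000))   # []
```

On a POSIX system the argument would typically come from os.geteuid().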

..wayne..