Re: Using a CD to make initial copy

2008-12-07 Thread Chris Shoemaker
On Sun, Dec 07, 2008 at 06:58:57PM -0600, Jay Strauss wrote:
> Hi,
> 
> I have 20Gb of data I need to back up.  It takes too long to copy it across
> the internet.  Is there a way I can copy it locally to some removable
> media like a couple of DVDs, then bring them to the target machine,
> copy onto the target, then run rsync to grab any updates?
> 
> What if the target doesn't have the same users as the source?
> 
> Is there a suggested recipe for this?

See "Batch Mode" in the man page.

-chris
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Batch mode scenario ("use case")

2007-09-10 Thread Chris Shoemaker
On Sun, Sep 09, 2007 at 11:17:05PM +, Suresh Govindachar wrote:
> 
>   Responding to the question of how to use the batch file 
>   /e/cmds/foo created via the command:
> 
>   > rsync -a --only-write-batch=/e/cmds/foo /c/home/wer/work/ /e/gold
> 
>   to selectively restore a subdirectory of work/ such as 
> 
>   some/path/projects/c_a 
> 
>   into a new location such as  
> 
>  /f/new_home/wer/work
> 
>   which has a copy of some/path/projects/c_a gotten from /e/gold but
>   is otherwise empty, Matt wrote:
> 
>   > Rsync currently doesn't provide a good way to do this.  It would
>   > be great if sender filters could be used to control which
>   > file-list entries from the batch file are processed.  For now, a
>   > hackish way to quickly recover a subdirectory is to --read-batch
>   > to a destination that has been set up specially so that the user
>   > cannot write to anything in the destination except the
>   > subdirectory to be recovered.
> 
>   Is there a specification for the format of the batch file?  

The batch file format is simply a dump of rsync's network protocol.
That way, replaying the batch file is quite similar to performing the
original sync, except that the bytes come from the file instead of from
the socket.

At least, that's the way it worked last time I looked, which was
several years ago.

-chris

>   Might
>   it be possible to write a script to read /e/cmds/foo and create a
>   new batch file that would correspond to having been created while
>   rsync'ing --only-write-batch of work//c_a with gold//c_a?
> 
>   Thanks,
> 
>   --Suresh
>   


Re: rsync replacement

2007-07-13 Thread Chris Shoemaker
On Fri, Jul 13, 2007 at 03:21:43PM -0400, Mike Jackson wrote:
> Looking for more efficient replication or synchronization solution  
> than rsync, take a look at syncdat by www.dataexpedition.com

Hi Mike, 

   You seem to have a misunderstanding about what qualifies as
on-topic for the rsync mailing list.  (Hint: It's not for advertising
your product.)

   Let me attempt to help out by bringing this thread on topic:

Hello List,
   
   Does anyone have any experience with 'syncdat' from Data
Expedition?  How does it compare to rsync?  Are there any ways that rsync
could be improved to be a better replacement for syncdat?  Thanks!

-chris


Re: WARNING: failed verification -- update discarded (will try again).

2005-12-07 Thread Chris Shoemaker
On Wed, Dec 07, 2005 at 11:38:25AM -0700, Joe Peterson wrote:
> WARNING: jukebox/Frank_Sinatra/The_Main_Event/04-Let_Me_Try_Again.flac
> failed verification -- update discarded (will try again).
> 
> What does the "WARNING" imply?  What could have gone wrong?  I cannot
> reproduce it.  When I did another sync immediately after, only the
> directory was in the file list, and then I checked the file in question,
> and it matched the source.

If I saw that, the first thing I'd do is fsck.  If fsck found/fixed
errors, I wouldn't give it another thought.  If not, I'd think harder
while trying to remember where I left my gamma-ray shield.  :)

-chris


Re: "intelligent" rsync scripts?

2005-11-10 Thread Chris Shoemaker
On Thu, Nov 10, 2005 at 11:30:49AM -0800, Wayne Davison wrote:
> On Thu, Nov 10, 2005 at 10:32:50AM -0500, Chris Shoemaker wrote:
> > If the original file changes, then so will the hard link.
> 
> No -- an rsync update creates a temporary file, and that file gets
> renamed into place, breaking any hard-link that this new code creates.
> (The only exception to this update method is when --inplace is enabled,
> and I made this conflict with the new --detect-rename option.)

Ah, ok.  Perfect.

> > If --detect-renames hardlinks the deleted files it doesn't matter that
> > the orginals are deleted before transfer; hard drive space is not
> > reduced.
> 
> True, but only for correlated files.  Any extraneous files will still be
> deleted before the transfer.  This shouldn't be any worse in disk-space
> use than the alternative of not allowing a delete-before pass.

I see now.  That's a good reason to allow --delete-before.

> > Oh, because the match-search for non-missing files is not delayed in
> > the --delete-during scan, right?
> 
> Exactly.  My text might not have made it clear that this non-missing
> handling will always work for files in a single directory (such as
> log-dir rotations) as well as files found in the already-scanned dirs
> prior to getting to the current directory.

That's what I understood, and it's the only thing that doesn't
feel quite right.  Right now, I see the rename-with-replace detection
as gravy, since I only envisioned straight rename-with-no-replace at
first.  But, users are greedy, and they may come to depend on
rename-with-replace.  It's a little hard to explain that it will only
work if the new filepath is lexicographically earlier than the
old filepath.  Also, see below.

> > So, will this be in 2.6.7?
> 
> It has a good chance of making it.  It still needs some cleanup and
> testing, some of which I've just done:  e.g. I've added support for
> checksum matching (instead of mod-time matching) when --checksum is
> enabled.  The aforementioned patch has been updated with the latest
> changes.

--checksum support is good, but I think it raises the user's
expectations even higher about how rsync will perform.  It's even
*harder* to explain that even with the (more expensive) file
checksums, rsync won't use the correct basis for the
rename-with-replace case 50% of the time, on average.

Is it possible to delay the basis search even for non-missing files if
they're not exact matches?

-chris


Re: "intelligent" rsync scripts?

2005-11-10 Thread Chris Shoemaker
On Wed, Nov 09, 2005 at 11:52:40PM -0800, Wayne Davison wrote:
> > Are you saying only unchanged files are available as alternate basis
> > files?  If we can, I think it's worth avoiding this restriction.
> 
> If we were to use the files directly, then it would be complicated to
> try to order the updates to avoid changing a file before another file
> could use it as a basis file.  However, I've come up with an algorithm
> I like better that avoids this restriction completely:
> 
> Rsync already supports the idea of a "partial dir" that can be scanned
> for partially-transferred files and delayed updates.  I'm thinking that
> hard-linking files into this directory makes this new feature much
> easier and more memory efficient (the dir is named ".~tmp~" by default,
> relative to the containing directory of the to-be-updated files).

Hmm. I see the complexity of using a potentially changing file as an
alternate basis, but I don't see how hardlinking makes this simpler.
If the original file changes, then so will the hard link.  What am I
missing?

> I also thought through where I'd like the rename scan to go.  I finally
> decided that I liked the idea of piggy-backing the scan on the existing
> delete-before or delete-during scans that already occur, since this
> makes the logic much simpler (the code already exists to handle all the
> proper include/exclude logic, including local .cvsignore/.rsync-filter
> files) and it should also make the scan quick because it will take
> advantage of disk I/O that is either already occurring, or is at least
> in close proximity to identical stat() calls that the generator's update
> code is going to make.  (If either --delete-after was selected or no
> deletions are occurring, rsync does the rename scan during the transfer
> using a non-deleting version of the delete-during code).  The only
> potential problem with this scan position is that the receiving side may
> not have fully finished its scan when we encounter a missing file that
> doesn't have a size+mtime match yet, so I allow missing files to be
> delayed until the receiving-side scan is complete (at which point we
> check to see if a match has shown up yet or not).

Reusing the delete-scanning sounds good, but I don't think you have to
use both the --delete-before scan and the --delete-during scan.  I
think the don't-really-delete mode for delete-during is sufficient.  I
really think --detect-renames is incompatible with --delete-before,
even though you can make it look like they're not.  The problem is
that I think one main use of --delete-before is to avoid running out
of hard drive space.  If --detect-renames hardlinks the deleted files
it doesn't matter that the originals are deleted before transfer; hard
drive space is not reduced.  Thus, I think you can avoid the
--delete-before scan.

> My code also attempts to match up files even when they're not missing.

Nice!  Very handy!

> This works to the fullest extent when a delete-before scan is in effect,

Oh, because the match-search for non-missing files is not delayed in
the --delete-during scan, right?  Even so, that gives (part of) a
significant benefit that I didn't expect, so it's a good thing.  I'll
have to think more about the full rename-and-replace problem across
multiple directories.

> but it still handles the case of the rotating log files quite nicely
> (associating all the moved files together as you would expect).
> 
> A patch for the CVS version is here:
> 
> http://opencoder.net/detect-renames.diff

Not a big diff considering the impact on functionality.  You make it
look easy! I like it. :)

> The code is still a little ugly, but it does appear to work well in my
> limited testing.  If I like the idea, I'll look into how to share the
> code for the delete scan in a way that is not as ugly as the current
> logic.
> 
> > $ cp foo foo.orig; edit foo
> > 
> > Not using the old foo as the basis for foo.orig just because foo
> > changed really hurts.
> 
> If the user uses "cp -p foo foo.orig" we will find it.  The patch could
> be extended to switch from size+mtime to use size+checksum, but I
> haven't done that yet (and checksumming is so slow that most folks tend
> to avoid it).

At least it catches move-and-replace.  That's a real bonus.  So, will
this be in 2.6.7?

-chris


Re: "intelligent" rsync scripts?

2005-11-07 Thread Chris Shoemaker
On Mon, Nov 07, 2005 at 02:37:48PM -0800, Wayne Davison wrote:
> On Mon, Nov 07, 2005 at 05:03:30PM -0500, Chris Shoemaker wrote:
> > Yeah, I think I'm saying just treat (1) and (2) the same way.  OTOH,
> > if the behavior is optional and documented, I could definitely see
> > treating (1) as an exact match.
> 
> Yes, perhaps it would be better to let the user decide how strict to be.
> 
> > But you can't do the lookups until you've received the entire
> > file-list, right?
> 
> We can do the hashing of what files are present on the receiving side.
> The purpose is to create a database of files that will be used later
> when the generator is trying to find a match for a file that is missing
> (which we will discover later during the normal generator pass).
> 
> > You mean [the dir] gets removed when it's received?  Why even add it then?
> 
> Because we're creating a list of extra directories that aren't on the
> sending side and we're scanning the local directory as soon as we see
> its name in the received file list, which will cause us to hash names
> that may later turn out to be in the list that the sender sends to us.

Ok, so the purpose of the directory list is to make sure all the local
directories are scanned for potential basis files, even directories not
mentioned in the transmitted file-list, right?  I didn't realize that
would require a table and delaying the scan of unknown directories
until *after* the file-list scan was done.  I assumed *all* the local
files (even those in unknown directories) could be hashed on the first
pass through the file-list.

> 
> > # of insertions = # of receiver files not in transfer
> 
> In my described algorithm it was "# of insertions = all files on the
> receiving side" because we don't know what will be in a particular
> directory until after the sender recurses clear down to the bottom of
> all child directories and comes back up and sends the last filename at
> that directory's level.  If we change the sender to send all the files
> (including all directory names) at a single level before going down into
> a subdir, we could code up the local scan to occur at the point where
> either the level changes or the dir changes at the current level.  Such
> a change would not be compatible with older rsync receivers, though (due
> to how the current receiver expects to be able to mark nested files in
> its received file-list).
> 
> Your comment does remind me that we don't want to pick an alternate
> basis file that is currently in the transfer since that file may
> possibly be updated (which can cause problems if it happens at the wrong
> time).  

Are you saying only unchanged files are available as alternate basis
files?  If we can, I think it's worth avoiding this restriction.  I
imagine a case inspired by logrotate(8):

FILE    ---> renamed to --->  FILE
                              log    (a new file)
log                           log.1
log.1                         log.2
log.2                         log.3
log.3                         log.4

where log.4 appears to be a missing file but is really just a renamed
log.3.  And log.3, log.2, and log.1 will probably be retransmitted in
full (a problem for another day, but this is why I was thinking of a
hashtable of all the files' checksums).  But the point here
is that it'd be nice to be able to use (the old) log.3 as the basis
for log.4, even while updating to the new log.3.

In general, I think that when a file is renamed, it's *very often*
precisely because the original is changing.  I.e. It's a backup.

$ cp foo foo.orig; edit foo

Not using the old foo as the basis for foo.orig just because foo
changed really hurts.  This is worth getting right.

-chris

> Thus, there would need to be a lot of hash-table deletions going
> on in my imagined algorithm in the file-name hash as well as the
> dir-name hash.
> 
> ..wayne..


Re: "intelligent" rsync scripts?

2005-11-07 Thread Chris Shoemaker
On Mon, Nov 07, 2005 at 12:01:35PM -0800, Wayne Davison wrote:
> On Wed, Oct 26, 2005 at 02:04:34PM -0400, Chris Shoemaker wrote:
> > That option should imply at least, --checksum and --delete-after if
> > --delete at all.
> 
> I don't think it needs --checksum because rsync can simply use a
> non-exact match as the basis file for the transfer.

Hmm... I think you're right.  I need to remember that it's not
necessary to _always_ avoid the use of an incorrect basis.  It's just
more efficient to make it unlikely.

> 
> > For each file on the sender which is *missing* from the receiver, it
> > needs to search the checksums of all of receiver's existing files for
> > a checksum match.
> 
> I'd make it: (1) lookup a file-size + mod-time + file-name match;
> if found, copy that file locally and consider the update done. 

I don't know about "consider the update done".  This is less strict
than current behavior, since the paths *are* different.  Someone may
depend on that behavior (unlikely, I know).  I think it's a little
safer to still checksum the moved file.  IOW, treat it the same way
you would a "--fuzzy" match.  (assuming I understand that feature.)

> (2)
> lookup a file-size + mod-time match OR just a file-name match, and use
> that file as a basis file in the transfer, which can greatly speed up
> the transfer if the file is largely the same as the new file.

Yeah, I think I'm saying just treat (1) and (2) the same way.  OTOH,
if the behavior is optional and documented, I could definitely see
treating (1) as an exact match.  I guess I was thinking that treating
(1) as a fuzzy match would be required if the rename detection was
default behavior.  (And, depending on the cost, I wouldn't necessarily
mind it being the default.)

> 
> The way I see this being implemented is to add a hash-table algorithm to
> the code so that rsync can hash several things as the names arrive
> during the opening file-list reception stage:  the receiving side would
> take every arriving directory name (starting with the dest dir) and
> lookup the names in the local version of that dir, creating a hash table
> based on file-size + mod-time, a hash table based on file-name (for
> regular files), and a hash table based on any directory names it finds

Meaning just the receiver directories NOT in the arriving list, or
*every* receiver directory?

> (this attempts to do the receiving side scanning incrementally as the
> names arrive instead of during a separate pass after the file-list is
> finished).  

But you can't do the lookups until you've received the entire
file-list, right?  Otherwise you may not have yet seen the "originals"
of the moved files.

> As each directory gets scanned, that name gets removed from
> the directory-name hash.  

You mean it gets removed when it's received?  Why even add it then?
I'm probably missing something here.

> At the end of the file-list reception, any
> remaining directory names in the dir-hash table also get scanned
> (recursively).  This would give us the needed info in the generator to
> allow it to lookup missing files to check for exact or close matches.

Yes.

> 
> One vital decision is picking a good hash-table algorithm that allows
> the table to grow larger efficiently (since we don't know how many files
> we need to hash before-hand).  I'm thinking that trying the libiberty
> hashtab.c version might be a good starting point.  Suggestions?  Perhaps
> a better idea than a general-purpose hash-table algorithm might be to
> just collect all the data in an array (expanding the array as needed)
> and then sort it when we're all done.  This would use a binary-search
> algorithm to find a match.  The reason this might be better is that it
> is likely that the number of missing files will not be a huge percentage
> of the transfer, so making the creation of the "hash table" efficient
> might be more important than making the lookup of missing files
> maximally efficient.

# of insertions = # of receiver files not in transfer
# of lookups = # of sender files missing from receiver

I can't think of a reason why either term would dominate.  But,
pipeline concerns may make it better to push the cost into the later
operation, i.e. lookup.  That would suggest using an array for
constant-cost insertion.  

> 
> Have you done any work on this, Chris?  If not, I'm thinking of looking
> into this soon.

Nothing more than thinking.  It's been #3 on my list since the
original post, but #1 and #2 aren't wrapping up quickly.  I was hoping
you'd like the idea enough to beat me to it.  :)

-chris

> 
> ..wayne..


Re: "intelligent" rsync scripts?

2005-10-26 Thread Chris Shoemaker
On Wed, Oct 26, 2005 at 09:07:51PM +0200, Eberhard Moenkeberg wrote:
> Hi,
> 
> On Wed, 26 Oct 2005, Chris Shoemaker wrote:
> >On Wed, Oct 26, 2005 at 08:12:30PM +0200, Eberhard Moenkeberg wrote:
> 
> >>The first pass of "rename-without-modification" could even be much easier:
> >>size and timestamp should match.
> >
> >Eeek.  That's a bit too risky for my tastes.
> >I'd be comfortable with "Size && timestamp && (checksum || filename)"
> >but not just "Size && timestamp".
> 
> Surely. But this first pass would reduce the necessity for the 
> checksumming pass a lot.
> And the checksumming pass can stop on the first mismatch.

Maybe.  If sizes matched but timestamps didn't, I think I'd still
checksum and skip if matched.  OTOH, if sizes don't match, you /could/
skip the checksum.  It's a little more complicated since your receiver
checksums are then indexed differently than the sender checksums, but
it /would/ be cheaper.
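
A toy illustration of that ordering, with invented file names: the cheap
size comparison gates the expensive checksum, and the checksum only runs
when sizes agree.

```shell
# Two files of equal size but different content: the size test alone
# cannot distinguish them, so we must fall back to checksumming.
mkdir -p /tmp/cmp
printf 'aaa' > /tmp/cmp/sender_copy
printf 'abc' > /tmp/cmp/receiver_copy

# stat -c%s is GNU coreutils; BSD stat spells this -f%z instead.
if [ "$(stat -c%s /tmp/cmp/sender_copy)" != "$(stat -c%s /tmp/cmp/receiver_copy)" ]; then
    echo "sizes differ: skip the checksum entirely"
elif [ "$(md5sum < /tmp/cmp/sender_copy)" = "$(md5sum < /tmp/cmp/receiver_copy)" ]; then
    echo "checksums match: treat as the same file"
else
    echo "same size, different content"
fi
```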

-chris

> 
> Cheers -e
> -- 
> Eberhard Moenkeberg ([EMAIL PROTECTED], [EMAIL PROTECTED])


Re: "intelligent" rsync scripts?

2005-10-26 Thread Chris Shoemaker
On Wed, Oct 26, 2005 at 08:12:30PM +0200, Eberhard Moenkeberg wrote:
> The first pass of "rename-without-modification" could even be much easier:
> size and timestamp should match.

Eeek.  That's a bit too risky for my tastes.  
I'd be comfortable with "Size && timestamp && (checksum || filename)"  
but not just "Size && timestamp".

-chris


Re: "intelligent" rsync scripts?

2005-10-26 Thread Chris Shoemaker
On Wed, Oct 26, 2005 at 03:02:51PM +0200, Tomasz Chmielewski wrote:
> I use rsync for backing up user data, profiles, important network shares 
> etc. (from several locations over WAN).
> 
> Overall it works flawlessly, as it transfers only changes, but sometimes 
> there are some serious hiccups.
> 
> Suppose this scenario, suppose it's 1 GB of files:
> 
> user shares:
> 
> /home/joe/data/file1
>   /file2
>   /...
>   /file1000
> 
> Now the user _moves_ that data to some other folder:
> 
> /home/joe/WAN_goes_crazy/file1
>   /file2
>   /...
>   /file1000
> 
> ...and we start a backup process.
> 
> rsync will first transfer data from "/home/joe/WAN_goes_crazy/file...", 
> and then delete "/home/joe/data/data...".
> 
> Basically, this is how rsync works, but in the end, we transfer 1 GB of 
> files over WAN that we already have locally - the only thing that 
> changed was the folder where that data is.
> 
> Is there some workaround for this (some intelligent script etc.)?

ISTM it would be quite useful to make rsync "rename-aware".  Caveat: I
haven't hacked on rsync for quite a while, so my understanding may be
wrong or outdated.  But, I think this could be implemented thusly:

You'd want to make this optional, say --detect-renames, because it
does incur an extra processing cost.  That option should imply at
least --checksum, and --delete-after if --delete is used at all.  Then you
just need the generator to be slightly more clever.  For each file on
the sender which is *missing* from the receiver, it needs to search
the checksums of all of receiver's existing files for a checksum
match.  If it finds a match, it can simply use that matched file and
either copy or move it to the new filename.  Then that file just gets
skipped.
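
The missing-file search described above, reduced to a toy one-file case
with coreutils (names and paths invented; real rsync would do this inside
the generator, not via md5sum in a shell):

```shell
# The receiver holds the data under an old name; the sender's file list
# names it something new, so it looks "missing" on the receiver.
mkdir -p /tmp/ren/recv
echo "unchanged payload" > /tmp/ren/recv/old-name
missing="new-name"
want=$(echo "unchanged payload" | md5sum | cut -d' ' -f1)  # sender's checksum

# Search every receiver file for a matching checksum -- this linear scan,
# repeated per missing file, is where the O(N*M) cost comes from.
match=$(md5sum /tmp/ren/recv/* | awk -v w="$want" '$1 == w {print $2; exit}')

# On a hit, a local rename (or copy) stands in for a full re-transfer.
[ -n "$match" ] && mv "$match" "/tmp/ren/recv/$missing"
```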

I don't think this would require any changes to sender, receiver or
protocol.  What I described would only handle
rename-without-modification, but its cost is not very high.  I think
it's O(N*M), where N = # of files on the sender that are missing on the
receiver and M = # of files on the receiver.  That's the cost over and
above whatever --checksum costs.

I don't see how rename-with-modification could be handled efficiently,
though.  Better not to go there.

If nobody says I'm way off base here, I might be inspired to try to
implement this.  Unless someone else has the time and inclination...

-chris


broken link on rsync website

2005-10-26 Thread Chris Shoemaker
Wayne,

The link to the 2.6.5 Release NEWS is broken.

-chris


Re: rsync failed: Too many links

2005-10-01 Thread Chris Shoemaker
On Sat, Oct 01, 2005 at 06:38:44AM -0400, Kent Miller wrote:
> Dear Sir or Madam,
>   Has anyone seen a error message like the following?
> 
> rsync: recv_generator: mkdir "/home/kmiller/briefcase/1205275" failed: Too
> many links (31)
> rsync: stat "/home/kmiller/briefcase/1205275" failed: No such file or
> directory (2)
> 
>   As far as I can tell, I am not using any symlinks or hardlinks.  
> Please find below a reasonably complete bug report.  Please let me know
> what I should do.

One more thing: to exonerate rsync, try:

$ mkdir ~/briefcase/1205275 

on host B.

I suspect that you'll see the same failure: EMLINK ("Too many links")
from mkdir usually means the parent directory has hit the filesystem's
limit on subdirectories (each subdirectory's ".." entry is a hard link
to its parent), so rsync isn't at fault.

-chris



Re: [Bug 3099] Please parallelize filesystem scan

2005-09-16 Thread Chris Shoemaker
On Thu, Sep 15, 2005 at 09:32:44PM -0400, Chris Shoemaker wrote:
> On Thu, Sep 15, 2005 at 04:23:24PM -0700, [EMAIL PROTECTED] wrote:
> > https://bugzilla.samba.org/show_bug.cgi?id=3099
> > 
> > 
> > 
> > 
> > 
> > --- Additional Comments From [EMAIL PROTECTED]  2005-09-15 16:23 ---
> > Created an attachment (id=1448)
> >  --> (https://bugzilla.samba.org/attachment.cgi?id=1448&action=view)
> > One possible way to reorder the checksum computation.
> > 
> > > how could it possibly require a change to the rsync protocol for the
> > > second host in the sequence to pre-scan its filesystem, so that that
> > > data is available when needed?
> > 
> > The only way to know what to scan is to look at the file list from the 
> > sender
> > (since the receiver usually doesn't know anything other than the destination
> > directory, and options such as -R, --exclude, and --files-from can radically
> > limit what files need to be scanned).
> > 
> > I suppose it would be possible for the receiver to compute the full-file
> > checksums as the file list is arriving from the sender (yes, the sender 
> > sends
> > the list incrementally as it is created), but the code currently doesn't 
> > know
> > if the destination spec is a file or a directory until after it receives the
> > file list, so the code would need to be made to attempt a chdir to the
> > destination arg and to skip the pre-caching if that doesn't work.
> > 
> > One bad thing about this solution is that we really should be making the
> > sending side not pre-compute the checksums before the start of the transfer
> > phase (to be like the generator, which computes the checksums while looking 
> > for
> > files to transfer). Computing them during the transfer makes it more likely
> > that the file's data in the disk cache will be able to be re-used when a 
> > file
> > needs to be updated. Thus, changing the receiving side to pre-compute the
> > checksums before starting the transfer seems to be going in the wrong 
> > direction
> > (though it might speed up a large transfer where few files were different, 
> > it
> > might also slow down a large transfer where many files were changed).
> 
> IMHO, in general, optimizing for the "few-changes" (small delta) case
> is the right thing to do.  Rsync's utility diminishes anyway as delta
> increases, so there's no reason not to make efficiency increase with
> increasing delta.

err... I meant: make efficiency increase as delta *decreases*.
i.e. optimize for small-changes case.

> 
> -chris
> 
> > 
> > The attached patch implements a simple pre-scan that works with basic 
> > options.
> > It could be improved to handle things like --compare-dest better, but I 
> > think
> > it basically works.  If you'd care to run some speed tests, maybe you could
> > persuade me that this kluge would be worth looking at further (I'm not
> > considering it at the moment).
> > 


Re: [Bug 3099] Please parallelize filesystem scan

2005-09-15 Thread Chris Shoemaker
On Thu, Sep 15, 2005 at 04:23:24PM -0700, [EMAIL PROTECTED] wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=3099
> 
> 
> 
> 
> 
> --- Additional Comments From [EMAIL PROTECTED]  2005-09-15 16:23 ---
> Created an attachment (id=1448)
>  --> (https://bugzilla.samba.org/attachment.cgi?id=1448&action=view)
> One possible way to reorder the checksum computation.
> 
> > how could it possibly require a change to the rsync protocol for the
> > second host in the sequence to pre-scan its filesystem, so that that
> > data is available when needed?
> 
> The only way to know what to scan is to look at the file list from the sender
> (since the receiver usually doesn't know anything other than the destination
> directory, and options such as -R, --exclude, and --files-from can radically
> limit what files need to be scanned).
> 
> I suppose it would be possible for the receiver to compute the full-file
> checksums as the file list is arriving from the sender (yes, the sender sends
> the list incrementally as it is created), but the code currently doesn't know
> if the destination spec is a file or a directory until after it receives the
> file list, so the code would need to be made to attempt a chdir to the
> destination arg and to skip the pre-caching if that doesn't work.
> 
> One bad thing about this solution is that we really should be making the
> sending side not pre-compute the checksums before the start of the transfer
> phase (to be like the generator, which computes the checksums while looking 
> for
> files to transfer). Computing them during the transfer makes it more likely
> that the file's data in the disk cache will be able to be re-used when a file
> needs to be updated. Thus, changing the receiving side to pre-compute the
> checksums before starting the transfer seems to be going in the wrong 
> direction
> (though it might speed up a large transfer where few files were different, it
> might also slow down a large transfer where many files were changed).

IMHO, in general, optimizing for the "few-changes" (small delta) case
is the right thing to do.  Rsync's utility diminishes anyway as delta
increases, so there's no reason not to make efficiency increase with
increasing delta.

-chris

> 
> The attached patch implements a simple pre-scan that works with basic options.
> It could be improved to handle things like --compare-dest better, but I think
> it basically works.  If you'd care to run some speed tests, maybe you could
> persuade me that this kluge would be worth looking at further (I'm not
> considering it at the moment).
> 
> -- 
> Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
> --- You are receiving this mail because: ---
> You are the QA contact for the bug, or are watching the QA contact.
> -- 
> To unsubscribe or change options: 
> https://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: --backup leaves window where file doesn't exist.

2005-09-15 Thread Chris Shoemaker
On Thu, Sep 15, 2005 at 08:45:00AM -0400, Dave Mielke wrote:
> When using --backup, the sequence (as monitored by strace) is:
> 
>rename("/path/to/", "/")
>rename("/path/to/..xx", "/path/to/")
> 
> This, of course, leaves a momentary window wherein  can't be found. 
> Might 
> it be possible to replace the first rename() with link() instead? This, of
> course, could only be done when the backup directory is on the same volume.

Good catch.  But what about when backup-dir is not on same filesystem
as original?  link will give EXDEV, I think.
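The caveat can be sketched in shell: a hard link keeps the original name visible the whole time, but only works within one filesystem; across devices `ln` fails (the shell-level analogue of link(2) returning EXDEV) and a copy is the fallback. This is an illustrative sketch with made-up paths, not rsync's actual code:

```shell
#!/bin/sh
# Back up a file via hard link when possible, falling back to a copy.
set -e
dir=/tmp/exdev-demo
mkdir -p "$dir/backup"
printf 'hello\n' > "$dir/data.txt"

if ln "$dir/data.txt" "$dir/backup/data.txt" 2>/dev/null; then
    # Same filesystem: the original name never disappears.
    echo linked
else
    # Different filesystem: ln fails (EXDEV), so copy instead.
    cp -p "$dir/data.txt" "$dir/backup/data.txt"
    echo copied
fi
```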

-chris

> 
> -- 
> Dave Mielke   | 2213 Fox Crescent | I believe that the Bible is the
> Phone: 1-613-726-0014 | Ottawa, Ontario   | Word of God. Please contact me
> EMail: [EMAIL PROTECTED] | Canada  K2A 1H7   | if you're concerned about Hell.
> http://FamilyRadio.com/   | http://Mielke.cc/bible/


Re: Open Database RSYNC

2005-09-08 Thread Chris Shoemaker
On Thu, Sep 08, 2005 at 02:20:37PM -0400, Poe, David wrote:
> We have nearly 200 GB of data in a production Oracle database broken up
> into about 100 files of 2 GB.  The database incurs a 5% change per week
> in the form of new data, no modification nor deletions.  I need to copy
> this data from one mount point to another then bring up the new database
> on the new mount point in place of the original.  The high availability
> and production nature of this system means that my maintenance windows
> are few and far between.  To minimize my use of the maint window, I
> would like to pre-copy as much data as I can with RSYNC before the

For the first copy, it's probably more efficient to just use 'cp', but
rsync would work, too.


> window, then do a final sync of the data during the window with the
> database down.  My question is, will RSYNC be a good option given that
> the Oracle database is up and running for the initial sync?  

"good" compared to what?  to copying the database files while the
database is down?  That depends on how much the files changed between
the time you copy them and the time you rsync them.  Best-case is far
better than cp.  Worst-case is not much worse than cp.
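A hedged sketch of that workflow (the mount points are invented, not from the original mail):

```shell
# 1. Bulk pre-copy while the database is still up -- cp is fine for the
#    first pass since there is nothing on the destination to compare against:
cp -a /u01/oradata/. /u02/oradata/

# 2. Optionally re-run rsync shortly before the window to shrink the delta:
rsync -a /u01/oradata/ /u02/oradata/

# 3. In the maintenance window, with the database shut down, a final pass
#    transfers only the portions changed since the pre-copy:
rsync -a --delete /u01/oradata/ /u02/oradata/
```

The files from step 1 will have been modified while the database was up, but that only costs step 3 some extra delta, not a full re-copy.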

-chris


> I'm hoping
> that RSYNC will be an option for us, as other mirroring strategies I
> have seen rely upon same size disk devices/partitions, which we do not
> want because the goal is to put up a single device larger area to keep
> this data.  Please ignore the gaping holes for ideas such as just having
> Oracle use another mount point in addition to the one it is already
> using or online VG expansions.
>  
> David Poe



Re: rsync transmits unchanged data

2005-07-27 Thread Chris Shoemaker
On Wed, Jul 27, 2005 at 04:26:39PM +0200, Martin Kammerlander wrote:
> Zitat von Chris Shoemaker <[EMAIL PROTECTED]>:
> 
> > On Wed, Jul 27, 2005 at 10:36:05AM +0200, Martin Kammerlander wrote:
> > > hi all!
> > >
> > > I tried to synchronise a directory with the following command:
> > >
> > > rsync -avz --exclude "db/" /source/ /destination/folder/ --delete
> > >
> > > The source has 3 subfolders; one of them is not necessary to
> > > synchronize, so I excluded this folder.
> > > everything works fine and fast :) excludes were made...and all seems ok
> > > The data on the source folder changes every day once.
> > >
> > > But there is still one "problem": i tried to execute this command several
> > times.
> > > Sometimes rsync gives me the following output:
> > >
> > >
> > > building file list ... done
> > >
> > > sent 2801 bytes  received 20 bytes  5642.00 bytes/sec
> > > total size is 28151695  speedup is 9979.33
> > >
> > > So everything is up to date no changes were made...this is how it should
> > be!
> > >
> > >
> > > But usually rsync copies the same files of a certain folder to the
> > > destination again...although REALLY no changes were made to the files!?
> > >
> > > rsync does not copy all files of this certain folder, but only a
> > > few of them...
> > >
> > > Does somebody know what's going wrong here...or, what is more
> > > likely, what I'm doing wrong in my command?
> > >
> > > thanks for any suggestion
> >
> > Why don't you include one or two more '-v' and then mail us the output
> > for when rsync transfers files even though you think it shouldn't.
> >
> > -chris
> >
> 
> The mail becomes too long when I post the whole output...
> 
> The command was:
> rsync -avz --exclude "db/" /source/ /destination/folder/ --delete
> 
> I suppose I found out now what the problem is:
> 
> the folder "/source/" is an nfs mounted partition over the network...this
> seems to cause trouble. When I copy (with rsync) the same files from my
> local hard drive instead of the nfs mounted one, then everything works fine!!

I don't know why it would matter, unless 1) the files really are
changing or 2) the nfs server is inconsistent about reporting
modification times.

> 
> Is there another possibility to use rsync without mounting the nfs partition?
> How can I synchronize files from nfs??

If the nfs server runs sshd, you could always rsync from the remote
machine over ssh.  (Hint: use a single colon.)  Or, if you admin the
nfs server you could run an rsync daemon on the server.  (double colon)
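For example (host and module names below are placeholders, and the options are carried over from the original command):

```shell
# Single colon: pull over a remote shell (ssh) from the machine that
# exports the nfs share -- no nfs mount involved:
rsync -avz --exclude "db/" --delete user@nfshost:/source/ /destination/folder/

# Double colon: pull from an rsync daemon running on that machine
# (requires an rsyncd.conf module, here called "source"):
rsync -avz --exclude "db/" --delete nfshost::source/ /destination/folder/
```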

-chris


Re: rsync transmits unchanged data

2005-07-27 Thread Chris Shoemaker
On Wed, Jul 27, 2005 at 10:36:05AM +0200, Martin Kammerlander wrote:
> hi all!
> 
> I tried to synchronise a directory with the following command:
> 
> rsync -avz --exclude "db/" /source/ /destination/folder/ --delete
> 
> The source has 3 subfolders; one of them is not necessary to synchronize,
> so I excluded this folder.
> everything works fine and fast :) excludes were made...and all seems ok
> The data on the source folder changes every day once.
> 
> But there is still one "problem": i tried to execute this command several 
> times.
> Sometimes rsync gives me the following output:
> 
> 
> building file list ... done
> 
> sent 2801 bytes  received 20 bytes  5642.00 bytes/sec
> total size is 28151695  speedup is 9979.33
> 
> So everything is up to date no changes were made...this is how it should be!
> 
> 
> But usually rsync copies the same files of a certain folder to the
> destination again...although REALLY no changes were made to the files!?
> 
> rsync does not copy all files of this certain folder, but only a few of them...
> 
> Does somebody know what's going wrong here...or, what is more likely,
> what I'm doing wrong in my command?
> 
> thanks for any suggestion

Why don't you include one or two more '-v' and then mail us the output
for when rsync transfers files even though you think it shouldn't.

-chris


Re: [patch] paranoid checksum checking

2005-07-26 Thread Chris Shoemaker
On Tue, Jul 26, 2005 at 06:19:08PM +0100, Nick Burrett wrote:
> The attached patch provides an additional check for the checksumming 
> mode to ensure that a file that is actually written out to disk can be 
> read back and has the same MD4 sum as the file at the originating 
> location.

I'm not so sure there's a strong correlation between the group of
people who want the normal '--checksum' behavior and the people who
want forced read-back verification.  I can think of many cases where
the user would want each without the other.

I propose that this behavior shouldn't be grouped with "--checksum".
Instead, perhaps make it a '--verify' or '--re-verify' option.  Also,
this patch needs a corresponding change to rsync.yo.

-chris

> 
> Regards,
> 
> 
> Nick.

> *** rsync-2.6.6pre1/receiver.c2005-04-14 02:42:13.0 +0100
> --- rsync-new/receiver.c  2005-07-26 18:06:56.0 +0100
> *** extern int module_id;
> *** 46,51 
> --- 46,52 
>   extern int ignore_errors;
>   extern int orig_umask;
>   extern int keep_partial;
> + extern int always_checksum;
>   extern int checksum_seed;
>   extern int inplace;
>   extern int delay_updates;
> *** int recv_files(int f_in, struct file_lis
> *** 649,654 
> --- 650,669 
>   exit_cleanup(RERR_FILEIO);
>   }
>   
> + /* Check that the file written to local disk has the same
> +checksum as the file in the originating location.  This
> +is a further paranoia check, just to make sure that
> +we really have successfully transferred the file.  */
> + if (recv_ok && ! am_server && always_checksum) {
> + char csum[MD4_SUM_LENGTH + 1];
> + file_checksum (fnametmp, csum, file->length);
> + if (memcmp(csum, file->u.sum, MD4_SUM_LENGTH) != 0) {
> + rprintf (FERROR, "%s checksum does not match remote checksum\n",
> +  full_fname (fnametmp));
> + recv_ok = 0;
> + }
> + }
> + 
>   if ((recv_ok && (!delay_updates || !partialptr)) || inplace) {
>   finish_transfer(fname, fnametmp, file, recv_ok, 1);
>   if (partialptr != fname && fnamecmp == partialptr) {



Thanks for an excellent rsync!

2005-04-11 Thread Chris Shoemaker

I had occasion this morning to apply rsync to another task.  As usual,
the docs were informative, the functionality I needed was easily
supported, and rsync worked like a charm.

Of course, all this is the result of a lot of hard work by the rsync
developers, especially Wayne.  I decided it would be good to publicly
acknowledge this, lest we all take it for granted.

To summarize, from rsync we get:

  - an indispensable tool

  - thorough documentation

  - extremely fast responses to bug reports

  - a rich feature set

  - continual improvement with new features and performance enhancements

  - a commitment to backward-compatibility

  - a commitment to security

  - excellent tech support, even for "dumb" questions

  - a professional release cycle
  
  - maintenance of auxiliary patches
  
  - and much more,  (feel free to mention what I've left off)


And, unbelievably, most of us get all this FOR FREE!  For these
reasons and more, rsync remains outstanding in the wide field of free
and open source software.  A heartfelt thanks goes to Wayne, Tridge,
and all of the rsync developers.

-chris


Re: Rsync signatures and incremental tape backup

2005-02-24 Thread Chris Shoemaker
On Thu, Feb 24, 2005 at 04:01:52PM -0800, Richard Patterson wrote:
> Hello,
> 
> Andrew Tridgell's original rsync paper (actually
> thesis) discussed the idea of using rsync to make
> incremental tape backups based not on whole files but
> rather parts of files. Sadly, this functionality is
> not actually present in the rsync program. I'd like to
> explore adding this ability.
> 
> There are basically three things that need to be done
> to enable efficient (partial-file) incremental tape
> backups, namely:
> 1) Generation of rsync signatures from existing files.
> 2) Generation of a binary patch from existing files +
> an rsync signature.
> 3) Application of a binary patch to existing files.
> 
> Rsync already has the facility to generate and apply
> binary patches -- namely, batch-mode operation. Thus
> all that remains to be added is read and write support
> for signatures.
> 
> It seems to me the best way is through two new
> options: --read-signature and --write-signature.
> --read-signature would operate similarly to
> --compare-dest. --write-signature would generate
> signature files from the destination.
> 
> Finally, one would need to be able to run rsync
> without an actual destination parameter when used with
> --write-batch. Thus we would have the following
> combinations:
> For a full backup:
> --write-batch --write-signature
> For a differential backup:
> --write-batch --read-signature --write-signature
> For a leaf incremental backup:
> --write-batch --read-signature
> For a restore, one would specify --read-batch with no
> source.
> 
> I'm willing to implement this, but wanted to get some
> feedback first. Comments?

Perhaps this would also be a useful optimization for server mode.
During a period when the server's repository is not changing, it could
run from the signature file, which would effectively be a cache of the
checksums (as long as the client didn't request a different checksum seed).


Also, I think this might not be too hard to implement.  Presumably,
the signature file contains exactly what a "write-batch file" to an
empty destination directory would contain, minus that actual file
blocks.  Is that what you're thinking?

-chris


Re: Query re: rolling checksum algorithm of rsync

2005-02-11 Thread Chris Shoemaker
On Fri, Feb 11, 2005 at 11:08:45AM +, Alun wrote:
> Chris Shoemaker ([EMAIL PROTECTED]) said, in message
> <[EMAIL PROTECTED]>:
> > 
> > > If the log file is e.g. 2Gbytes long and has only had 100Kbytes appended
> > > since the last rsync, then using --whole-file means 2GBytes of network
> > > traffic and 2GBytes of disk I/O at either end. Using the checksum means
> > > 2Gbytes of disk I/O at either end and 100Kbytes of network traffic (plus 
> > > the
> > > checksum data). Neither is ideal.
> > 
> > use logrotate.
> 
> I'm aware of things like logrotate, but if I have to rotate the logs every
> hour on each of my webcache servers so that rsync will perform well, then I
> can't really afford to do it. I'd end up keeping 144 logfiles per day on
> the logging server just to make rsync efficient.

144 per day?  Oh dear!  :) I think your filesystem could handle it.
BTW, using multiple files has the additional benefit of time-stamping
periods of log -- like bookmarks in time.

> 
> Similarly, remote syslog wouldn't tackle it since not all the services for
> which we need to collate logs even use syslog.

Log onto a networked filesystem then?

> 
> At the moment, I have a script which runs every 10 minutes and just copies
> over the tail of the logfile, using the current size on the logging server
> as its start point. This works OK, but it's yet another custom service to
> maintain. 

Doesn't sound too bad, but I can see why you'd want to simplify it.
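That byte-offset approach can be sketched locally like this (file names are invented; in the real setup the tail would be fetched from the remote server, e.g. over ssh, using the size recorded on the logging server):

```shell
#!/bin/sh
# Append only the bytes of the source log past the size already collected.
set -e
dir=/tmp/taildemo
mkdir -p "$dir"
printf 'line1\nline2\n' > "$dir/access.log"     # the growing log
printf 'line1\nline2\n' > "$dir/collected.log"  # what we copied last time

printf 'line3\n' >> "$dir/access.log"           # new data arrives

# tail -c +N is 1-based, so start one byte past the collected size.
offset=$(wc -c < "$dir/collected.log")
tail -c +"$((offset + 1))" "$dir/access.log" >> "$dir/collected.log"
```

After the run the two files are identical, and only the appended bytes were read from the offset onward.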

> 
> We already use rsync widely for other purposes on these servers and a patch
> like I mentioned would allow us to use it for this extra job too. 

Even though it pretends to be, rsync is not really a swiss army knife.
It's fundamentally suited for synchronization of two files.  But your
case only has one potential source of data.  No independent changes
are allowed at the destination so no synchronization is needed.
You're just copying data in one direction.  For that problem, the
simplest solution will never include rsync.

> 
> I know it's forcing rsync to do something that doesn't make sense in the
> general case, but in the specific case of files which are almost always
> appended, it could be a gain. 
> 
> > Probably not.  I suspect even what you describe wouldn't give you what
> > you want.  How would you reliably choose n?  
> 
> For my application, I could use:
> 
> n = max("current size of file on logging server minus 1Mbyte", 0)

IOW, files on logging server aren't changing.  See above.

-chris


Re: Prune deleted files with rsync?

2005-02-10 Thread Chris Shoemaker
On Thu, Feb 10, 2005 at 01:23:05PM -0800, Mark Winslow wrote:
> Hi, is there a way to have rsync prune files that
> exist on the destination path that no longer exist on
> the source path?  

Yes, there is.  Please see the excellent documentation that comes with
rsync.

> 
> Thanks.
> 
You're welcome. :)

-chris


Re: Query re: rolling checksum algorithm of rsync

2005-02-10 Thread Chris Shoemaker
On Thu, Feb 10, 2005 at 11:36:51AM +, Alun wrote:
> 
> I think this is a related question (if not identical) to one I asked some
> time back. If you're synchronising log files, for example, then you may be
> able to guarantee that all changes to the file happen at the end of it.
> Unfortunately, rsync doesn't give you the opportunity to use this extra
> information to save I/O and bandwidth.
> 
> If the log file is e.g. 2Gbytes long and has only had 100Kbytes appended
> since the last rsync, then using --whole-file means 2GBytes of network
> traffic and 2GBytes of disk I/O at either end. Using the checksum means
> 2Gbytes of disk I/O at either end and 100Kbytes of network traffic (plus the
> checksum data). Neither is ideal.

use logrotate.

> 
> I suspect it wouldn't fit inside the rsync protocol, but I'd like to see
> something that says "start working backwards from the end of the file until
> you find n matching blocks, then transfer from that point onwards". It would
> let me get rid of some horrible hacky code here!
> 
> Would it be useful to be able to tell rsync "assume the first n Kbytes of
> the files at either end are identical and not useful for checksum purposes"?

Probably not.  I suspect even what you describe wouldn't give you what
you want.  How would you reliably choose n?  

The fundamental problem here is that you're trying to treat different
parts of a file as if they had different modification dates.  That
won't work.  Break the file into pieces, and rsync will work.
Otherwise, you have your own custom database, so you need your own
synchronization methods.  You can't expect rsync to work (well) in
that case.

-chris  

> 
> Cheers,
> Alun.
> 
> -- 
> Alun Jones   [EMAIL PROTECTED]
> Systems Support, (01970) 62 2494
> Information Services,
> University of Wales, Aberystwyth
> 
> 


Re: rsync huge tar files

2005-02-08 Thread Chris Shoemaker
On Tue, Feb 08, 2005 at 11:35:51AM -0700, Tim Conway wrote:
> If it is, as you say, uncompressed, rsync will work on it as-is, finding 
> and sending the changes.

That was exactly my first thought, but I think he was meaning to say
that the file's contents were 2GB when uncompressed, not that the file
itself was not compressed.  At least, that's the only way his email
makes sense to me.
-chris

> 
> Tim Conway
> Unix System Administration
> Contractor - IBM Global Services - ODCS
> desk:3039240938
> [EMAIL PROTECTED]
> 
> 
> 
> 
> Harald Dunkel <[EMAIL PROTECTED]> 
> Sent by: [EMAIL PROTECTED]
> 02/04/2005 02:37 AM
> 
> To
> rsync@lists.samba.org
> cc
> 
> Subject
> rsync huge tar files
> 
> 
> 
> 
> 
> 
> Hi folks,
> 
> Are there any tricks known to let rsync operate on huge tar
> files?
> 
> I've got a local tar file (e.g. 2GByte uncompressed) that is
> rebuilt each night (with just some tiny changes, of course),
> and I would like to update the remote copies of this file
> without extracting the tar files into temporary directories.
> 
> Any ideas?
> 
> 
> Regards
> 
> Harri


Re: feedback on rsync-HEAD-20050125-1221GMT

2005-01-31 Thread Chris Shoemaker
On Mon, Jan 31, 2005 at 11:04:32AM -0500, Alberto Accomazzi wrote:
> 
> I agree that exclude/include patterns can be tricky, and you have a good 
> point about familiarity versus complexity.  I think what makes them hard 
> to handle is the fact that we are dealing with filename (and directory 
> name) matching and recursion.  So matching only a subset of a file tree, 
> while simple as a concept, is non-trivial once you sit down and realize 
> that you need a well-defined syntax for it.  Can you write a find 
> expression that is simpler or more familiar to the average user than an 
> rsync's include/exclude?

Simpler?  That depends on how complex the rules are.  For the
simplest case, `--exclude '*.bak'` is just as simple (if not simpler)
than `find . -name '*.bak'`.  But for the complex cases, I think
find's syntax is well-designed for simply and powerfully expressing
file-selection.  But, isn't this all the subjective complexity I
wasn't going to argue?

More familiar?  I think so.  If the user has never seen any
syntax/tool for specifying file-selection, then they have to learn
_something_.  If they have, then they should be able to use whatever
they know.  Note: I'm not saying that we should force the use of
'find'.  The user knows what tool they want to use -- they just want
to be able to use it.

I just thought of an example.  I have several machines with
FC3 installed.  Today, I screwed one up by installing some buggy
software that ruined my printer setup.  It overwrote some files it
shouldn't have and deleted some files it shouldn't have.  I ended up
fixing it by downloading rpms and force reinstalling them.  But I had
those rpms installed on the other machine, with different config
files.

I could have gone:

rpm --query --list mypkg1 mypkg2 mypkg3 | rsync -a --files-from=- / remote:/

but I know that mypkg1 has config files unique to each machine, that I
don't want to copy.  What are those files?  I have no idea, but rpm
knows.  It would be nice to say:

rpm --query --list mypkg1 mypkg2 mypkg3 | rsync -a --files-from=- \
    --filter='-q rpm -Vv mypkg1 |grep " c "' / remote:/

where -q means "exclude the filelist returned by the following command"

I know this could be accomplished in other ways, but I just wanted to
illustrate the point that "external command" doesn't always mean
"find".  User knows best.
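One way to get much of this effect with present-day rsync and bash (package names are made up, and the exact anchoring of the exclude patterns may need tuning): rpm's real --configfiles query lists a package's config files, so process substitution can stand in for the hypothetical --filter='-q ...' option:

```shell
# Send the packages' files but exclude mypkg1's per-machine config files.
rsync -a \
    --files-from=<(rpm --query --list mypkg1 mypkg2 mypkg3) \
    --exclude-from=<(rpm --query --configfiles mypkg1) \
    / remote:/
```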


> 
> >(The allusion to GNU
> >tar's --exclude option which takes only a filename, not a pattern,
> >isn't really helpful in understanding rsyncs --exclude option.)
> 
> Uh?  Tar does take patters for exclusion, and has its own quirky way of 
> dealing with wildcards, directory matching and filename anchoring:
> http://www.gnu.org/software/tar/manual/html_node/tar_100.html

I stand corrected.  I didn't know about that.  It seems it's not in
the man page.  Seeing this complexity in tar makes me wonder whether it
should be there at all.  But it raises an interesting question: are
tar's and rsync's --exclude options solving the same task?  If so why
do they need to differ?  If not, why are they similar?

> 
> >It's not that pattern matching for file selection isn't complex --
> >it's just that it's such a well-defined, conceptually simple, common
> >task that other tools (like 'find' and 'bash') handle better than
> >rsync ever will.  And that's the way it should be: it's the unix way.
> 
> I agree that this is something we should be striving for as much as 
> possible: pipeline and offload tasks rather than bloating applications.
> 
> >>If you really need 
> >>complete freedom maybe the way to go is to do your file selection first 
> >>and use --files-from.  
> >
> >
> >Yes, --files-from is nice, and honestly, almost completely sufficient.
> >But in some dynamic cases, you can't keep the list updated.
> 
> Well, maybe we should go back and see if the solution to all problems 
> isn't making --files-from sufficient.  What exactly is missing from it 
> right now?

Good question.

>  The capability to delete files which are not in the 
> files-from list?

There's --include-from and --exclude-from, too.  So, included files
that aren't matched could be deleted, right?

>  Or the remote execution of a command that can generate 
> the files-from list for an rsync server?  

The use of stdin works once at the top level, e.g.
cat myfilelist | rsync --files-from=- a/ b/, but not on a per-directory
basis, with the rules changing for each directory.  That's why it might
be nice to have the parsed "rule file" allow the specification of an
external command.


> Maybe we ought to really 
> figure out what things cannot be achieved with the current functionality 
> before coming up with something new.

Is there anything?  I'd like to see an example.

-chris


Re: feedback on rsync-HEAD-20050125-1221GMT

2005-01-28 Thread Chris Shoemaker
On Fri, Jan 28, 2005 at 08:50:10PM -0500, Chris Shoemaker wrote:
> of the "right path", but I won't be convinced until Wayne starts
> *deleting* man page text, because rsync's pattern matching can be
> fully explained in, say, one or two paragraphs.

that should've read "pattern matching _interface_"

-chris


Re: feedback on rsync-HEAD-20050125-1221GMT

2005-01-28 Thread Chris Shoemaker
On Fri, Jan 28, 2005 at 03:42:25PM -0500, Alberto Accomazzi wrote:
> Chris Shoemaker wrote:
> 
> >If I understand Wayne's design, it would be possible to invent a
> >(per-directory) "hook" rule, whose value is executed, and whose stdout
> >is parsed as a [in|ex]clude file list.  E.g.:
> >
> > -R "cat .rsync-my-includes"
> >
> >or
> >
> > -R "find . -ctime 1 -a ! -fstype nfs -a ! -empty -o iname 'foo*'"
> 
> This is certainly a very powerful mechanism, but it definitely should 
> not be the only way we implement file filtering.  Two problems:
> 
> 1. Sprinkling rule files like these across directories would mean 
> executing external programs all the time for each file to be considered. 

No, only one execution per specified rule.  Most users of this feature
would specify one rule at the root directory.  But, if a user
wanted to change the rules for every directory, they would have to
specify a rule in each directory.  Then, yes, one execution per
directory.  Presumably they would do this because they actually need
to.  Never one execution per file.

>  This would presumably slow down rsync's execution by an order of 
> magnitude or so and suck the life out of a system doing a big backup job.

If you're referring to process spawning overhead, it's no big deal.
If you're referring to the actual work required to return the file
list, what makes you think that rsync can do it more efficiently than
'cat' or 'find', or whatever tool the user chose?

> 
> 2. Who does actually need such powerful but yet hard-to-handle 
> mechanism?  Most of rsync's users are not programmers, and even us few 
> who are apparently still get confused with rsync's include/exclude 
> logic, forget about even more complicated approaches.

Do you mean include/exclude mechanism or filtering mechanism?  Well,
IMO, parsing a file list is *less* complicated than rsync's custom
pattern specification and include/exclude chaining.  Actually, I think
rsync patterns are /crazy/ complicated and fully deserve the pages
upon pages of documentation, explanation and examples that they get in
the man page.

But, complexity is somewhat subjective, so I won't argue (much) about
it.  In practice, /familiarity/ is far more important than complexity
in a case like this.  Someone who looks at rsync for the first time
has a _zero_ chance of having seen something like rsync's patterns
before, because there is nothing else like them.  (The allusion to GNU
tar's --exclude option which takes only a filename, not a pattern,
isn't really helpful in understanding rsyncs --exclude option.)

OTOH, that same person has a (much) greater than zero chance of
already knowing how to use 'cat' or 'find' or whatever to specify a filelist.
That's good reason to prefer the latter, IMO, even if it is *more*
complex, (which is pretty hard to imagine.)

Don't get me wrong, I recognize that rsync's pattern rules _resemble_
some other things, like regexp or bash-like expansions, but parts are
unique to rsync and there's big difference between "I already know how
to use it" and "I have to spend 45 minutes figuring out what parts
resemble something I already know how to use."

> 
> >IMHO, rsync already has too much of its own "filtering" functionality,
> >and needs less, not more.  But maybe a hook like this that lets users
> >interface with their own filtering program is a step toward
> >deprecating rsync's [in|ex]clude[-from] options.
> >
> >Notice that a generic include and exclude hooks immediately obsoletes
> >the --*-from options and the --*=PATTERN options.  (rsync needs fewer
> >options, ya see? :)
> 
> I totally agree with you.  Having now read the description of the 
> --filter option in CVS's manpage (duh!)  I think what wayne is working on 
> is right on the money and will satisfy 95% of rsync's power users (most 
> of rsync's regular users needs are already met by the current 
> include/exclude rules).

Wayne's too nice.  He gets to actually _maintain_ all of this
complexity in rsync, and he does one helluva job.  If it were me, I
would mercilessly offload all pattern matching to some external
interface and deprecate all (or almost all) of rsync's pattern
matching support.  Since he maintains what he writes, by definition,
he really can't be going wrong.  That said, --filter *may* be my idea
of the "right path", but I won't be convinced until Wayne starts
*deleting* man page text, to the point where rsync's pattern matching
can be fully explained in, say, one or two paragraphs.

It's not that pattern matching for file selection isn't complex --
it's just that the complexity doesn't have to live inside rsync.

Re: feedback on rsync-HEAD-20050125-1221GMT

2005-01-28 Thread Chris Shoemaker
On Fri, Jan 28, 2005 at 11:25:06AM -0500, Alberto Accomazzi wrote:

> 
> Oooh, I see we are getting a little ambitious, aren't we? ;-)
> [suggestion to use 'find' syntax]

If I understand Wayne's design, it would be possible to invent a
(per-directory) "hook" rule, whose value is executed, and whose stdout
is parsed as a [in|ex]clude file list.  E.g.:

 -R "cat .rsync-my-includes"

or

 -R "find . -ctime 1 -a ! -fstype nfs -a ! -empty -o -iname 'foo*'"


IMHO, rsync already has too much of its own "filtering" functionality,
and needs less, not more.  But maybe a hook like this that lets users
interface with their own filtering program is a step toward
deprecating rsync's [in|ex]clude[-from] options.

Notice that a generic include and exclude hooks immediately obsoletes
the --*-from options and the --*=PATTERN options.  (rsync needs fewer
options, ya see? :)

> Wayne Davison wrote:
> 
> >It already supports per-directory name rules, both inherited and not.
> >The idea of having per-directory size and time limits would not be hard
> >to add, and may be quite worthwhile.  For instance, assume 's' is for
> >size and 't' is for the modified time:
> >
> ># Don't transfer files 1 GB or larger
> >s< 1g
> ># Don't transfer files 100 KB or smaller
> >s> 100k
> ># Only transfer new files (modified in the last day)
> >t> yesterday
> >
> >Something like that, perhaps.

We don't really want to reinvent 'find', do we?

> One more thing to point out: I got a core dump when starting a daemon 
> which tried to write to a log file that it had no permission to write 
> to.  The problem seems to be that the function log_open in log.c does 
> not check the return value of fopen.  I don't know whether the right 
> thing to do would be to exit with an error or continue but without 
> logging, but something ought to be changed.

Or... simply log to stderr.  After all, the user can keep the daemon's
stderr from being redirected to /dev/null.
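
The suggested fallback is easy to sketch (hypothetical helper, not
rsync's actual log.c): check fopen()'s return value and fall back to
stderr instead of continuing with a NULL handle:

```c
#include <stdio.h>

/* Hypothetical fallback for log_open(): if the configured log file
 * can't be opened, warn once and keep logging on stderr rather than
 * crashing later on a NULL FILE*. */
static FILE *open_log_or_stderr(const char *path)
{
    FILE *fp = fopen(path, "a");
    if (!fp) {
        fprintf(stderr,
                "warning: can't open log file %s; logging to stderr\n",
                path);
        return stderr;
    }
    return fp;
}
```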


-chris
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Using rsync to generate diff/patches

2005-01-26 Thread Chris Shoemaker
On Wed, Jan 26, 2005 at 10:57:21AM -0600, Dave Whitinger wrote:
> On Wed, Jan 26, 2005 11:50 AM, Chris Shoemaker wrote:
> 
> > Look at "batch mode."  Except it actually applies the changes, too.
> 
> Thank you, Chris!
> 
> That's excellent that most of the code is already written.  All
> that is needed now is something like "dry-run", that will tell it
> not to apply the changes, but only write out the batch file.
> 
> Something like this:
> 
> rsync --write-batch=foo --dry-run -a host:/source/dir/ /adest/dir/

You probably noticed that --write-batch can't currently be used with
--dry-run.  I've considered for a while now that this would be useful
functionality.

> 
> Doable?

It's doable, but not trivial.  It involves a more "complete" dry-run
implementation.  It's probably a nice project for someone getting into
hacking on rsync.  If I had more time for rsync hacking, this is
probably the functionality I'd most like to add.

-chris
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Using rsync to generate diff/patches

2005-01-26 Thread Chris Shoemaker
On Wed, Jan 26, 2005 at 07:41:36AM -0600, Dave Whitinger wrote:
> I've been using rsync to remotely backup my MySQL databases
> (totalling over 3 gigabytes) and it works nice and fast.
> 
> Now I'm trying to setup my system to save yesterday's backup
> before applying today's backup, but it's not really feasible to
> keep a 3gig backup for each day, and I'd rather just store diffs
> from the previous day.  This way, I can just go back in time by
> applying the diffs.
> 
> I'm wondering if rsync can be used for this; here's how I
> envision it working:
> 
> 1) The rsync process works just like normal, except instead of
> applying the changes to the destination files, it instead
> generates one big file that contains all the TODO changes.
> 
> 2) The administrator later can run rsync with a certain switch,
> and feed it this generated file through stdin, and those changes
> are applied.
> 
> Great idea, no?

Look at "batch mode."  Except it actually applies the changes, too.

-chris
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Shared remote repository

2004-11-27 Thread Chris Shoemaker
On Sat, Nov 27, 2004 at 01:32:58PM +0100, Tony Mobily wrote:
> Hello,
> 
> I have a bit of a problem - I have the feeling the solution is  
> ridiculously trivial, and yet I can't find it!
> 
> I have a remote repository of text files. Until today, I was the only  
> one changing those files: I would simply change them locally, and  
> update the remote repository with this command:
> 
> cd local_dir
> rsync --delete  -e ssh -Llavuz . [EMAIL PROTECTED]:remote_directory
> 
> Now, things are more complicated because I am not the only person  
> modifying those files stored in the remote server anymore.
> So, here is what I'd like to do:
> 
> * Make sure that the remote repository is always the "good", updated  
> copy
> * If I change one of my files locally, the remote repository is updated
> * If I change a remote file, my local file is updated
> * If both the remote file and the local file are changed, then the  
> latest one "wins"
> This would obviously apply to both me and Max (the other person who has  
> access to the files).
> 
> This should make it possible for me and Max to modify files on the  
> remote server OR on the local file system, knowing that the server will  
> always have the latest version of every file.
> 
> Now: is it possible to do this using rsync?

Possible?  Probably, but it doesn't sound like the right tool for the
task.  Have you considered CVS (since you're storing text files)?  Or
unison (since you want to handle updates to either end)?

-chris

> 
> Bye!
> 
> Merc.
> 
>  
> 
> Tony Mobily
> Author of "Hardening Apache" (Apress)
> "...this book can save you pain, humiliation, and hair loss" --  
> Mitchell Pirtle, PHP Magazine 05/2004 
> 
> -- 
> To unsubscribe or change options: 
> http://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: reducing memmoves

2004-08-02 Thread Chris Shoemaker
On Mon, Aug 02, 2004 at 10:54:19AM -0700, Wayne Davison wrote:
> 
> - Also removed the (offset > 2*CHUNK_SIZE) check in map_ptr().
>   (Did you leave this in for a reason?)
> 

only because I had no idea why it was there... :)
-chris
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: reducing memmoves

2004-08-02 Thread Chris Shoemaker
On Mon, Aug 02, 2004 at 10:54:19AM -0700, Wayne Davison wrote:
> On Sun, Aug 01, 2004 at 06:16:05PM -0400, Chris Shoemaker wrote:
> > Attached is a patch that makes window strides constant when files are
> > walked with a constant block size.  In these cases, it completely
> > avoids all memmoves.
> 
> Seems like a good start to me.  Here's a patch I created that also makes
> these changes:
> 
> - The map_file() function now takes the window-size directly
>   rather than the block-size.  This lets the the caller choose
>   the value.

Yes, this is good, especially for file_checksum, which can now use a
substantially different policy than the others.

> 
> - Figure out an appropriate window-size for the receiver,
>   sender, generator, and the file_checksum() function to send to
>   map_file().

The modulo checks are good.  Maybe there's some way they can be in one
place instead of three, though.

However, I can't immediately see the reason for the different min and
max window sizes (3x vs. 2x and 16k vs. MAX_MAP_SIZE).

> 
> - Also removed the (offset > 2*CHUNK_SIZE) check in map_ptr().
>   (Did you leave this in for a reason?)
> 
> - The sender now calls map_ptr() with a range of memory that
>   encompasses both the rolling-checksum data and the data at
>   last_match that we may need to reread.
> 
> - Defined MAX_BLOCK_SIZE as a separate value from MAX_MAP_SIZE.

Suggest renaming BLOCK_SIZE to MIN_BLOCK_SIZE and removing the report
of this as the "default block-size" in the usage statement.  Maybe add
a comment at the #defines saying that MIN_BLOCK_SIZE can be overridden
by --block-size, and that MAX_MAP_SIZE is a hint, since the actual map
size can sometimes be a bit larger.

> 
> - Increased the size of MAX_MAP_SIZE.

Makes sense.  Does it still make sense to limit the maximum allowed
--block-size?  After all, won't the modulo checking always give a map
that's big enough, and shouldn't it also avoid the pathological
memmoves when block sizes are large?

> 
> I think this should improve several things.  Comments?

I think improvements in this vein are theoretically sound, but I'm
struggling to measure any "real-world" performance increase.  I have
some pretty wimpy h/w, though.  On the flip side, I'm not aware of any
tests we have to prevent performance regressions.  Perhaps some
optional performance tests (since not all h/w could handle them) would
serve both purposes.
-chris

> 
> ..wayne..
>  
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


reducing memmoves

2004-08-01 Thread Chris Shoemaker
Attached is a patch that makes window strides constant when files are
walked with a constant block size.  In these cases, it completely
avoids all memmoves.

In my simple local test of rsyncing 57MB across 10 local files, the
memmoved bytes went from 18MB to zero.

I haven't tested this with a big variety of file cases.  I think this
will always reduce the memmoves involved in walking a large file, but
perhaps there's some case I'm not seeing.

Also, the memmove cost is obviously negligible compared to real disk
i/o, so you have pretty much no chance of measuring the difference
unless your files already start out in cache.

Also, with the new caps on window size, the worst-case memmoves are
quite a bit smaller than they used to be, so the benefit of avoiding
them is commensurately reduced.  Therefore, in order to measure the
difference in terms of actual time to completion, you'd need to be
walking through a lot of cached data.

I don't have enough RAM (I'm at 192MB) to really measure this difference
well.  If you do, feedback from testing is especially welcome.  [glances
in wally's direction]  :)

Overall, I think this should never hurt performance, but with large
datasets and much memory, it should improve performance.
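
The effect can be illustrated with a toy model of the windowed walk
(hypothetical helper, much simplified from the real map_ptr): when the
window size is an exact multiple of the block size and the walk
advances block by block, every refill starts exactly where the previous
one ended, so there is never a partial tail to memmove to the front of
the buffer.

```c
#include <stdio.h>

/* Toy model of the windowed file walk.  Each step consumes one block;
 * the buffer is refilled whenever the remaining bytes can't satisfy a
 * block, and any leftover tail counts as memmoved bytes.  If window is
 * an exact multiple of block, the leftover at refill time is always
 * zero, i.e. no memmove is ever needed. */
static long simulate_moved_bytes(long file_len, long block, long window)
{
    long moved = 0, buffered = 0, offset = 0;

    while (offset < file_len) {
        if (buffered < block) {
            moved += buffered;   /* tail bytes memmoved to the front */
            buffered = window;   /* refill the window */
        }
        buffered -= block;
        offset += block;
    }
    return moved;
}
```

For a 700-byte block, a 16-block window gives zero moved bytes, while
a window just 100 bytes larger moves 100 bytes per refill.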

-chris



Index: fileio.c
===
RCS file: /cvsroot/rsync/fileio.c,v
retrieving revision 1.15
diff -u -r1.15 fileio.c
--- fileio.c20 Jul 2004 21:35:52 -  1.15
+++ fileio.c2 Aug 2004 02:31:02 -
@@ -23,6 +23,7 @@
 #include "rsync.h"
 
 extern int sparse_files;
+int total_bytes_memmoved=0;
 
 static char last_byte;
 static int last_sparse;
@@ -182,8 +183,7 @@
 
/* nope, we are going to have to do a read. Work out our desired window */
if (offset > 2*CHUNK_SIZE) {
-   window_start = offset - 2*CHUNK_SIZE;
-   window_start &= ~((OFF_T)(CHUNK_SIZE-1)); /* assumes power of 2 */
+   window_start = offset;
} else {
window_start = 0;
}
@@ -212,6 +212,7 @@
read_offset = read_start - window_start;
read_size = window_size - read_offset;
memmove(map->p, map->p + (map->p_len - read_offset), read_offset);
+   total_bytes_memmoved += read_offset;
} else {
read_start = window_start;
read_size = window_size;
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: rsync "-I" vs. "-c"

2004-08-01 Thread Chris Shoemaker
On Sun, Aug 01, 2004 at 06:20:11PM -0700, Linda A. W. wrote:
> If I use the "-I" to ignore date and size as quick-check methods of 
> determining

Just modtime; rsync never ignores size differences.

> change, what method does it use to determine difference?  If it falls 
> back to
> checksumming the entire file, maybe the manpage might warn that this 
> would be
> as expensive as using the "-c" option...or not depending on what it uses for
> determining difference at that point.

Short answer:  It does fall back to checksum comparison.

Long answer:  I'm not sure, but I suspect that the reason this is not so
explicit in the man page is that it's a bit complicated in the code,
too.  I _think_ that using -I would be even more expensive than -c,
because it will take longer to eventually do the same checksum
comparison.

But, Wayne knows these options like the back of his hand.  Wayne?

-chris

> 
> So exactly how does rsync compare files for differences when date & size are
> used but checksumming is not?
> 
> Thanks!
> -Linda
> -- 
> To unsubscribe or change options: 
> http://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: HP-UX 11i and largefiles on rsync 2.6.2

2004-07-29 Thread Chris Shoemaker
On Wed, Jul 28, 2004 at 10:57:58AM -0700, Steve Bonds wrote:
> > HP-UX?
> 
> Alas, no.  The mkstemp man page suggests using tmpfile() instead, which
> generally means that HP won't fix any problems.
> 
> - mktemp(3C)
> Remarks:
>   These functions are provided solely for backward compatibility and
>   importability of applications, and are not recommended for new
>   applications where portability is important.  For portable
>   applications, use tmpfile() instead (see tmpfile(3S)).
> -

Maybe we should follow this advice.

> 
> The tempnam()/tmpfile() combination seems particularly difficult to use
> compared with mkstemp().  I especially liked this warning:
> 
> - tmpnam(3S)
> WARNINGS
>Between the time a file name is created and the file is opened, it is
>possible for some other process to create a file with the same name.
>This can never happen if that other process is using these functions
>or mktemp, and the file names are chosen such that duplication by
>other means is unlikely.
> -

Can we use tmpfile() without tmpnam()?
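
For what it's worth, tmpfile() needs no tmpnam()/tempnam() step at
all: it atomically creates and opens an anonymous temporary file.  The
catch is that the file is already unlinked, so it vanishes on fclose()
and can't be renamed into place the way rsync's temporary files are.
A minimal sketch:

```c
#include <stdio.h>
#include <string.h>

/* tmpfile() creates and opens an anonymous temporary file in one
 * call, so there is no window between name generation and open.
 * This helper writes msg into such a file and reads it back,
 * returning 0 on success. */
static int roundtrip_via_tmpfile(const char *msg, char *out, size_t outlen)
{
    FILE *fp = tmpfile();

    if (!fp)
        return -1;
    fputs(msg, fp);
    rewind(fp);
    if (!fgets(out, (int)outlen, fp)) {
        fclose(fp);              /* the file is removed on close */
        return -1;
    }
    fclose(fp);
    return 0;
}
```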
-chris

> 
> At least HP documents their race conditions, eh?  :(
> 
>   -- Steve
> -- 
> To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: HP-UX 11i and largefiles on rsync 2.6.2

2004-07-27 Thread Chris Shoemaker
On Tue, Jul 27, 2004 at 03:23:44PM -0700, Steve Bonds wrote:
> I've been able to duplicate this problem using the CVS version of rsync
> that was current as of about 2000 UTC today (July 27 2004)

That's some good detective work, Steve!
-chris

> 
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] Batch-mode rewrite

2004-07-22 Thread Chris Shoemaker
On Thu, Jul 22, 2004 at 07:01:12PM -0400, Chris Shoemaker wrote:
> On Thu, Jul 22, 2004 at 06:36:27PM -0700, Wayne Davison wrote:
> > And don't forget the hard-link post-processing -- it would also need to
> > happen after the receiver finished its job.
> 
> Ok, it's late and this new patch seems like the simplest solution of
> all, so there's probably something very wrong with it.  Basic idea:
> move hard-link post-processing and directory permission tweaks from
> generate_files() to recv_files().  Then there's no syncing issues,
> because receiver cleans up when receiver's done.  Uh, right?  :)

Of course, simple isn't so great when it doesn't work.  :)

I see now that the hard-link post-processing looks pretty firmly
attached to the generator, with the 4 hard_link_check() calls in
recv_generator().  I'll look at this some more later.

-chris

> 
> Even with the index notification, this kind of decoupling of recv and
> generator makes sense, don't you think?
> 
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] Batch-mode rewrite

2004-07-22 Thread Chris Shoemaker
On Thu, Jul 22, 2004 at 06:36:27PM -0700, Wayne Davison wrote:
> On Wed, Jul 21, 2004 at 03:54:11PM -0400, Chris Shoemaker wrote:
> > What data exactly?  I thought:
> > 1) all recv-to-gen communications went through the error_pipe[] fds.
> 
> Yes, that became true when I got rid of the extra pipe that used to
> separate the redo values from the error messages.
> 
> > 2) the only meaningful communications were redo requests and
> > "I'm done".
> 
> It depends on which side is the client.  Since we made the read-batch
> side always the client then the errors/warnings get output to the tty
> directly (for a client sender, which is the default for non-batch local
> transfers, the pipe also contains any error messages from the receiver).
> There is also a patch in the patches dir that sends a "success" message
> for every correctly transferred file (delete-sent-files.diff).
> 
> > I thought we could skip the redos and fake the "I'm done".
> 
> Yes, it's possible to turn off the communication, which is what I was
> talking about -- I was mentioning what things would need to be
> special-cased.
> 
> > Ah, I see what you mean about the directory tweaks.
> 
> And don't forget the hard-link post-processing -- it would also need to
> happen after the receiver finished its job.

Ok, it's late and this new patch seems like the simplest solution of
all, so there's probably something very wrong with it.  Basic idea:
move hard-link post-processing and directory permission tweaks from
generate_files() to recv_files().  Then there's no syncing issues,
because receiver cleans up when receiver's done.  Uh, right?  :)

Even with the index notification, this kind of decoupling of recv and
generator makes sense, don't you think?

> 
> > Just to clarify, I don't have anything against the index notification
> > style gen/recv synchronization.  If you think that's better, then let's
> > go that way.
> 
> In the future there may be a need for data to be communicated back from
> the generator to the receiver (i.e. if the --fuzzy patch get improved
> enough to make it worthwhile), so I think in the long run that having
> the two things running in parallel (like normal) will be the easiest
> thing to maintain.  We can certainly revisit the issue later, but for
> now I have the code in the generator sending the normal index bytes down
> the output pipe (it just avoids sending the checksum data), and some
> special-case read-batch code in the receiver that reads the from-the-
> generator pipe and only proceeds with an update if the generator said
> it was time.
> 
> One interesting thing that fell out of this was the ability to skip any
> part of a batched update that had already been done.  For instance, if
> you run the ./BATCH.sh file and terminate it (for some reason), you can
> run it again and it will skip all the already-performed updates.  (One

That's a pretty compelling feature.  BTW, your index notification scheme
isn't in CVS, is it?

-chris

> exception: if --partial had been specified the interrupted file will
> probably not update correctly, but rsync should tell you when that
> happens.)
> 
> ..wayne..
Index: generator.c
===
RCS file: /cvsroot/rsync/generator.c,v
retrieving revision 1.102
diff -u -r1.102 generator.c
--- generator.c 21 Jul 2004 23:59:25 -  1.102
+++ generator.c 23 Jul 2004 03:37:49 -
@@ -593,8 +593,13 @@
 
write_int(f_out, -1);
 
-   if (preserve_hard_links)
-   do_hard_links();
+   if (verbose > 2)
+   rprintf(FINFO,"generate_files finished\n");
+}
+
+void restore_directory_perms(struct file_list *flist, char *local_name)
+{
+   int i;
 
/* now we need to fix any directory permissions that were
 * modified during the transfer */
@@ -606,6 +611,4 @@
   file, i, -1);
}
 
-   if (verbose > 2)
-   rprintf(FINFO,"generate_files finished\n");
 }
Index: receiver.c
===
RCS file: /cvsroot/rsync/receiver.c,v
retrieving revision 1.96
diff -u -r1.96 receiver.c
--- receiver.c  22 Jul 2004 15:31:06 -  1.96
+++ receiver.c  23 Jul 2004 03:37:50 -
@@ -575,12 +575,17 @@
send_msg(MSG_REDO, buf, 4);
}
}
-   }
+   } /* end of while(1) */
make_backups = save_make_backups;
 
if (delete_after && recurse && !local_name && flist->count > 0)
delete_files(flist);
 
+   if (preserve_hard_links)
+   do_hard_links();
+
+   restore_directory_perms(flist, local_name);
+
if (verbose > 2)
rprintf(FINFO,"recv_files finished\n");
 
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: [PATCH] Batch-mode rewrite

2004-07-22 Thread Chris Shoemaker
Wayne,

I took a crack a batch-mode test case, to try avoid batch-mode
regressions.  Does this look reasonable?

Also, I came across a confusing typo (I think.)  Also attached.

-chris

Index: testsuite/README.testsuite
===
RCS file: /cvsroot/rsync/testsuite/README.testsuite,v
retrieving revision 1.2
diff -u -r1.2 README.testsuite
--- testsuite/README.testsuite  12 Mar 2002 00:22:22 -  1.2
+++ testsuite/README.testsuite  23 Jul 2004 00:54:58 -
@@ -17,8 +17,8 @@
 path.
 
 If the tests pass, you should see a report to that effect.  Some tests
-require being root or some other precondition, and so will normally be
-checked -- look at the test scripts for more information.
+require being root or some other precondition, and so will normally not
+be checked -- look at the test scripts for more information.
 
 If the tests fail, you will see rather more output.  The scratch
 directory will remain in the build directory.  It would be useful if
#! /bin/sh

# Copyright (C) 2004 by Chris Shoemaker <[EMAIL PROTECTED]>

# This program is distributable under the terms of the GNU GPL (see
# COPYING).

# Test rsync's --write-batch and --read-batch options

. "$suitedir/rsync.fns"

set -x

hands_setup || test_fail "failed to build test directories"

runtest "local --write-batch" 'checkit "$RSYNC -av --write-batch=BATCH \"$fromdir/\" 
\"$todir\"" "$fromdir/" "$todir"'

rm -rf "$todir" || test_fail "failed to remove destination directory"

runtest "--read-batch" 'checkit "$RSYNC -av --read-batch=BATCH \"$todir\"" "$fromdir/" 
"$todir"'

rm -rf "$todir" || test_fail "failed to remove destination directory"

build_rsyncd_conf

RSYNC_CONNECT_PROG="$RSYNC --config=$conf --daemon"
export RSYNC_CONNECT_PROG

runtest "daemon sender --write-batch" "$RSYNC -av --write-batch=BATCH 
rsync://localhost/test-from/  \"$todir\""

rm -rf tmptodir
mv "$todir" tmptodir || test_fail "failed to save copy of destination directory"
runtest "--read-batch from daemon" 'checkit "$RSYNC -av --read-batch=BATCH \"$todir\"" 
"$todir/" tmptodir'

rm -rf "$todir" || test_fail "failed to remove destination directory"
runtest "BATCH.sh use of --read-batch" 'checkit "./BATCH.sh" "$todir" tmptodir'

rm -rf "$todir" || test_fail "failed to remove destination directory"
mkdir $todir || test_fail "failed to restore empty destination directory"
runtest "daemon recv --write-batch" 'checkit "$RSYNC -av --write-batch=BATCH 
\"$fromdir/\" rsync://localhost/test-to" "$todir/" tmptodir'

if [ x"$preserve_scratch" != xyes ]
then
  rm -rf tmptodir BATCH BATCH.sh || test_fail "failed to remove batch files"
fi

# The script would have aborted on error, so getting here means we pass.
exit 0
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: [PATCH] Batch-mode rewrite

2004-07-21 Thread Chris Shoemaker
Geesh, I forgot the attachment (again).

On Wed, Jul 21, 2004 at 03:54:11PM -0400, Chris Shoemaker wrote:
> On Tue, Jul 20, 2004 at 09:10:22AM -0700, Wayne Davison wrote:
> > On Mon, Jul 19, 2004 at 06:18:49PM -0400, Chris Shoemaker wrote:
> 
> > avoid this and also to separate the post-generator processing (the
> > directory tweaks) would have to be delayed until after the receiver
> > post-processing had finished (the --delete-after handling).  I think
> > I'd like to avoid that.
> 
> Ah, I see what you mean about the directory tweaks.  AFAICT, this is
> really easy to fix, though.  I think it's nicer to break that tweaking
> loop into its own function anyway, (independent of solving the read-batch
> gen/recv sync problem.)  I'm attaching a version of this gen/recv
> serialization concept patch with the directory tweak in its own
> function.
> 
> -chris
> 
Index: generator.c
===
RCS file: /cvsroot/rsync/generator.c,v
retrieving revision 1.102
diff -d -u -r1.102 generator.c
--- generator.c 21 Jul 2004 23:59:25 -  1.102
+++ generator.c 22 Jul 2004 00:21:23 -
@@ -596,6 +596,14 @@
if (preserve_hard_links)
do_hard_links();
 
+   if (verbose > 2)
+   rprintf(FINFO,"generate_files finished\n");
+}
+
+void restore_directory_perms(struct file_list *flist, char *local_name)
+{
+   int i;
+
/* now we need to fix any directory permissions that were
 * modified during the transfer */
for (i = 0; i < flist->count; i++) {
@@ -606,6 +614,4 @@
   file, i, -1);
}
 
-   if (verbose > 2)
-   rprintf(FINFO,"generate_files finished\n");
 }
Index: main.c
===
RCS file: /cvsroot/rsync/main.c,v
retrieving revision 1.210
diff -d -u -r1.210 main.c
--- main.c  21 Jul 2004 23:59:31 -  1.210
+++ main.c  22 Jul 2004 00:21:24 -
@@ -473,6 +473,19 @@
 
io_flush(NORMAL_FLUSH);
 
+   if (read_batch) {
+   io_start_buffering_out();
+   set_msg_fd_in(error_pipe[0]);
+   set_msg_fd_out(error_pipe[1]);
+   send_msg(MSG_DONE, "", 0);
+   generate_files(f_out, flist, local_name);
+   recv_files(f_in, flist, local_name);
+   restore_directory_perms(flist, local_name);
+   io_flush(FULL_FLUSH);
+   report(f_in);
+   return 0;
+   }
+   
if ((pid = do_fork()) == 0) {
close(error_pipe[0]);
if (f_in != f_out)
@@ -510,6 +523,7 @@
set_msg_fd_in(error_pipe[0]);
 
generate_files(f_out, flist, local_name);
+   restore_directory_perms(flist, local_name);
 
get_redo_num(); /* Read final MSG_DONE and any prior messages. */
report(-1);
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: [PATCH] Batch-mode rewrite

2004-07-21 Thread Chris Shoemaker
On Tue, Jul 20, 2004 at 09:10:22AM -0700, Wayne Davison wrote:
> On Mon, Jul 19, 2004 at 06:18:49PM -0400, Chris Shoemaker wrote:
> > Ok, how about this:  Instead of index notification, run the generator
> > and receiver serially.
> 
> I had wondered about that too, but the problem is that the generator
> expects data from the receiver, so we'd need to add special code to

What data exactly?  I thought:
1) all recv-to-gen communications went through the error_pipe[] fds.
2) the only meaningful communications were redo requests and
"I'm done".

I thought we could skip the redos and fake the "I'm done".  What am I missing?


> avoid this and also to separate the post-generator processing (the
> directory tweaks) would have to be delayed until after the receiver
> post-processing had finished (the --delete-after handling).  I think
> I'd like to avoid that.

Ah, I see what you mean about the directory tweaks.  AFAICT, this is
really easy to fix, though.  I think it's nicer to break that tweaking
loop into its own function anyway, (independent of solving the read-batch
gen/recv sync problem.)  I'm attaching a version of this gen/recv
serialization concept patch with the directory tweak in its own
function.

Just to clarify, I don't have anything against the index-notification
style of gen/recv synchronization.  If you think that's better, then
let's go that way.  But I think there should be some expected benefit
to rsync as a
whole.  I.e., I don't think read-batch should drive that decision.

My opinion is that the gen/recv serialization is a simpler solution to
read-batch gen/recv sync than the index notification.  (Assuming I haven't
missed something important about gen/recv communication.)  But, more
importantly, what's the simpler solution for all of rsync, in the
long-term?

-chris

> 
> ..wayne..
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: [PATCH] Batch-mode rewrite

2004-07-19 Thread Chris Shoemaker
helps to attach, eh?

On Mon, Jul 19, 2004 at 06:18:49PM -0400, Chris Shoemaker wrote:
> On Sun, Jul 18, 2004 at 08:37:03PM -0700, Wayne Davison wrote:
> > On Sun, Jul 18, 2004 at 06:20:59PM -0400, Chris Shoemaker wrote:
> > > Could a simplified version of this index notification take place over
> > > the existing error-pipe pair?
> > 
> > The data is traveling in the opposite direction for what we need (and
> > it's not bidirectional).
> > 
> 
> Ok, how about this:  Instead of index notification, run the generator
> and receiver serially.
> 
> I've attached a concept patch.  I don't like creating such a
> special-case for read-batch, but it seems like it might solve the
> problem you found.
> 
> OTOH, I can see some benefit to mainlining some sort of g2r index
> notification just as an on-going sanity check.
> 
> -chris
> 
> > ..wayne..
> -- 
> To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Index: main.c
===
RCS file: /cvsroot/rsync/main.c,v
retrieving revision 1.208
diff -u -r1.208 main.c
--- main.c  19 Jul 2004 17:11:41 -  1.208
+++ main.c  20 Jul 2004 03:04:53 -
@@ -463,6 +463,18 @@
 
io_flush(NORMAL_FLUSH);
 
+   if (read_batch) {
+   io_start_buffering_out();
+   set_msg_fd_in(error_pipe[0]);
+   set_msg_fd_out(error_pipe[1]);
+   send_msg(MSG_DONE, "", 0);
+   generate_files(f_out, flist, local_name);
+   recv_files(f_in, flist, local_name);
+   io_flush(FULL_FLUSH);
+   report(f_in);
+   return 0;
+   }
+   
if ((pid = do_fork()) == 0) {
close(error_pipe[0]);
if (f_in != f_out)
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: [PATCH] Batch-mode rewrite

2004-07-19 Thread Chris Shoemaker
On Sun, Jul 18, 2004 at 08:37:03PM -0700, Wayne Davison wrote:
> On Sun, Jul 18, 2004 at 06:20:59PM -0400, Chris Shoemaker wrote:
> > Could a simplified version of this index notification take place over
> > the existing error-pipe pair?
> 
> The data is traveling in the opposite direction for what we need (and
> it's not bidirectional).
> 

Ok, how about this:  Instead of index notification, run the generator
and receiver serially.

I've attached a concept patch.  I don't like creating such a
special-case for read-batch, but it seems like it might solve the
problem you found.

OTOH, I can see some benefit to mainlining some sort of g2r index
notification just as an on-going sanity check.

-chris

> ..wayne..


Re: [PATCH] Batch-mode rewrite

2004-07-19 Thread Chris Shoemaker
On Sun, Jul 18, 2004 at 08:58:43PM -0700, Wayne Davison wrote:
> On Sun, Jul 18, 2004 at 06:20:59PM -0400, Chris Shoemaker wrote:
> > On Sun, Jul 18, 2004 at 05:25:18PM -0700, Wayne Davison wrote:
> > > So, perhaps we should go ahead and save off the exclude list in the
> > > batch file and force read_batch mode to read them?
> > 
> > I'm leaning in this direction.
> 
> I was too until I realized that, in the haste of my last message, I
> had made a mistake about the excludes being needed to limit the dirs,
> symlinks, and devices.  Of course these items were already elided from
> the list the sender sent us, so the only effect the excludes have at
> --read-batch time is to limit what gets deleted by --delete.
> 
> So, with that in mind, I think it would be more flexible to tell the
> user that they can just drop the include/exclude options unless they
> want to use --delete and limit what get deleted.  We can have the
> writing of the BATCH.sh file (which I renamed from BATCH.rsync_argvs)
> automatically dump the exclude options if --delete wasn't specified
> (or if --delete-excluded was).

Ok, I hadn't been paying close attention to the various delete and
exclude features, but I think we're on the same page now.  If I
understand correctly, there's no reason to save the list into the BATCH
file.  It only needs to be in BATCH.sh. (I'm glad the 'rsync_argvs' is gone.) 
I'm not sure if it's worth it to exclude the arguments (from BATCH.sh)
when they're not needed.  It might be easier to always dump them.  Is
there much performance impact specifying --excludes when they won't be
used?

-chris


> 
> ..wayne..


Re: [PATCH] Batch-mode rewrite

2004-07-18 Thread Chris Shoemaker
On Sun, Jul 18, 2004 at 05:25:18PM -0700, Wayne Davison wrote:
> Seems like the two choices we have are:
> 
> (1)  Force the excludes into the batch file and read them in the local-
> to-local batch-reading transfer.
> 
> (2)  Require the user to re-specify the excludes if they want the same
> update (allowing them to skip them as they see fit).  This route would
> cause the deletes to remove more files than the original transfer if the
> user failed to re-specify the excludes and they used --delete.  Also, if
> there were excluded symlinks, directories, and devices, the generator
> would not know to skip them in the batch-reading run unless the same
> excludes were specified.
> 
> So, perhaps we should go ahead and save off the exclude list in the
> batch file and force read_batch mode to read them?  It should be as
> simple as an extra call to "send_exclude_list(batch_fd);" and the
> addition of a special recv_exclude_list() call for read_batch mode (and
> the removal of the --include/--exclude options out of the argv file).

I'm leaning in this direction.

> One other thing that I noticed is that the synchronization between the
> generator and the receiver is no longer present, so a batch-reading run
> can possibly do some things in the receiver too soon (for instance, if
> the generator hasn't gotten around to creating the required parent dirs
> for the receiver).  There are two solutions to this:

I see.

> 
> (1)  Don't /dev/null the data from the generator, but instead monitor it
> and only let the receiver process a file when its number has been
> requested from the generator.
> 
> (2)  Use another way to convey the same information, like the idea below.
> 
> There is a diff in the patches dir called g2r-basis-filename.diff
> because it sends the name that the generator found for the basis file to
> the receiver via an extra pipe that gets created before the two fork
> (the idea is to avoid duplicating the same basis-file search in the
> receiver and risking having it find a different file than what was used
> to generate the checksums -- something that is particularly useful for
> both the multiple-compare-dest diff and the fuzzy-name matching diff).
> I modified this g2r patch to also convey to the receiver what file-list
> index the name refers to (but only in batch mode) so that the receiver
> can notice when thing aren't quite in sync (i.e. if the batch data
> doesn't have exactly the same items that the generator wants to update).
> 
> One question was prompted by my work on this patch:
> 
> The sending of the extra file-list index value is only enabled in
> batch-reading mode (and  indeed, the extra basis-name pipe is not
> normally turned on unless an option such as --compare-dest or
> --read-batch is specified).  It might be advantageous to always convey
> this extra index-number information from the generator to the receiver
> since it would guard against a receiver that is sending an update that
> the generator didn't request, but I can't think of a reason to do this.
> 
> Any thoughts on any of this?

Could a simplified version of this index notification take place over
the existing error-pipe pair?
-chris

> 
> ..wayne..


Re: [Bug 1463] New: poor performance with large block size

2004-07-18 Thread Chris Shoemaker
On Sat, Jul 17, 2004 at 03:54:31AM -0700, Wayne Davison wrote:
> On Fri, Jul 16, 2004 at 08:20:51PM -0400, Chris Shoemaker wrote:
> > On Thu, Jul 15, 2004 at 07:06:28PM -0700, Wayne Davison wrote:
> > > + max_map_size = MIN(MAX_MAP_SIZE, blength * 32);
> > 
> > This makes max_map_size a multiple (32) of blength
> > for a large range  (blength*32 < MAX_MAP_SIZE),
> 
> Oops, that was supposed to be MAX(), not MIN(), otherwise it doesn't
> help the problem of too-many memory moves for large block sizes.  I'll

Sure it does (did) - in the sense that it capped the map sizes and
therefore the memmoves.  I don't think MAX is better, because it forces
large reads even for small block sizes.  That might be ok for the case
where you will walk the whole file, but otherwise it's making a single
map_ptr() call pretty expensive.

> go ahead and check that change in for now and look forward to your
> findings on improving the code.
> 
> > ISTM, that the only reason to have the slight lag in window
> > advancement is you expected to frequently service requests where the
> > offset was decreasing just a little.  I didn't see that happening
> > anywhere.  Did I miss something?
> 
> I think the only place where the calls might not advance are in the
> receiver where the map_ptr() calls can be to any block-size-multiple

Agreed, but the partial-stride window advancement in map_ptr only helps
if your subsequent calls are _only_a_little_bit_ behind the current
cursor.  I haven't wrapped my mind around hash_search() yet, so I can't
say for sure, but nothing jumps out as producing that type of access
pattern.

> in the basis file.  It seems strange to me to force a 256K read for
> every basis-file match, so maybe this is something else to look into
> optimizing.

Indeed.
-chris

> 
> ..wayne..


Re: [Bug 1463] New: poor performance with large block size

2004-07-16 Thread Chris Shoemaker
On Thu, Jul 15, 2004 at 07:06:28PM -0700, Wayne Davison wrote:
> On Wed, Jul 14, 2004 at 06:27:45PM -0400, Chris Shoemaker wrote:
> > My initial reaction (having not actually read the code) is that it would
> > be desirable to make the window_size highly composite, and then ensure that
> > the block size is an integer factor of the window_size.  In other words,
> > avoid the memmoves altogether.
> 
> I don't believe that this is possible since the offset to map_ptr() is
> often incremented by a single byte.

Ah, well, I did say I hadn't actually read the code.  :)

(Now I have.)  Users of map_ptr():

file_checksum() walks with non-overlapping stride of CSUM_CHUNK.
generate_and_send_sums() walks with non-overlapping stride of blength.
simple_send_token() walks with non-overlapping stride of CHUNK_SIZE.
send_deflated_token() walks with non-overlapping stride of CHUNK_SIZE (I
think.)
matched() walks with non-overlapping stride of CHUNK_SIZE.
hash_search() walks a window of size blength (I think) with overlapping
steps of 1.


You must be talking about hash_search().  I hadn't considered that type
of use, only the non-overlapping, constant stride use.  I agree that
playing with window sizes can't prevent hash_search() from triggering
the memmove in map_ptr().

Nevertheless, it may help to optimize the window sizes for the other
five uses.  CSUM_CHUNK == 64, so any left-over in the memmove case is
probably inconsequential.  CHUNK_SIZE == 32k, so that makes me think
that there's a real benefit to making the window size a multiple of 32k.
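The per-refill cost is easy to quantify: for a caller advancing in fixed,
non-overlapping strides, the bytes that must be memmoved at each window
refill are just the window size modulo the stride.  A throwaway C sketch
(illustrative only, not rsync's actual map_ptr()):

```c
/* Bytes left over (and memmoved to the front of the buffer) each time the
 * sliding window is refilled, for a caller advancing in fixed,
 * non-overlapping strides.  Illustrative sketch, not rsync source. */
static unsigned leftover_per_refill(unsigned window_size, unsigned stride)
{
    return window_size % stride;
}
```

With a 256K window, a 32K CHUNK_SIZE divides evenly (leftover 0), while a
block size like 181272 leaves 80872 bytes to memmove on every refill.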

> I'll attach my version of Craig's fix that increases the MAX_MAP_SIZE
> value for large block sizes.  It also caps the block size that can be
> requested or computed to 256KB (which would make the map buffer in my
> patched version max out at 8MB).
> 
> ..wayne..

> --- fileio.c  16 Jul 2004 01:32:02 -  1.13
> +++ fileio.c  16 Jul 2004 01:58:34 -
> @@ -24,6 +24,8 @@
>  
>  extern int sparse_files;
>  
> +unsigned int max_map_size = MAX_MAP_SIZE;
> +
>  static char last_byte;
>  static int last_sparse;
>  
> @@ -186,7 +188,7 @@ char *map_ptr(struct map_struct *map,OFF
>   } else {
>   window_start = 0;
>   }
> - window_size = MAX_MAP_SIZE;
> + window_size = max_map_size;
>   if (window_start + window_size > map->file_size) {
>   window_size = map->file_size - window_start;
>   }
> --- generator.c   15 Jul 2004 02:20:08 -  1.97
> +++ generator.c   16 Jul 2004 01:58:34 -
> @@ -52,6 +52,7 @@ extern int only_existing;
>  extern int orig_umask;
>  extern int safe_symlinks;
>  extern unsigned int block_size;
> +extern unsigned int max_map_size;
>  
>  extern struct exclude_list_struct server_exclude_list;
>  
> @@ -162,7 +163,9 @@ static void sum_sizes_sqroot(struct sum_
>   c >>= 1;
>   } while (c >= 8);   /* round to multiple of 8 */
>   blength = MAX(blength, BLOCK_SIZE);
> + blength = MIN(blength, MAX_MAP_SIZE);

When we would have blength > MAX_MAP_SIZE, this makes max_map_size a
multiple (1) of blength.  Good.

>   }
> + max_map_size = MIN(MAX_MAP_SIZE, blength * 32);

This makes max_map_size a multiple (32) of blength
for a large range  (blength*32 < MAX_MAP_SIZE),

For those two cases, this is probably optimal.

But what is max_map_size when  MAX_MAP_SIZE/32 < blength < MAX_MAP_SIZE?
Ans:  max_map_size = MAX_MAP_SIZE, and this is still problematic when
e.g. blength = MAX_MAP_SIZE/2 + 1, resulting in memmoves of size
MAX_MAP_SIZE/2 - 1, in the common, non-overlapping stride case.  (It is,
however, problematic over a smaller range of blength than the previous
code was.)

So, while this certainly does a lot to improve the worst-case
performance, I think it leaves a large range of conditions for which
it is sub-optimal.
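To make the sub-optimal middle range concrete, here is the patch's sizing
rule transcribed into a standalone sketch (assuming MAX_MAP_SIZE is the
256K discussed earlier; the helper functions are illustrative, not rsync
source):

```c
#define MAX_MAP_SIZE (256 * 1024)
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* The patched sizing rule: cap blength at MAX_MAP_SIZE, then size the
 * map at 32 strides, but never beyond MAX_MAP_SIZE. */
static unsigned max_map(unsigned blength)
{
    blength = MIN(blength, MAX_MAP_SIZE);
    return MIN(MAX_MAP_SIZE, blength * 32);
}

/* Bytes memmoved per window refill for a non-overlapping stride of
 * blength. */
static unsigned memmove_per_refill(unsigned blength)
{
    return max_map(blength) % blength;
}
```

memmove_per_refill(MAX_MAP_SIZE/2 + 1) comes out to 131071, i.e.
MAX_MAP_SIZE/2 - 1, matching the pathological case above, while a small
power-of-two stride such as 8K gives 0.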

There is a bit of a too-many-free-variables problem here, because 
CHUNK_SIZE == 32k, and that doesn't change with blength.  Since 32k has no
prime factor other than 2, if blength turns out to be odd (which doesn't
really happen, since blength is rounded to a multiple of 8), then no
max_map_size less than blength*32k can avoid the memmove.  Obviously you
would have to balance the desire not to use so much memory against the
desire to avoid the memmove.  So, you _could_ remove the extra free
variable by making CHUNK_SIZE depend on (be a multiple of) blength, just
like max_map_size.  I don't know what the full impact of that change would
be.

At the other extreme, you could constrain blength to be a power of two.
Then blength and CHUNK_SIZE are both powers of two, so one always divides
the other, and max_map_size is optimal (and of reasonable size) when set
to the greater of the two.
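A power-of-two constraint is cheap to apply; for instance (hypothetical
helper, not in rsync):

```c
/* Round v down to the nearest power of two (assumes v >= 1).  With both
 * blength and CHUNK_SIZE powers of two, one always divides the other, so
 * a window sized to the larger of them never leaves a partial stride. */
static unsigned floor_pow2(unsigned v)
{
    while (v & (v - 1))
        v &= v - 1;     /* clear the lowest set bit until one remains */
    return v;
}
```

For example, floor_pow2(181272), the square-root-ish block size from the
bug report, comes out to 131072.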

In general, max_map_size can't be optimal for 

Re: [PATCH] Batch-mode rewrite

2004-07-16 Thread Chris Shoemaker
On Wed, Jul 14, 2004 at 08:07:36PM -0700, Wayne Davison wrote:
> I did some work refining your patch a little, and liked the result so
> much I went ahead and checked it into CVS.  I'd appreciate it if you
> could give my changes a look to see if I messed anything up. 
> 
> The most important changes I made were:
> 
>  - Added some text to the manpage that talks about what options change
>when switching from --write-batch to --read-batch and what options
>must remain the same.

one sec:

what's this part about standard input?

*
dit(bf(--read-batch=FILE)) Apply all of the changes stored in FILE, a
file previously generated by --write-batch.
If em(FILE) is "-" the list will be read from standard input.
*

Does it mean the file name or the file contents?  Either way, this
doesn't work, does it?  Would it be useful?

-chris



Re: [PATCH] Batch-mode rewrite

2004-07-16 Thread Chris Shoemaker
On Wed, Jul 14, 2004 at 08:07:36PM -0700, Wayne Davison wrote:
> I did some work refining your patch a little, and liked the result so
> much I went ahead and checked it into CVS.  I'd appreciate it if you
> could give my changes a look to see if I messed anything up. 
> 
> The most important changes I made were:
> 
>  - Delay the start of batch writing until after any exclude-list and
>files-from data gets sent.

If I understand your changes, the files-from stuff you're skipping is
only the flagging of the fd, no actual communication.

The exclude-list change raises a question:  I wasn't paying that close
attention to exclude list stuff, but, if the user uses --exclude and
--delete and then applies the batch with the --delete but not the
exclude, what should happen?  I think it's a bit of an abuse of batch mode,
so maybe undefined behavior is ok.  I think it's ok to have the excluded
file not deleted, but that would only happen if the exclude list is
written to the batch file.

However, the exclude list is another one of those protocol dependencies
on server-ness, isn't it?  I see "send_exclude_list(f_out)" in both the
am_sender and !am_sender paths of the client, and
"recv_exclude_list(f_in)" in both server paths.   I guess I can see why
it may be easier to leave it out of the batch.  Otherwise, it has to
fall into the same category as the protocol version and checksum_seed.
:(

> 
>  - Reinstated the final report() so that the end-of-transfer summary
>shows up when reading a batch.  I had to make the report() code
>always write out the stats data into the batch file.  Added a big
>comment at the top to help explain what it is doing.

It's a shame to make report() more complicated than it already is (too).
But, this does give the stats for a read-batch, which is nice.  I guess
it's worth it.

>
>  - To make the first 2 items easier on the code, I changed the pipe
>code so that a local-to-local transfer turns off write-batch mode
>in the "am_server" process (this way the server never writes the
>batch, just the client).

Ah, oh well, I guess the un-importance of that decision was bound to
change.

> 
>  - Got rid of the --files-from option from the FOO.rsync_argvs.
> 

makes sense.

>  - Got rid of a host: prefix in the destination arg in FOO.rsync_argvs
>(this got checked in before the new-batch changes since it affected
>the old batch code too).

good catch.

> 
>  - Added a check to the generator so that it doesn't do a lot of
>useless checksum-generating work when --read-batch is specified.

check.
> 
>  - Added some text to the manpage that talks about what options change
>when switching from --write-batch to --read-batch and what options
>must remain the same.

Yes, that's helpful.

> 
>  - Decided not to include your comments on potential protocol changes.

That's probably appropriate.

> 
>  - Removed some unused variables, and other misc. twiddles.
> 
> If you want to see the changes without resorting to CVS, you can look
> here:
> 
> http://www.blorf.net/newbatch.patch
> 

I looked at the patch and what you checked into CVS and it all looks
good to me.  I think what we have will be a lot easier to maintain:

 batch.c        |  263 -
 clientserver.c |    3
 compat.c       |   11 +-
 flist.c        |   20 +---
 generator.c    |    4
 io.c           |   38
 main.c         |  122 +-
 match.c        |    3
 options.c      |   24 +
 pipe.c         |   25 -
 proto.h        |   18 +--
 rsync.1        |  117 +++--
 rsync.yo       |  111 +++-
 sender.c       |  149 +---
 token.c        |   34 ---
 15 files changed, 326 insertions(+), 616 deletions(-)
 

> Thanks for the nice patches!  I think this will make batch mode much
> nicer.  We can consider marking it as less experimental in the man page
> after some more testing.  (I've done just a little so far.)

Agreed, and thank _you_!

-chris

> 
> ..wayne..


Re: Rsync not preserving owner/group

2004-07-14 Thread Chris Shoemaker
On Wed, Jul 14, 2004 at 11:53:56PM -0400, Loukinas, Jeremy wrote:
> For some reason when using -owner -group my files end up being nobody:nobody
> on the destination..?
> This is Solaris 9. 
>  
>  Jeremy S. Loukinas
>  

You need to provide much more information.  command line? version? etc.
-chris

> 
>  
>  


Re: A question about connection refused

2004-07-14 Thread Chris Shoemaker
On Fri, Jun 25, 2004 at 02:01:41PM +, Guo jing wrote:
> I install rsync in computer and run it as a daemon successfully, but when I 
> run rsync command on another end to connect it. There is a error.
> 
> The output is :
>   opening tcp connection to 192.168.0.43 port 873
>   rsync: failed to connect to 192.168.0.43: Connection refused
>   _exit_cleanup(code=10, file=clientserver.c, line=93): entered
>   rsync error: error in socket IO (code 10) at clientserver.c(93)
>   _exit_cleanup(code=10, file=clientserver.c, line=93): about to 
> call exit(10)
> 
> I use "netstat -an|grep 873" and  "ps -aux|grep rsync" and I am sure that 
> the rsyncd has run successfully. The version is both 2.6.2. 
> 
> The rsync command is " rsync -aH -v  
> [EMAIL PROTECTED]::system/usr/local/drc-1.0 /home/
> "
> 
>   What's the reason for this error?? Thanks!

What is the command line you are using to start the daemon?
What does rsyncd.conf look like?
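(For reference, and purely as a hypothetical sketch since the poster's
actual setup is unknown: a minimal rsyncd.conf serving a module named
"system", as the "::system/..." syntax in the failing command implies,
could look like this -- the path is a guess.)

```ini
# /etc/rsyncd.conf -- minimal example
pid file = /var/run/rsyncd.pid

[system]
    path = /
    read only = yes
```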

-chris

> 
> _
>  MSN Messenger:  http://messenger.msn.com/cn  
> 


Re: error in rsync protocol data stream (code 12) at io.c(165)

2004-07-14 Thread Chris Shoemaker
On Tue, Jun 22, 2004 at 10:06:49AM -0400, Linux wrote:
> I get the same error even after taking the advise from the subsequent
> posts to this message and other message stateing "error in rsync
> protocol data stream (code 12) at io.c(165)" . 
> 
> The one thing that i have noticed is that is only errors when the size
> of the directory on the remote machine exceeds 65GB. if it's 66GB the
> transfer will fail half way through. The files i'm transfering are BKF
> files from veritas backup exec and they range from 2-4GB each. In
> addition my --timeout period is set to "4000"
> 

Thank you for reporting this error.  It would be helpful if you could
show the output of a failed run with the -vvv option.

-chris

> 
> So with that said if i keep the folder under 65GB i have no problems
> what so ever. Also both machines taking part in this process are 2.4ghz
> and have 512mb or pc-3200.
> 
> 
> Thanks
> Edward
> 
> 
> 
> 
> On Mon, 2004-06-21 at 13:47, Nick Sylvester wrote:
> > I am recieving a connection unexpectedly closed error while trying to 
> > sync a directory on two machines.  Rsync is in my path on both 
> > machines.  I am using ssh and I can connect to the server fine through 
> > ssh.  I am using a vpn but have no problems connecting to any of the 
> > servers with it.  This is the full error message I get:
> > 
> > Remote Machine: Connection refused
> > rsync: connection unexpectedly closed (0 bytes read so far)
> > rsync error: error in rsync protocol data stream (code 12) at io.c(165)
> > 
> > Here is the command I am using
> > 
> > 
> > rsync -r --timeout=600 [EMAIL PROTECTED] machine:/opt/u91/scripts/sqlpath/* 
> > /home/oracle/sqlpath
> > 
> > I am a new user to rsync and any suggestions or help would be much 
> > appreciated.
> > 
> > Thanks
> > Nick Sylvester
> 


Re: [Bug 1463] New: poor performance with large block size

2004-07-14 Thread Chris Shoemaker
On Thu, Jun 17, 2004 at 08:47:57PM -0700, Craig Barratt wrote:
> 
> > But, the comment seems to have been right on. I have re-run the
> > experiment with block sizes as small as 3000 (yes it took a long
> > time to complete) all the way up to block sizes of 10 with it
> > working in reasonable times. But, when the block size approaches
> > 170,000 or so, the performance degrades exponentially.
> >
> > I understand that I am testing at the very fringes of what we should
> > expect rsync to do. File sizes of 25Gig and 55Gig are beyond what was
> > originally envisioned (based on 64k hash buckets and a sliding window
> > of 256k).
> 
> Here's a patch to try.  It basically ensures that the window is
> at least 16 times the block size.  Before I'd endorse this patch
> for CVS we need to make sure there aren't cases where map_ptr is
> called with a much bigger length, making the 16x a bit excessive.
> 
> Perhaps I would be tempted to repeat the previous check that the
> window start plus the window size doesn't exceed the file length,
> although it must be at least offset + len - window_start as in
> the original code.

hehe, I'm catching up to you guys.  I'm kinda late.  That's what I get
for letting my email back up for >1 month. :)

> 
> In any case, I'd be curious if this fixes the problem.
> 
> Craig
> 
> --- rsync-2.6.2/fileio.cSun Jan  4 19:57:15 2004
> +++ ../rsync-2.6.2/fileio.c Thu Jun 17 19:33:26 2004
> @@ -193,8 +193,8 @@
> if (window_start + window_size > map->file_size) {
> window_size = map->file_size - window_start;
> }
> -   if (offset + len > window_start + window_size) {
> -   window_size = (offset+len) - window_start;
> +   if (offset + 16 * len > window_start + window_size) {
> +   window_size = (offset + 16 * len) - window_start;
> }

My initial reaction (having not actually read the code) is that it would
be desirable to make the window_size highly composite, and then ensure that
the block size is an integer factor of the window_size.  In other words,
avoid the memmoves altogether.

{/me thinks a bit}

Actually, I don't think this removes the pathological case at all.  It just
reduces the frequency of the impact by a factor of 16.  Consider when
len = window_size/16 - 1.  We'll still end up with a memmove of size
len-16, which is expensive when len is large, despite window_size being
16 times larger.

Besides making window_size mod len = 0, another solution could be to use
a circular buffer.
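The circular-buffer idea would trade the memmove for wrap-around
bookkeeping.  A sketch (assumptions: a power-of-two ring; this is not
actual rsync code -- and note that map_ptr() callers expect a contiguous
pointer, so a request straddling the wrap point would still need a copy or
a doubly-mapped buffer):

```c
#define RING_SIZE (1u << 18)    /* 256K; must be a power of two */

/* Where a given file offset lands in the ring: a mask, no memmove. */
static unsigned ring_pos(unsigned long long file_offset)
{
    return (unsigned)(file_offset & (RING_SIZE - 1));
}

/* A request [offset, offset+len) can be returned as one contiguous
 * pointer only if it doesn't wrap around the end of the ring. */
static int ring_contiguous(unsigned long long offset, unsigned len)
{
    return ring_pos(offset) + len <= RING_SIZE;
}
```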

This is a pretty nasty bug you guys found.  I hope we can fix it soon.

-chris

>  
> /* make sure we have allocated enough memory for the window */


Re: Truly awful rsync docs - Re: real Newbie query sorry!

2004-07-14 Thread Chris Shoemaker
On Thu, Jun 17, 2004 at 10:13:28AM +0100, Stuart Halliday wrote:

> 
> Why isn't there a beginners guide to setting up rsync?
> 
> 
> The rsync front page talks of a tutorial by devshed at:
> http://www.devshed.com/c/b/Administration#/Rsync/page1.html
> 
> Except there isn't one
> 
> It's now: 
> http://www.devshed.com/c/a/Administration/File-Synchronization-With-Rsync/

Stuart's observation is still true.  Who maintains the rsync web site?

> 
> Quite why the official main rsync site would rely on an external 3rd party web site 
> to have their tutorial is not good. External web sites come and go
> 
> It looks like rsync docs were written by a seasoned user years ago in a plain text 
> file and this text file was dumped as a 'HTML file' with no attempt to make it web 
> friendly. It appears to have had syntax errors corrected over a number of years with 
> no attempt made to make it beginner friendly.
> 
> After looking through devshed's article I see there are TWO rsync web site which 
> look very similar.
> http://www.samba.org/rsync/  and http://rsync.samba.org/
> 
> This just confuses me. ;-)

Well, I think they're just the same pages, but maybe a redirect would be
better.

-chris

> 
> If I ever get rsync working, I'll have to write a better manual. :-)
> 


Re: stalling during delta processing

2004-07-14 Thread Chris Shoemaker
On Mon, Jun 14, 2004 at 11:03:12PM -0700, Craig Barratt wrote:
> "Wallace Matthews" writes:
> 
> > I copy the 29 Gig full backup back into fedor//test/Kibbutz and issue
> > the command "time rsync -avv --rsh=rsh --stats --block-size=181272
> > /test/Kibbutz/Kbup_1 fedor://test/Kibbutz" and it CRAWLS during delta
> > generation/transmittal at about 1 Megabyte per second.
> >
> > I have repeated the experiment 3 times; same result each time.
> >
> > The only thing that is different is --block-size= option. First,
> > time it isnt specified and I get a predictable answer. Second
> > time, I give it a block size that is about 1/2 of square root of
> > (29 Gig) and that is ok. But, explicitly give it something that
> > is approximately the square root of the 29 Gig and it CRAWLS.
> >
> > When I cancel the command, the real time is 86 minutes and the
> > user time is 84 minutes. This is similar to the issue I reported
> > on Friday that Chris suggested I remove the --write-batch= option
> > and that seemed to fix the CRAWL.
> 
> If I understand the code correctly, map_ptr() in filio.c maintains
> a sliding window of data in memory.  The window starts 64K prior
> to the desired offset, and the window length is 256K.  So your
> block-size of 181272 occupies most of the balance of the window.
> 
> Each time you hit the end of the window the data is memmoved
> and the balance needed is read.  With such a large block size
> there will be a lot of memmoves and small reads.
> 
> I doubt this issue explains the dramatic reduction in speed, but
> it might be a factor.  Perhaps there is a bug with large block
> sizes?
> 

I agree that this shouldn't account for such a slow-down.  Perhaps Wally
can test this hypothesis by using a block size exactly equal to the
window size (256k?)  and one less (256k-1?) and one more (256k+1?).  
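A concrete way to run that experiment (paths taken from Wally's earlier
command; the 256K window size is an assumption from the MAX_MAP_SIZE
discussion, and the rsync line is commented out here so you can substitute
real source/destination paths):

```shell
# Try block sizes just under, exactly at, and just over the presumed 256K window.
for bs in $((256*1024 - 1)) $((256*1024)) $((256*1024 + 1)); do
    echo "testing block size $bs"
    # time rsync -avv --stats --block-size=$bs /test/Kibbutz/Kbup_1 fedor:/test/Kibbutz
done
```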



> And, yes, your observation about the number of matching blocks
> needs to be explored.

I agree.


> 
> Craig


Re: block check sum sizing

2004-07-14 Thread Chris Shoemaker
On Mon, Jun 14, 2004 at 01:35:28PM -0400, Wallace Matthews wrote:
> When I dont specify --block-size but have --write-batch=xxx, I get a xxx.rsync_csum 
> file that is 76 Kbytes in size.
> The size of the file varies as the size of the "reference" file is varied. --stats 
> showed matched data that is roughly 6 block lengths
> based on the square root of the newer file. 
> 
> I copy the original data back to the target directory so that I can repeat the 
> experiment. 
> I compute a block size that is the square root of the size of the "reference" file 
> and use --block-size= the computed size.
> The xxx.rsync_csum file is 12 K bytes in size. The xxx.rsync_delta file is the size 
> of the newer file and --stats shows 0 for matched data.
> 
> I copy the original data back to the target directory. 
> I vary the block size to half the previous example. I rerun the experiment. The 
> xxx.rsync_csum file is still 12 K bytes in size. The xxx.rsync_delta file is still 
> the size of the newer file and --stats shows 0 for matched data.
> 
> This is non intuitive. If I got 6 matched blocks when the square root is 181,272 
> then I would expect to get 6 matched blocks when I specify --block-size = 181,272 
> and 12 when --block-size is 90636. 
> 
> I would also expect to see xxx.rsync_csum size to double when I divide the blocksize 
> by 2.
> 
> What am I missing??

I don't know.  Are you sure you accurately know the block size used when
no block size is forced?  Did you add some output to show the block size
used?  What if you double the block size?  Is there any choice of block
size that induces changes in the size of xxx.rsync_csum and the --stats
matched data?

-chris

> 
> wally
> 


Re: I need help rsyncing Local Disks

2004-07-14 Thread Chris Shoemaker
On Mon, Jun 14, 2004 at 09:39:04AM -0700, Marshall28 wrote:
> Recently I've found out about rsync and wanted to use this to mirror local
> disks on one of my servers. I first ran Ghost for Linux to get the exact
> clone I was looking for, and now I'm ready to setup rsync to keep my drives
> mirrored on a continual basis. Here's my setup:
> 
> 1 Seagate 4.6GB SCSI on /dev/sda, mounted like this:
>   /dev/sda1 ==> /boot - 101M
>   /dev/sda2 ==> swap - 269M
>   /dev/sda3 ==> / - 3.9GB
> 
> 1 Seagate 16GB SCSI on /dev/sdb has the same look due to the ghost for linux
> clone but I need help on how I should keep this drive sync'd with the main
> SCSI disk, sda. I saw the script on rsync.samba.org for backing up to a
> spare disk but need some help understanding the coding, as well as knowing
> if this script will provide me with another spare disk that is a replica of
> the primary. From what I can make of this script it doesn't mirror the
> drive, it only mirrors these: rootfs, usr, data, and data2.
> 
> If this script does replicate disks then he first part of it is really what
> I'd be looking to do. In my case I would need to mount /dev/sdb1 as "/boot"
> and /dev/sdb3 as "/" somewhere. Whatever tips/insight you guys can provide
> on this would be much appreciated. Here's the script:

rsync operates on filesystems, not on partitions or disks.  So, yes, you
would need to mount the partitions somewhere, e.g. /mnt/oldroot and
/mnt/oldboot, and then rsync between the new directories and the old ones.

-chris


> 
> #!/bin/sh
> 
> export PATH=/usr/local/bin:/usr/bin:/bin
> 
> LIST="rootfs usr data data2"
> 
> for d in $LIST; do
>   mount /backup/$d
>   rsync -ax --exclude fstab --delete /$d/ /backup/$d/
>   umount /backup/$d
> done
> 
> 
> 
> thanks
> marshall
> 


Re: what am I doing wrong

2004-07-14 Thread Chris Shoemaker
On Fri, Jun 11, 2004 at 03:51:22PM -0400, Wallace Matthews wrote:
> Thanks for the suggestion. It works fine if I remove the --write-batch from the 
> command line. 
> That should narrow it down for the bug fixer(s). I know that --write-batch works ok 
> when the reference file is on a remote system. 
> 
> It means that until there is a fix, I have no use for the local only case. My only 
> purpose for using it was to create delta files that I could then send to remote 
> system(s) to create incrementals.
> 

Wally,
You may want to try out the CVS version of rsync with my recent
batch-mode rewrite patch.  It has a slightly different (and better) interface,
and it does work in the case you describe.
It would be interesting to see the results of your block-size
measurements.

-chris

> wally 
> 
> -Original Message-
> From: Chris Shoemaker [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 11, 2004 10:31 AM
> To: Wallace Matthews
> Cc: [EMAIL PROTECTED]
> Subject: Re: what am I doing wrong
> 
> 
> On Fri, Jun 11, 2004 at 02:53:53PM -0400, Wallace Matthews wrote:
> > I am seeing some rather strange behavior with synch of 2 directories on the same 
> > system using 2.6.2.
> > 
> > The older file is the image of a full backup and is 29Gig in size. The new image 
> > is a slice of an incremental
> > backup and is 101Meg in size.
> > 
> > the command line is:
> > time /home/wally/rsync/rsync-2.6.2 -av --rsh=rsh --backup --stats 
> > --block-size= --write-batch=kbup1aaa /test/Kibbutz/Kbup_1.aaa 
> > /test/Kibbutz/work 
> > 
> > What I am observing in /test/Kibbutz/work is a file .Kbup_1.aaa.AZVyuT that is 35 
> > Meg in size after an overnight run that has been going on for 14 hours. When I 
> > kill the job, I get real 817m10.062s and user 814m45.940s sys 7m23.870s. 
> > 
> > I have tried this without the --block-size statement and it goes pretty fast but 
> > the literal data is 104M with no matches.
> > 
> > I have tried it for a variety of --block-size= and it always stalls with very 
> > high user times.
> > 
> > If I make the destination fedor://test/Kibbutz with a copy of the 29G file in the 
> > destination directory, it takes about 30m of real time and 9m of user time. 
> > 
> > It seems to be specific to source and destination being on the same system. 
> > 
> > Would either Wayne or Tim give me some insight into what I am doing to screw up 
> > rsync so badly??
> 
> Do you observe the same behavior without "write-batch"?
>   -chris
> 
> > 
> > I did similar experiments with 2.5.7 in January and didnt see behavior like this, 
> > but at that time my full backup images were only 100 Meg or so and my incremental 
> > backups were about 10 Meg. 
> > 
> > I was experimenting with building the deltas locally and distributing them with a 
> > download server for expansion of the remote targets.
> > 
> > wally
> > 


Re: Rsync Mirroring Problems

2004-07-14 Thread Chris Shoemaker
On Thu, Jun 03, 2004 at 09:14:55AM +1000, Dan Goodes wrote:
> Hi Again,
> 
> On Thu, 27 May 2004 at 09:52, Dan Goodes wrote:
> 
> > Hi Folks,
> >
> > For some time, we've been having some issues with our mirroring with
> > rsync. The symptoms are a broken transfer, with the 'cryptic' error
> > message:
> >
> > rsync: connection unexpectedly closed (1128806 bytes read so far)
> 
> Following the feedback I've gotten so far - it appears that the problem is
> some incompatibilities between different versions of rsync. It also
> appears that the problem is that the server end is 'going away' before the
> rsync has finished (perhaps due to a segfault). Here are a couple of
> examples of rsync command-lines for which this happens regularly.

I know it's been over a month, but... Are you still having problems?

> 
> /usr/bin/rsync -rltvH --stats --delete 
> download.fedora.redhat.com::fedora-linux-core-updates/ 
> /pub/fedora/linux/core/updates/
> 

I can't run this one.  Perhaps download.fedora.redhat.com isn't an rsync
server?

[EMAIL PROTECTED] test2]$ telnet download.fedora.redhat.com 873
Trying 66.187.224.20...
Connected to download.fedora.redhat.com (66.187.224.20).
Escape character is '^]'.
Connection closed by foreign host.


> /usr/bin/rsync -rltvH --stats --delete --max-delete=500 mirror.caosity.org::cAos/ 
> /ftp/pub/caosity/
> 

This one seems to be up.  Can you still reproduce this?  The server appears
to be using protocol version 26:

[EMAIL PROTECTED] test2]$ telnet mirror.caosity.org 873
Trying 69.56.240.122...
Connected to mirror.caosity.org (69.56.240.122).
Escape character is '^]'.
@RSYNCD: 26

There were quite a few releases with protocol version 26, so this server
could potentially be quite old.  You might try asking the admin to
upgrade rsync.

Otherwise, you might also try rsyncing the sub-directories individually.
That might avoid an issue with duplicate filenames, or it might narrow
down the failures to one directory.
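That per-directory approach can be sketched roughly like this (`sync_by_subdir` is a hypothetical helper; its parse of `--list-only` output is simplistic and assumes directory names without spaces):

```shell
# Sync each top-level subdirectory of a module separately, so a failure
# points at a single directory instead of the whole mirror run.
sync_by_subdir() {
    src=$1; dest=$2
    # List the module's top-level entries; directory lines in rsync's
    # listing start with 'd' and end with the name.
    for d in $(rsync --list-only "$src"/ | awk '/^d/ && $NF != "." { print $NF }'); do
        rsync -rltvH --stats --delete "$src/$d"/ "$dest/$d"/ ||
            echo "sync failed in: $d" >&2
    done
}
# e.g.  sync_by_subdir mirror.caosity.org::cAos /ftp/pub/caosity
```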

Good luck.

-chris

> You get the idea. Obviously without access to the server logs on these
> machines, I can't tell what exactly happened at the precise time that the
> rsync failed.
> 
> Any further assistance would be apprecaited.
> 
> Thanks
> 
> -Dan
> 


Re: Rsync Problems, Possible Addressed Bug?

2004-07-14 Thread Chris Shoemaker
On Wed, Jul 14, 2004 at 04:55:31PM -0400, Robert Caskey wrote:
> I got this mail from a cronjob and can't figure out what is causing 
> rsync to crap out on me. I received the message at 5:03, when the cron 
> job is scheduled to run at 4:00, so total runtime is approximately an 
> hour.
> 
> Machine that is fetching the files is a low-end G3 running Yellowdog 
> 3.01, rsync --version 2.5.5. It has only 128 megs of ram, but isn't 
> doing anything else. Transfers are CPU bound.
> 
> It syncs approximately 120 gigs of files over ssh from various 
> machines, but has difficulty with a Mac OS X.3 server machine running 
> rsync 2.5.7, which is also the largest host with over 100 gigs of files 
> to be backed up.  Is there a known issue with these versions of r-sync?
> 
> Anyone here have any ideas? I really don't sleep well at night when my 
> nightlys don't go well.
> 
> Thanks,
> --Rob
> 
> 
> Subject: Cron <[EMAIL PROTECTED]> run-parts /etc/cron.daily
> 
> 
> /etc/cron.daily/backup_ribs:
> 
> -= Beginning backups for calendar
> 
> receiving file list ... done
> Public Event Server/
> Public Event Server/Public Event Server.db
> rsync[6790] (receiver) heap statistics:
>   arena: 118120   (bytes from sbrk)
>   ordblks:5   (chunks not in use)
>   smblks: 2
>   hblks:  0   (chunks from mmap)
>   hblkhd: 0   (bytes from mmap)
>   usmblks:0
>   fsmblks:   80
>   uordblks:  101704   (bytes used)
>   fordblks:   16416   (bytes free)
>   keepcost:3352   (bytes in releasable chunk)
> 
> Number of files: 2
> Number of files transferred: 1
> Total file size: 6883446 bytes
> Total transferred file size: 6883446 bytes
> Literal data: 18546 bytes
> Matched data: 6864900 bytes
> File list size: 120
> Total bytes written: 59158
> Total bytes read: 8517
> 
> wrote 59158 bytes  read 8517 bytes  3301.22 bytes/sec
> total size is 6883446  speedup is 101.71
> 
> Additional information:
> total 48
> lrwxrwxrwx1 root root   37 Jun  2 04:02 current -> 
> /mnt/storage/backups/calendar/daily.0
> drwxr-xr-x3 root root 4096 Jul 13 04:02 daily.0
> drwxr-xr-x3 root root 4096 Jul 12 17:04 daily.1
> drwxr-xr-x3 root root 4096 Jul 12 07:55 daily.2
> drwxr-xr-x3 root root 4096 Jul 12 05:13 daily.3
> drwxr-xr-x3 root root 4096 Jul 12 04:02 daily.4
> drwxr-xr-x3 root root 4096 Jul 11 04:02 daily.5
> drwxr-xr-x3 root root 4096 Jul 10 04:02 daily.6
> drwxr-xr-x3 root root 4096 Jul  1 04:42 monthly.0
> drwxr-xr-x3 root root 4096 Jul 11 04:22 weekly.0
> drwxr-xr-x3 root root 4096 Jul  4 04:22 weekly.1
> drwxr-xr-x3 root root 4096 Jun 27 04:22 weekly.2
> drwxr-xr-x3 root root 4096 Jun 20 04:22 weekly.3
> FilesystemSize  Used Avail Use% Mounted on
> /dev/hda3 5.7G  547M  4.8G  10% /
> none   30M 0   30M   0% /dev/shm
> /dev/hdb2 126G   40G   79G  34% /mnt/storage
> 
> -= Beginning backups for db
> 
> ssh: connect to host db.music.uga.edu port 22: No route to host
> rsync: connection unexpectedly closed (0 bytes read so far)
> rsync error: error in rsync protocol data stream (code 12) at io.c(150)

This error is pretty clear.  I don't think you're asking about this one,
but I notice that your script is showing its own error report, probably
by checking the return status of rsync, but...
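For what it's worth, here is a hedged sketch of return-status handling that treats rsync's exit code 24 ("some source files vanished", usually benign for live backups) differently from hard errors; `check_rsync` is a hypothetical wrapper, not part of rsync:

```shell
# Run rsync (or any command) and report its status in a way that does
# not flag the harmless "vanished source files" case as a failure.
check_rsync() {
    "$@" && status=0 || status=$?
    case $status in
        0)  echo "backup ok" ;;
        24) echo "backup ok (some source files vanished during transfer)" ;;
        *)  echo "WARNING: rsync exited with status $status" ;;
    esac
    return $status
}
# e.g.  check_rsync rsync -a /src/ /backup/src/
```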

> 
> WARNING: there seems to have been an rsync error.
> Check the logs for more information.
> 
> -= Beginning backups for figaro
> 
> receiving file list ... done
> mailman/archives/private/faculty-announce/
> mailman/locks/
> www/html/tools/labs/
> etc/mail/statistics
> mailman/archives/private/faculty-announce/2004-July.txt.gz
> www/html/syllabus/temp/MUSI_1810_zerkeld_200108.pdf
> www/html/tools/labs/current_index.php
> rsync[6805] (receiver) heap statistics:
>   arena:6544744   (bytes from sbrk)
>   ordblks: 2673   (chunks not in use)
>   smblks: 1
>   hblks:  1   (chunks from mmap)
>   hblkhd:258048   (bytes from mmap)
>   usmblks:0
>   fsmblks:   48
>   uordblks: 4048432   (bytes used)
>   fordblks: 2496312   (bytes free)
>   keepcost:2832   (bytes in releasable chunk)
> 
> Number of files: 33769
> Number of files transferred: 4
> Total file size: 362384239 bytes
> Total transferred file size: 18814 bytes
> Literal data: 4064 bytes
> Matched data: 14750 bytes
> File list size: 557394
> Total bytes written: 236
> Total bytes read: 560149
> 
> wrote 236 bytes  read 560149 bytes  38647.24 bytes/sec
> total size is 362384239  speedup is 646.67
> 
> WARNING: there seems to have been an rsync error.
> Check the logs for more information.
> 

This run seems to have completed successfully, but your script still
complains.  Maybe your script's error check is misreading rsync's exit
status.

Re: [PATCH] Batch-mode rewrite, update to man page

2004-07-14 Thread Chris Shoemaker
I've attached an update to the man page regarding batch mode.  I didn't
change the statement about batch mode being experimental, but maybe we
should consider modifying it.
It did serve well to manage my expectations when I first tried batch
mode and found that it didn't work at all for use with remote servers.
Unfortunately, I think "experimental" was a euphemism for "easily broken".
After my re-write, I'm hoping batch-mode will be more stable and
easier to maintain.  (But, maybe it just has new bugs. ;-)  Of course, it is
still "experimental" in that it hasn't been widely tested.
I'd probably be in favor of adding some encouraging statement to
the "experimental" warning, so that people aren't scared away from it.
Maybe something like:  "However, the existing behavior and interface
is a candidate for stable functionality."

The reason I think this may be in order is that I think the
concept of operation has been well-defined for quite a while.  It was
only the implementation that was "experimental".  The new implementation
really is simpler, and hopefully more robust.
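For concreteness, the intended workflow under the new interface looks roughly like this (a sketch; `record_batch` and `replay_batch` are hypothetical helper names, not rsync options):

```shell
# Record one transfer into a batch file, then replay it against other,
# identical destination trees without re-reading the source.
record_batch() {   # record_batch BATCHFILE SRC/ DEST/
    rsync -a --write-batch="$1" "$2" "$3"
}
replay_batch() {   # replay_batch BATCHFILE DEST/  (DEST identical to the original)
    rsync -a --read-batch="$1" "$2"
}
# e.g.  record_batch changes /src/tree/ /dest/tree/
#       then copy "changes" to each remote host and run:
#       replay_batch changes /dest/tree/
```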

Any thoughts?

-chris
Index: rsync.yo
===
RCS file: /cvsroot/rsync/rsync.yo,v
retrieving revision 1.171
diff -u -r1.171 rsync.yo
--- rsync.yo5 Jun 2004 16:16:30 -   1.171
+++ rsync.yo14 Jul 2004 22:33:48 -
@@ -347,8 +347,8 @@
  --log-format=FORMAT log file transfers using specified format
  --password-file=FILEget password from FILE
  --bwlimit=KBPS  limit I/O bandwidth, KBytes per second
- --write-batch=PREFIXwrite batch fileset starting with PREFIX
- --read-batch=PREFIX read batch fileset starting with PREFIX
+ --write-batch=FILE  write a batch to FILE 
+ --read-batch=FILE   read a batch from FILE
  --checksum-seed=NUM set block/file checksum seed
  -4  --ipv4  prefer IPv4
  -6  --ipv6  prefer IPv6
@@ -897,13 +897,13 @@
 result is an average transfer rate equaling the specified limit. A value
 of zero specifies no limit.
 
-dit(bf(--write-batch=PREFIX)) Generate a set of files that can be
-transferred as a batch update. Each filename in the set starts with
-PREFIX. See the "BATCH MODE" section for details.
-
-dit(bf(--read-batch=PREFIX)) Apply a previously generated change batch,
-using the fileset whose filenames start with PREFIX. See the "BATCH
-MODE" section for details.
+dit(bf(--write-batch=FILE)) Record a file that can later be applied to
+another identical destination with --read-batch. See the "BATCH MODE"
+section for details.
+
+dit(bf(--read-batch=FILE)) Apply all of the changes stored in FILE, a
+file previously generated by --write-batch. See the "BATCH MODE"
+section for details.
 
 dit(bf(-4, --ipv4) or bf(-6, --ipv6)) Tells rsync to prefer IPv4/IPv6
 when creating sockets.  This only affects sockets that rsync has direct
@@ -917,16 +917,12 @@
 dit(bf(--checksum-seed=NUM)) Set the MD4 checksum seed to the integer
 NUM.  This 4 byte checksum seed is included in each block and file
 MD4 checksum calculation.  By default the checksum seed is generated
-by the server and defaults to the current time(), or 32761 if
-bf(--write-batch) or bf(--read-batch) are specified.  This option
+by the server and defaults to the current time().  This option
 is used to set a specific checksum seed, which is useful for
 applications that want repeatable block and file checksums, or
 in the case where the user wants a more random checksum seed.
 Note that setting NUM to 0 causes rsync to use the default of time()
-for checksum seed.  Note also that bf(--write-batch) and bf(--read-batch)
-set the checksum seed to 32761, so bf(--checksum-seed=NUM) needs to
-follow these options if you want to specify a different checksum
-seed in batch mode.
+for checksum seed.
 
 enddit()
 
@@ -1107,53 +1103,45 @@
 hosts. In order to do this using batch mode, rsync is run with the
 write-batch option to apply the changes made to the source tree to one
 of the destination trees.  The write-batch option causes the rsync
-client to store the information needed to repeat this operation against
-other destination trees in a batch update fileset (see below).  The
-filename of each file in the fileset starts with a prefix specified by
-the user as an argument to the write-batch option.  This fileset is
-then copied to each remote host, where rsync is run with the read-batch
-option, again specifying the same prefix, and the destination tree.
-Rsync updates the destination tree using the information stored in the
-batch update fileset.
+client to store in a "batch file" all the information needed to repeat
+this operation against other, identical destination trees.
 
-The fileset consists of 4 files:
+To apply the recorded changes to another destination tree, run rsync
+with the read-batch option, specifying the name of the same batch
file, and the destination tree.

Re: [PATCH] Batch-mode rewrite

2004-07-14 Thread Chris Shoemaker
There it goes...

On Wed, Jul 14, 2004 at 12:16:45AM -0700, Wayne Davison wrote:
> On Tue, Jul 13, 2004 at 04:40:39PM -0400, Chris Shoemaker wrote:
> > Do you see any reason to keep FIXED_CHECKSUM_SEED around?  It doesn't
> > hurt anthing, but I don't see a use for it.
> 
> You're right -- the new batchfile setup works fine without hard-wiring a
> checksum_seed value since the checksum is in the batchfile.  It has even
> been suggested previously that a hard-wired value could be a bad thing
> in certain circumstances, so I'm all in favor of getting rid of it.
> 
> ..wayne..
Index: options.c
===
RCS file: /cvsroot/rsync/options.c,v
retrieving revision 1.157
diff -u -r1.157 options.c
--- options.c   20 Jun 2004 19:47:05 -  1.157
+++ options.c   14 Jul 2004 22:27:59 -
@@ -133,7 +134,6 @@
 int always_checksum = 0;
 int list_only = 0;
 
-#define FIXED_CHECKSUM_SEED 32761
 #define MAX_BATCH_PREFIX_LEN 256   /* Must be less than MAXPATHLEN-13 */
 char *batch_prefix = NULL;
 
@@ -571,13 +571,11 @@
case OPT_WRITE_BATCH:
/* popt stores the filename in batch_prefix for us */
write_batch = 1;
-   checksum_seed = FIXED_CHECKSUM_SEED;
break;
 
case OPT_READ_BATCH:
/* popt stores the filename in batch_prefix for us */
read_batch = 1;
-   checksum_seed = FIXED_CHECKSUM_SEED;
break;
 
case OPT_TIMEOUT:

Re: [PATCH] Batch-mode rewrite

2004-07-13 Thread Chris Shoemaker
Wayne,
Do you see any reason to keep FIXED_CHECKSUM_SEED around?  It doesn't
hurt anything, but I don't see a use for it.
-chris


Re: [PATCH] Batch-mode rewrite

2004-07-12 Thread Chris Shoemaker
On Mon, Jul 12, 2004 at 02:47:30PM -0400, Chris Shoemaker wrote:
> 
> Ok.   "diff -cu" it is.  I used -b because the auto-tab feature
> in emacs sometimes causes noisy whitespace changes in the diff.
> 
> I'll incorporate your comments and rediff.
>

Ok, actually it seems "diff -cu" isn't right.  Instead I just used "diff
-u"

The attached patch also attempts to remove the vestigial batch-mode
code, so it's a bit longer.

Your feedback greatly simplified the checksum-seed and protocol-version
special cases.  Thanks for pointing those things out.

Tomorrow, I'll start working on updating the man page.

-chris

Index: batch.c
===
RCS file: /cvsroot/rsync/batch.c,v
retrieving revision 1.32
diff -u -r1.32 batch.c
--- batch.c 15 May 2004 19:31:10 -  1.32
+++ batch.c 13 Jul 2004 04:41:42 -
@@ -13,50 +13,8 @@
 extern int protocol_version;
 extern struct stats stats;
 
-struct file_list *batch_flist;
-
-static char rsync_flist_file[] = ".rsync_flist";
-static char rsync_csums_file[] = ".rsync_csums";
-static char rsync_delta_file[] = ".rsync_delta";
 static char rsync_argvs_file[] = ".rsync_argvs";
 
-static int f_csums = -1;
-static int f_delta = -1;
-
-void write_batch_flist_info(int flist_count, struct file_struct **files)
-{
-   char filename[MAXPATHLEN];
-   int i, f, save_pv;
-   int64 save_written;
-
-   stringjoin(filename, sizeof filename,
-   batch_prefix, rsync_flist_file, NULL);
-
-   f = do_open(filename, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
-   if (f < 0) {
-   rsyserr(FERROR, errno, "Batch file %s open error", filename);
-   exit_cleanup(1);
-   }
-
-   save_written = stats.total_written;
-   save_pv = protocol_version;
-   protocol_version = PROTOCOL_VERSION;
-   write_int(f, protocol_version);
-   write_int(f, flist_count);
-
-   for (i = 0; i < flist_count; i++) {
-   send_file_entry(files[i], f,
-   files[i]->flags & FLAG_TOP_DIR ?  XMIT_TOP_DIR : 0);
-   }
-   send_file_entry(NULL, f, 0);
-
-   protocol_version = save_pv;
-   stats.total_written = save_written;
-
-   close(f);
-}
-
-
 void write_batch_argvs_file(int argc, char *argv[])
 {
int f;
@@ -116,216 +74,6 @@
close(f);
 }
 
-struct file_list *create_flist_from_batch(void)
-{
-   char filename[MAXPATHLEN];
-   unsigned short flags;
-   int i, f, save_pv;
-   int64 save_read;
-
-   stringjoin(filename, sizeof filename,
-   batch_prefix, rsync_flist_file, NULL);
-
-   f = do_open(filename, O_RDONLY, 0);
-   if (f < 0) {
-   rsyserr(FERROR, errno, "Batch file %s open error", filename);
-   exit_cleanup(1);
-   }
-
-   batch_flist = flist_new(WITH_HLINK, "create_flist_from_batch");
-
-   save_read = stats.total_read;
-   save_pv = protocol_version;
-   protocol_version = read_int(f);
-
-   batch_flist->count = read_int(f);
-   flist_expand(batch_flist);
-
-   for (i = 0; (flags = read_byte(f)) != 0; i++) {
-   if (protocol_version >= 28 && (flags & XMIT_EXTENDED_FLAGS))
-   flags |= read_byte(f) << 8;
-   receive_file_entry(&batch_flist->files[i], flags, batch_flist, f);
-   }
-   receive_file_entry(NULL, 0, NULL, 0); /* Signal that we're done. */
-
-   protocol_version = save_pv;
-   stats.total_read = save_read;
-
-   return batch_flist;
-}
-
-void write_batch_csums_file(void *buff, int bytes_to_write)
-{
-   if (write(f_csums, buff, bytes_to_write) < 0) {
-   rsyserr(FERROR, errno, "Batch file write error");
-   close(f_csums);
-   exit_cleanup(1);
-   }
-}
-
-void close_batch_csums_file(void)
-{
-   close(f_csums);
-   f_csums = -1;
-}
-
-
-/**
- * Write csum info to batch file
- *
- * @todo This will break if s->count is ever larger than maxint.  The
- * batch code should probably be changed to consistently use the
- * variable-length integer routines, which is probably a compatible
- * change.
- **/
-void write_batch_csum_info(int *flist_entry, struct sum_struct *s)
-{
-   size_t i;
-   int int_count;
-   char filename[MAXPATHLEN];
-
-   if (f_csums < 0) {
-   stringjoin(filename, sizeof filename,
-   batch_prefix, rsync_csums_file, NULL);
-
-   f_csums = do_open(filename, O_WRONLY | O_CREAT | O_TRUNC,
-   S_IRUSR | S_IWUSR);
-   if (f_csums < 0) {
-   rsyserr(FERROR, errno, "Batch file %s open error",
-   filename);
- 

Re: [PATCH] Batch-mode rewrite

2004-07-12 Thread Chris Shoemaker
On Mon, Jul 12, 2004 at 07:11:04PM -0700, Wayne Davison wrote:
> On Mon, Jul 12, 2004 at 02:47:30PM -0400, Chris Shoemaker wrote:
> > On Mon, Jul 12, 2004 at 12:34:38PM -0700, Wayne Davison wrote:
> > > Another thing I noticed was that a local --write-batch copy behaved as
> > > if --whole-file had been specified.
> > 
> > Hmm, I forgot about that.  Q: Shouldn't this be set in parse_arguments?
> > A:  Oh yeah, that's right, we don't know that we're local until we parse
> > the src and dest args.  :-(  I guess it DOES matter which patch the
> > write-batch=0 is on.
> 
> I've just checked-in a change that makes sure that the whole_file value
> is set properly in the local-copy-forces-whole-file special case before
> the generator is forked (which simplifies the code a bit from what it
> was).  So, you should be able to put this write_batch=0 code back where
> you had it, if you like.

That's nice.  I don't see any advantage of one path over the other.  But
it is nice to keep the comment that says it doesn't matter.

BTW, batch mode seems to work fine with compression.  Patch at 1am.

-chris

> 
> ..wayne..


Re: [PATCH] Batch-mode rewrite

2004-07-12 Thread Chris Shoemaker
Wayne,
A couple more thoughts:

On Mon, Jul 12, 2004 at 12:34:38PM -0700, Wayne Davison wrote:
> First, a summary of my thoughts:
> 
> One thought here:  would it make things simpler to separate the option-
> parsing variables (read_batch & write_batch) from a set of variables
> that would indicate that the mode is currently active (for instance,
> "read_batch_enabled" and "write_batch_enabled")?  This might allow you
> to remove the hack from start_inband_exchange() IFF the recording of the
> protocol can be turned on at a single common point (perhaps with an
> exception to write out the version number).  I didn't look to see how
> much starting and stopped is needed, though, so this idea may not be of
> any use.

I followed this suggestion and it worked.  Then I followed your
suggestion below about write_batch_monitor_{in|out}.  I think the latter
covers all the bases, removing the need for a "batch_enabled" flag,
since the "if (fd == write_batch_monitor_{in|out})" check is happening
anyway.

> I'm thinking these checks might be safer if some init code did this:
> 
> int write_batch_monitor_in = -1;
> int write_batch_monitor_out = -1;
> 
>   if (write_batch) {
>   if (am_sender)
>   write_batch_monitor_out = f_out;
>   else
>   write_batch_monitor_in = f_in;
>   }
> 
> Then the code in readfd() could be "if (fd == write_batch_monitor_in)"
> and the code in writefd() could be "if (fd == write_batch_monitor_out)".
> Thus, the code could never record any I/O to the wrong fd.  For
> instance, a diff in the patches dir makes the receiver read data on
> a pipe from the generator, and we wouldn't want that going into the
> batch file (when the receiver was the one writing the batch file).

I agree that this scheme is better, since it allows selectivity w.r.t.
the particular stream captured.  However, it may not be much "safer".
After all, couldn't the forked generator reuse the integer fds for a
different stream?  Therefore, in either case, to be "safe", don't I need
to explicitly disable batch mode for the generator (or any children
whose streams I don't want to capture)?

-chris


Re: [PATCH] Batch-mode rewrite

2004-07-12 Thread Chris Shoemaker
On Mon, Jul 12, 2004 at 12:34:38PM -0700, Wayne Davison wrote:
> First, a summary of my thoughts:
> 
> This looks to be a much simpler way to integrate batch support into
> rsync than what we currently have.  I'm quite interested to see this
> refined further.  Nice work!
> 
Thanks for your thorough review and quick feedback!


> Some other comments:
> 
> On Sun, Jul 11, 2004 at 06:08:04PM -0400, Chris Shoemaker wrote:
> > 1) I suspect one area in client_run() is non-portable.
> 
> I assume you mean the use of /dev/null.  That idiom is used elsewhere in
> rsync, so it should be safe to use it in one more place.

Yes, that's what I meant.  That's good to know.  I didn't think to grep
for other usage.  I'll remove the comment.

> 
> > If you are open to some protocol changes with the motivation of
> > unifying the client/server protocol with the sender/receiver protocol,
> > then I can write something up.
> 
> I think it might be best to weigh the benefits of what is gained by the
> unification compared with the added complexity of maintaining both the
> old and new protocol methods in the code for years to come.  My initial
> reaction is to leave it alone, but feel free to argue for changes that
> you believe in.

I agree about weighing cost and benefit.  I'll continue to consider
this.

> 
> > ! /* start_inband_exchange() contains an unfortunate write_batch
> > !  * hack/workaround.  The issue here is that the protocol for version
> > !  * exchange differs when an rsyncd server is involved.  However, the
> > !  * batch file written must be the same whether a server is involved or
> > !  * not.  If only version exchange always used the same protocol...
> > !  */ 
> 
> One thought here:  would it make things simpler to separate the option-
> parsing variables (read_batch & write_batch) from a set of variables
> that would indicate that the mode is currently active (for instance,
> "read_batch_enabled" and "write_batch_enabled")?  This might allow you
> to remove the hack from start_inband_exchange() IFF the recording of the
> protocol can be turned on at a single common point (perhaps with an
> exception to write out the version number).  I didn't look to see how
> much starting and stopped is needed, though, so this idea may not be of
> any use.

You're probably right, although I think one "batch_enabled" would
suffice.  I'll look for the common point.

> Seems to me this would be simpler to disable write_batch prior to this
> block of code, and then add a single "if" after this block that always
> calls "write_int(batch_fd, checksum_seed)" if we're writing a batch:
> 
>   int write_batch_save = write_batch;
>   write_batch = 0;
>   if (am_server) {
>   if (!checksum_seed)
>   checksum_seed = time(NULL);
>   write_int(f_out, checksum_seed);
>   } else {
>   checksum_seed = read_int(f_in);
>   }
>   if (write_batch_save) {
>   write_int(f_out, checksum_seed);
>   write_batch = 1;
>   }

Yes, you're right.

> 
> Of course, if my delayed-start idea works out, you might be able to
> leave that code completely unchanged and just do the write_int() call
> later on in the code when batch writing actually gets turned on for the
> first time.

Hmm, good idea, I'll see.

> 
> In readfd():
> > +   if (write_batch && !am_sender) {
> 
> In writefd():
> > +   if (write_batch && am_sender) {
> 
> I'm thinking these checks might be safer if some init code did this:
> 
> int write_batch_monitor_in = -1;
> int write_batch_monitor_out = -1;
> 
>   if (write_batch) {
>   if (am_sender)
>   write_batch_monitor_out = f_out;
>   else
>   write_batch_monitor_in = f_in;
>   }
> 
> Then the code in readfd() could be "if (fd == write_batch_monitor_in)"
> and the code in writefd() could be "if (fd == write_batch_monitor_out)".
> Thus, the code could never record any I/O to the wrong fd.  For
> instance, a diff in the patches dir makes the receiver read data on
> a pipe from the generator, and we wouldn't want that going into the
> batch file (when the receiver was the one writing the batch file).
> 

Ah, yes.  That is a safer way.

> > +   if (read_batch) exit_cleanup(0);  /* no reason to continue */
> 
> I'm curious how easy it would be to avoid forking in the first place
> when reading a batch, but I didn't look to see how much more intrusive
that would be.

[PATCH] Batch-mode rewrite

2004-07-11 Thread Chris Shoemaker
Wayne,
Please consider the attached patch.  This applies to the current
CVS, and is independent of patches/local-batch.diff.  As a matter of
fact, I'm sure it would conflict heavily with local-batch.diff.

This version of batch mode has a couple distinguishing features:
Write-batch records (almost) the entire sender side of the conversation
into one file.  ("Almost" because it has to smooth out differences
between server-sender and non-server-sender.)  In theory, it should now
work with many of the other rsync options (like compression), even ones
that modify the protocol, as long as they act equally on server and
non-server.

The motivation of this patch is to significantly reduce the 
impact of batch-mode code on the rest of the codebase, and to allow
batch-mode to continue to "Just Work" even as the rest of the code
evolves.  Before, batch code was basically sprinkled in everywhere there
was significant i/o with the appropriate "if ({read|write}_batch)
batch_function_to_{read|write}_entire_datastructures()".  Now, batch
code is essentially de-coupled from the rest of rsync.  Points of
interaction are:
1) opening a batch file in main()
2) a hook in readfd() for writing batches from a receiver
3) a hook in writefd() for writing batches from a sender
4) file descriptor swapping in client_run() for reading batches
5) aborting the unneeded server in local_child() during a read_batch
6) a special-case for writing the protocol version during
an rsyncd inband exchange
7) a special-case for writing the checksum-seed
8) a special-case for not reading end-of-run statistics

Points 1-5 are very simple and clean.  Points 6-8 are all the
result of protocol dependence on server-ness.  Point 6 is unfortunate
but somewhat understandable.  Points 7 and 8 are, IMO, less understandable.
In both cases, I think there are good reasons (unrelated to batch-mode) to
replace dependence on server-ness with dependence on sender-ness.  These
are not too hard to fix, but I went to great lengths to make this patch
not require any protocol changes.

I don't know if it's good style, but it was convenient for me to
include several large comments in this patch.  These comments describe
the issues related to these 3 special cases, and I would welcome feedback
via email on the comments.

The core batch functionality provided by 1-5 should be robust
against changes to the rest of rsync, _except_ for changes that
introduce new protocol dependence on server-ness.  This is because those
3 special cases ensure that the batch file is the same whether there was
a server involved or not.

There are still some issues with the patch as it is.  For
example: 1) I suspect one area in client_run() is non-portable.  2) This
leaves hundreds and hundreds of lines of dead-code around.  But I wanted
to get some feedback on progress so far.

Future work:
This patch still needs some clean-up, but it demonstrates a very
different design approach than the existing batch-mode and it works for
all the cases I've tested.
If you think this new incarnation of batch-mode will go
main-stream then I will write up the appropriate patch for the man page
as well.
If you are open to some protocol changes with the motivation of
unifying the client/server protocol with the sender/receiver protocol,
then I can write something up.
After batch-mode cleanup, I'd like to tackle some performance
issues.  But, I've also seen a few other functions that look like they
just need some clean-up.


Let me know what you think.

-chris
Index: batch.c
===
RCS file: /cvsroot/rsync/batch.c,v
retrieving revision 1.32
diff -c -b -d -r1.32 batch.c
*** batch.c 15 May 2004 19:31:10 -  1.32
--- batch.c 12 Jul 2004 00:37:45 -
***
*** 25,30 
--- 25,31 
  
  void write_batch_flist_info(int flist_count, struct file_struct **files)
  {
+   return;
char filename[MAXPATHLEN];
int i, f, save_pv;
int64 save_written;
***
*** 180,185 
--- 181,187 
   **/
  void write_batch_csum_info(int *flist_entry, struct sum_struct *s)
  {
+   return;
size_t i;
int int_count;
char filename[MAXPATHLEN];
***
*** 270,275 
--- 272,278 
  
  void write_batch_delta_file(char *buff, int bytes_to_write)
  {
+   return;
char filename[MAXPATHLEN];
  
if (f_delta < 0) {
Index: clientserver.c
===
RCS file: /cvsroot/rsync/clientserver.c,v
retrieving revision 1.127
diff -c -b -d -r1.127 clientserver.c
*** clientserver.c  13 Jun 2004 14:18:48 -  1.127
--- clientserver.c  12 Jul 2004 00:37:46 -
***
*** 50,55 
--- 50,57 
  exte

[PATCH] [TRIVIAL] whitespace + variable rename

2004-07-11 Thread Chris Shoemaker
The attached patch adds some whitespace to the recv_files() function
declaration, and renames variable 'f' to 'f_out' in generate_files().
Index: generator.c
===
RCS file: /cvsroot/rsync/generator.c,v
retrieving revision 1.93
diff -b -c -r1.93 generator.c
*** generator.c 30 Jun 2004 07:27:30 -  1.93
--- generator.c 11 Jul 2004 20:24:21 -
***
*** 543,549 
  }
  
  
! void generate_files(int f, struct file_list *flist, char *local_name)
  {
int i;
int phase = 0;
--- 545,551 
  }
  
  
! void generate_files(int f_out, struct file_list *flist, char *local_name)
  {
int i;
int phase = 0;
***
*** 584,590 
}
  
recv_generator(local_name ? local_name : f_name_to(file, fbuf),
!  file, i, f);
}
  
phase++;
--- 586,592 
}
  
recv_generator(local_name ? local_name : f_name_to(file, fbuf),
!  file, i, f_out);
}
  
phase++;
***
*** 594,614 
if (verbose > 2)
rprintf(FINFO,"generate_files phase=%d\n",phase);
  
!   write_int(f,-1);
  
/* files can cycle through the system more than once
 * to catch initial checksum errors */
while ((i = get_redo_num()) != -1) {
struct file_struct *file = flist->files[i];
recv_generator(local_name ? local_name : f_name_to(file, fbuf),
!  file, i, f);
}
  
phase++;
if (verbose > 2)
rprintf(FINFO,"generate_files phase=%d\n",phase);
  
!   write_int(f,-1);
  
if (preserve_hard_links)
do_hard_links();
--- 596,616 
if (verbose > 2)
rprintf(FINFO,"generate_files phase=%d\n",phase);
  
!   write_int(f_out,-1);
  
/* files can cycle through the system more than once
 * to catch initial checksum errors */
while ((i = get_redo_num()) != -1) {
struct file_struct *file = flist->files[i];
recv_generator(local_name ? local_name : f_name_to(file, fbuf),
!  file, i, f_out);
}
  
phase++;
if (verbose > 2)
rprintf(FINFO,"generate_files phase=%d\n",phase);
  
!   write_int(f_out,-1);
  
if (preserve_hard_links)
do_hard_links();
Index: receiver.c
===
RCS file: /cvsroot/rsync/receiver.c,v
retrieving revision 1.86
diff -b -c -r1.86 receiver.c
*** receiver.c  2 Jul 2004 18:23:57 -   1.86
--- receiver.c  11 Jul 2004 20:24:22 -
***
*** 289,295 
   * main routine for receiver process.
   *
   * Receiver process runs on the same host as the generator process. */
! int recv_files(int f_in,struct file_list *flist,char *local_name)
  {
int fd1,fd2;
STRUCT_STAT st;
--- 289,295 
   * main routine for receiver process.
   *
   * Receiver process runs on the same host as the generator process. */
! int recv_files(int f_in, struct file_list *flist, char *local_name)
  {
int fd1,fd2;
STRUCT_STAT st;
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: not updating changed local files

2004-07-07 Thread Chris Shoemaker
On Wed, Jul 07, 2004 at 06:01:25PM +0400, Ilya N. Golubev wrote:
> > Am I correct in guessing that 01:32 is _during_ the rsync run?
> 
> No.  The file in src dir was updated before rsync run, and became
> newer than one in dest dir.

Does rsync version 2.6.2 exhibit this same behavior?
 -chris



Re: not updating changed local files

2004-07-06 Thread Chris Shoemaker
On Tue, Jul 06, 2004 at 07:50:51PM +0400, Ilya N. Golubev wrote:
> rsync version 2.4.6  protocol version 24
> 
> [EMAIL PROTECTED]:~> rsync -avv ~/share/cvs-xemacs-head/XEmacs/xemacsweb 
> ~/share/public_html/xemacsweb
> building file list ... done
> 
> ...
> 
> xemacsweb/About/XEmacsServices.content is uptodate
> 
> ...
> 
> [EMAIL PROTECTED]:~> ls -l 
> {~/share/cvs-xemacs-head/XEmacs/xemacsweb,~/share/public_html/xemacsweb}/About/XEmacsServices.content
> -rw-r--r--1 gin  sdu 40379 Jul  6 01:32 
> /home/gin/share/cvs-xemacs-head/XEmacs/xemacsweb/About/XEmacsServices.content
^
Ilya,
Am I correct in guessing that 01:32 is _during_ the rsync run?
If so, you might benefit from a patch that was posted to the list about a month ago.  
I don't know if it made it into CVS or not.
Alternatively, you can simply repeat the execution of the command.

-chris



> -rw-r--r--1 gin  sdu 37841 Jun  2 19:02 
> /home/gin/share/public_html/xemacsweb/About/XEmacsServices.content


Re: problems with --read-batch and --write-batch with --files-from

2004-06-22 Thread Chris Shoemaker
On Tue, Jun 22, 2004 at 07:20:40PM +0200, Karsten Scheibler wrote:
> > On Mon, Jun 21, 2004 at 05:22:19PM -0700, Wayne Davison wrote:
> > > On Mon, Jun 21, 2004 at 03:11:17PM -0400, Chris Shoemaker wrote:
> > > > Is this fixed by the "|| read_batch" [...] which is in CVS?
> > > 
> > > Yes.  That extra code makes --read-batch default to --no-whole-file,
> > > just like --write-batch.
> > 
> > There you go Karsten, Wayne's already fixed this in CVS.
> 
> ok thanks, i will try the next CVS snapshot in a few days.
> 
> But how about the second part of my posting: the problem with
> 'echo file1 | rsync --write-batch=data --files-from=- -a -v -v src/ dest/'
> writing batch files to different dirs. Ok i could specify an absolute prefix
> to --write-batch, but i think it should also work with relative paths.
> 
Although I haven't tested exactly this command line, I _think_ that this
is at least partially solved by the local_batch.diff patch in the CVS
patches dir.  I say "partially" because I suspect that the flist file
_may_ still end up in the sub-directory.  In any case, I'm working
toward a rewrite of batch-mode, so stay tuned.
   -chris

> 
> 
> karsten


Re: problems with --read-batch and --write-batch with --files-from

2004-06-21 Thread Chris Shoemaker
On Mon, Jun 21, 2004 at 05:22:19PM -0700, Wayne Davison wrote:
> On Mon, Jun 21, 2004 at 03:11:17PM -0400, Chris Shoemaker wrote:
> > Is this fixed by the "|| read_batch" [...] which is in CVS?
> 
> Yes.  That extra code makes --read-batch default to --no-whole-file,
> just like --write-batch.

There you go Karsten, Wayne's already fixed this in CVS.

Too bad CVS doesn't work too well for me (again)  :-(

[EMAIL PROTECTED] rsync]$ cvs update -d
cvs [update aborted]: unrecognized auth response from pserver.samba.org:
cvs pserver: /cvsroot/CVSROOT/config: unrecognized keyword
'UseNewInfoFmtStrings'

Anyone else having problems?
  -chris

> 
> ..wayne..


Re: problems with --read-batch and --write-batch with --files-from

2004-06-21 Thread Chris Shoemaker
On Mon, Jun 21, 2004 at 03:48:16PM -0700, Wayne Davison wrote:
> On Mon, Jun 21, 2004 at 09:57:13PM +0200, Karsten Scheibler wrote:
> > delta-transmission disabled for local transfer or --whole-file
> 
> That's because of a long-standing bug in batch mode.  Specify
> --no-whole-file to work around it when reading a batch.
> 
> ..wayne..

Wayne,
Is this fixed by the "|| read_batch" in:

static BOOL disable_deltas_p(void)
{
if (whole_file > 0)
return True;
if (whole_file == 0 || write_batch || read_batch)
return False;
return  local_server;
}

which is in CVS?
-chris




Re: problems with --read-batch and --write-batch with --files-from

2004-06-21 Thread Chris Shoemaker
On Mon, Jun 21, 2004 at 09:57:13PM +0200, Karsten Scheibler wrote:
> Hello,
> 
> I want to use the --read/write-batch options, but i have problems to do so.
> The shell script [1] leads to the following error:
> 
> [sender] expand file_list to 131072 bytes, did move
> delta-transmission disabled for local transfer or --whole-file
> file1
> rsync: writefd_unbuffered failed to write 64 bytes: phase "unknown":
> Daten?bergabe unterbrochen (broken pipe)
> rsync error: error in rsync protocol data stream (code 12) at io.c(836)

I've verified this behavior in ver 2.6.2.  I have an idea what it might
be.  Let me take a look.
-chris

> 
> --[1]--
> #!/bin/bash
> 
> DIR="testdir-$(date '+%Y%m%d%H%M%S')"
> mkdir "$DIR" &&
> cd "$DIR" &&
> mkdir src dest &&
> (cd src && dd if=/dev/zero of=file1 bs=1024 count=1024) &&
> (cd dest && dd if=/dev/zero of=file1 bs=1024 count=512) &&
> tar -c -z -f data.tar.gz src dest &&
> rsync --write-batch=data -a -v src/ dest/ &&
> find . &&
> rm -rf src dest &&
> tar -x -z -f data.tar.gz &&
> echo --- &&
> rsync --read-batch=data -a -v -v dest/
> ---
> 
> Additionally if i use the --files-from option with --write-batch and the given
> prefix is relative 3 of 4 files will be written to the src/ dir. The find in
> the shell script [2] gives the following:
> 
> .
> ./src
> ./src/file1
> ./src/data.rsync_flist
> ./src/data.rsync_csums
> ./src/data.rsync_delta
> ./dest
> ./dest/file1
> ./data.tar.gz
> ./data.rsync_argvs
> 
> --[2]--
> #!/bin/bash
> 
> DIR="testdir-$(date '+%Y%m%d%H%M%S')"
> mkdir "$DIR" &&
> cd "$DIR" &&
> mkdir src dest &&
> (cd src && dd if=/dev/zero of=file1 bs=1024 count=1024) &&
> (cd dest && dd if=/dev/zero of=file1 bs=1024 count=512) &&
> tar -c -z -f data.tar.gz src dest &&
> echo file1 | rsync --write-batch=data --files-from=- -a -v -v src/ dest/ &&
> find . &&
> rm -rf src dest &&
> tar -x -z -f data.tar.gz &&
> echo --- &&
> rsync --read-batch=data -a -v -v dest/
> ---
> 
> 
> Thanks,
> 
> karsten


Re: [PATCH] make write_batch local

2004-06-19 Thread Chris Shoemaker
On Fri, Jun 18, 2004 at 10:20:57AM -0700, Wayne Davison wrote:
> On Wed, Jun 16, 2004 at 07:09:46PM -0400, Chris Shoemaker wrote:
> > I hope you have the time to review this patch and comment.
> 
> The patch looks good on first inspection.  I don't like the change to
> the whole-file default, though -- I'd prefer rsync to not force the
> --whole-file option in batch mode (since we want batch files to be
> efficient by default).

Ah, I see your point.  I think I looked at that disable_delta_p() function
50 times, and had to re-parse it every single time.  And even though the
comments are correct, they didn't help.  I think my brain can only keep
track of like 3 branches or negations at a time, after that, they fall
off the stack.  

Off the top of my head, I can't see a clear way to simplify this code.
The complexity comes from wanting to change the default behavior based
on batch-mode (which is evident from the option args) and pure-locality
(which is only evident after src/dest have been parsed out), while still
respecting overriding --{no-}whole-file options.  Any ideas?

> 
> Also, this comment in pipe.c looks like you meant to say "server" rather
> than "sender" in the second instance, correct?

Hmm, I thought I meant sender, but maybe you can explain why you think
server makes more sense.  My thinking was that the child forked in this
function becomes the "sender".  I know that this child will call
start_server(), but I think it will then always call do_server_sender().
In my mind, "sender" is more specific than "server", since (in general,
even if not in this case) servers can be receivers.  Perhaps
"sender-server" is clearest?

The over-arching concern should probably be consistency in the use of
terms in the comments in the entire source tree, and I'm not sure my
usage of these terms agrees with that, as I've only skimmed most
functions.  What do you think?

> 
> > am_server = 1;
> >   
> > +   /* There is write_batch code on both the receiver and
> > +* sender sides.  In local_child, both are local processes,
> > +* so we must make sure that only one actually writes.  It 
> > +* shouldn't matter which one -- here we prevent sender from
> > +* writing. */
> > +   write_batch = 0; 
> > +   
> > if (!am_sender) 
> > filesfrom_fd = -1;
> 
> In your comment on your testing between 2.5.7 and your patched version, you
> said:
> 
> > All four methods produce identical batch files.
> 
> I assume you mean excepting the file-list file, which changed radically
> in 2.6.1 (the old code used to write out its own file-list data, but the
> new code uses the actual file-list- sending code to output this data,
> making it much easier to maintain).

Yes, I finished the flist stuff first, because it was the easiest.  I
wasn't testing as rigorously at that time.  It was the difficulties of
the deltas and csums that actually required me to look at the 2.5.7 case
for comparison and reference.  BTW, 2.6.1 produces flist files that are
noticeably smaller than 2.5.7.

> 
> > Regarding next steps, I've come to believe that the entire batch mode
> > code is too invasive into too many parts of the rest of the code.
> 
> Yes, I think I'm inclined to agree with you.
> 
> > If you're willing to break backward compatibility of existing batch
> > files,
> 
> I am.  The code is marked as experimental, and I don't see a great need
> for people to keep batch files for long periods of time, so this
> shouldn't cause people a problem.
> 
> > I think the batch mode code can be significantly simplified and made
> > more maintainable and flexible.  Basically, I think batch mode should
> > just record whatever hits the socket.  I don't see much benefit to
> > splitting up the data into 3 files, essentially creating a batch-mode
> > specific protocol on the disk, with special reading and writing
> > functions.  Simplify, simplify.
> 
> That would be much better.  In fact, that could even be made a separate
> helper app since it would be possible to have an app record the data and
> forward it.  Then, the --rsh and/or --rsync-path option could be used to
> have rsync talk to a batch-replay program instead of a real rsync.  The
> one complicating factor I can see is that there are some differences
> between a daemon stream and a normal remote-shell stream, but it should
> be possible do a different batch-replay command depending on what stream
> was recorded (e.g. daemon syntax using a custom program via the --rsh
> option).  I haven't fully th

Re: wildcard error in source path?

2004-06-18 Thread Chris Shoemaker
On Fri, Jun 18, 2004 at 09:50:22AM +0100, Stuart Halliday wrote:
> > Logically, this is correct behaviour, I think.
> > 
> > dump/* is a wildcard that matches every _existing_ local file in the 
> > dump/ directory. Since the file you deleted doesn't exist, it isn't 
> > considered by rsync.
> > 
> > dump/  tells rsync to compare the contents of the local dump/ directory 
> > with those of the remote one and, in your case, will delete on the 
> > remote host any files that don't exist locally.
> > 
> > Disclaimer: I haven't used --delete myself, so I could be wrong.
> 
> 
> Yes your statement sounds logic. But the use of --delete is stated as being:
> 
> "delete files that don't exist on the sending side"
> 
> Which I assume means those on the remote side. 
> 
> Since it isn't deleting the file on the remote side then rsync is a touch broken.

No, Terry is correct.  --delete will only delete files that are missing
_among the files you specify_.  If you don't specify a file, it is not
considered at all, for any purpose.
  -chris

> 


Re: Problem in using rsync

2004-06-17 Thread Chris Shoemaker
On Thu, Jun 17, 2004 at 10:11:19AM -0400, Anh Truong wrote:
> Hi
> 
> I use rsync to perform backup on disk on a SunFire 880 with Solaris 8. For 
> performance issues, we launch simultaneously 5 rsyncs on 5 different fliesystems 
> and about 150-200 "cp -p" commands on as many database files. We have been 
> using the same scripts for about 2 months, without problems. The backup is 
> performed on the same server (from filesystem to filesystem on the same server).


Try the rsyncs serially.  If you then have no problems, I'd suspect memory
pressure was pushing things to swap and causing the timeout.
  -chris


[PATCH] make write_batch local

2004-06-16 Thread Chris Shoemaker
Wayne,
It's taken a little while for me to get more familiar with the
code, but I think I've reached a good breakpoint in improving
batch-mode.  Let me highlight some of the changes in the
attached patch:

* --write-batch and --read-batch arguments are no longer passed
from client to server.  This fixes the current problem
that causes the server threads to die when the client
used --write-batch and the server attempts to create the
batch file on the server.  This means that new clients
will actually be able to use batch-mode, even with old
servers.

* --write-batch and --read-batch arguments are ignored when
paired with --server.  This means that new servers will
begin to work correctly with older clients that still
send batch arguments to the server.

* The biggest change is that there are now write-batch code
paths for both sender and receiver.  (It was only for
sender before.)  This means that a client can now write
a local batch file, whether that client is a sender or a
receiver.  The intent is that both sides would produce
identical batch files.  I tested this four ways.  1)
writing a batch file during a local-local sync with an
old (2.5.7) version, which wrote from the sender side.
2) writing a batch file during a local-local sync with
my patched version, which (arbitrarily) writes from the
receiver side.  3) writing a batch file during a
remote-to-local sync, with my patched version, which,
obviously, writes from the receiver.  4) writing a batch
file during a local-to-remote sync, with my patched
version, which, obviously, writes from the sender.  All
four methods produce identical batch files.

* delta mode (aka --no-whole-file) is now independent of
batch-mode.  There was some subtle forcing of delta mode
happening in disable_delta_p(), which I never could
understand.  Now, delta mode and batch mode are
compatible in any combination, and if you want delta
mode to differ from whatever the default is, you must
explicitly set it so.

* minor stuff like a few comments, factor one line "argc--"
outside of if-then-else, two spelling fixes.


I hope you have the time to review this patch and comment.  Regarding
next steps, I've come to believe that the entire batch mode code is too
invasive into too many parts of the rest of the code.  If you're willing
to break backward compatibility of existing batch files, I think the
batch mode code can be significantly simplified and made more
maintainable and flexible.  Basically, I think batch mode should just 
record whatever hits the socket.  I don't see much benefit to splitting
up the data into 3 files, essentially creating a batch-mode specific
protocol on the disk, with special reading and writing functions.
Simplify, simplify.

What do you think?

 -chris

? patches/make-write-batch-local.diff
? patches/patch_fix_read_batch_bug
Index: batch.c
===
RCS file: /cvsroot/rsync/batch.c,v
retrieving revision 1.32
diff -c -b -d -r1.32 batch.c
*** a/batch.c   15 May 2004 19:31:10 -  1.32
--- b/batch.c   17 Jun 2004 04:01:54 -
***
*** 172,177 
--- 172,178 
  
  /**
   * Write csum info to batch file
+  * If flist_entry < 0, just open the file
   *
   * @todo This will break if s->count is ever larger than maxint.  The
   * batch code should probably be changed to consistently use the
***
*** 198,203 
--- 199,206 
}
}
  
+   if (*flist_entry < 0)
+   return;
write_batch_csums_file(flist_entry, sizeof (int));
int_count = s ? (int) s->count : 0;
write_batch_csums_file(&int_count, sizeof int_count);
***
*** 285,290 
--- 288,296 
}
}
  
+   if (buff == NULL)
+   return;
+ 
if (write(f_delta, buff, bytes_to_write) < 0) {
rsyserr(FERROR, errno, "Batch file %s write error", filename);
close(f_delta);
Index: compat.c
===
RCS file: /cvsroot/rsync/compat.c,v
retrieving revision 1.22
diff -c -b -d -r1.22 compat.c
Index: flist.c
===
RCS file: /cvsroot/rsync/flist.c,v
retrieving revision 1.230
diff -c -b -d -r1.230 flist.c
*** a/flist.c   11 Jun 2004 07:40:57 -  1.230
--- b/flist.c   17 Jun 2004 04:01:55 -

Re: rsycnc copies all files

2004-06-16 Thread Chris Shoemaker
On Wed, Jun 16, 2004 at 10:16:19PM +0100, Gareth wrote:
> Wayne Davison wrote:
> 
> >On Wed, Jun 16, 2004 at 08:26:33PM +0100, Gareth wrote:
> > 
> >
> >>I am making an extraordinary claim: rysnc seems to copy all my files, 
> >>not just ones that have changed or new files.
> >>   
> >>
> >
> >Use either -t (preferred) or -c (slower).  See also -a.
> >
> >..wayne..
> > 
> >
> Wayne... Thanks for the reply.
> 
> The man pages say -c passes "always checksum" - I thought the whole 
> point of rsync was to checksum the differences between source and 
> destination files. The many examples / tutorials I've found from 
> Googling do not show the use of the -c for rysnc to be effective. (For 
> example 
> http://www.devshed.com/index2.php?option=content&task=view&id=56&pop=1&page=0&hide_js=1
>  
> )
> 
> Is the -a option typically used by most users of rsync?
> 

If you're going to rsync more than once, yes. (or else the explicit
--preserve-foos)
-chris

> Many thanks
> 
> 
> Gareth


Re: [Bug 1463] New: poor performance with large block size

2004-06-16 Thread Chris Shoemaker
On Wed, Jun 16, 2004 at 06:21:15AM -0700, [EMAIL PROTECTED] wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=1463
> 
>Summary: poor performance with large block size
>Product: rsync
>Version: 2.6.2
>   Platform: x86
> OS/Version: other
> Status: NEW
>   Severity: normal
>   Priority: P3
>  Component: core
> AssignedTo: [EMAIL PROTECTED]
> ReportedBy: [EMAIL PROTECTED]
>  QAContact: [EMAIL PROTECTED]
> 
> 
> I have a 29Gig file that is the previous version of a file and a 1.3 Gig 
> incremental backup of the file. I did the transfer with no block size option 
> and it takes about 6 minutes (GigEthernet). I used --block-size = 90k and it 
> took about 6 minutes. I used --block-size=182000 (close to the square root of 
> 29 Gig) and it only completed ~50 Meg of transfer of the 1.3 Gig in a couple of 
> hours.
> 
> Chris Shoemaker suggests this is a problem with the sliding window being a 
> fixed size of 256k and that the block size will not allow multiple blocks in 
> the window size. 

Er, that suggestion sounds WAY too intelligent to have come from me.  :-)
Seriously, that was Craig Barratt.
-chris

> 
> Operating system is Redhat 9.1 for both systems. Both systems have the 2.6.2 
> with only 1 patch and it is the one Wayne provided for --bwlimit being bi-modal
> 


Re: Suggested chnage to "--partial" usage.

2004-06-16 Thread Chris Shoemaker
On Wed, Jun 16, 2004 at 08:30:16PM +0800, Jason Potter wrote:
> Hi There,
> 
>  
> 
> This post is brought about due to the following two:
> 
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg10702.html
> 
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg10709.html
> 
>  
> 
> I have a situation where I need to upload large files over an unstable link
> (resuming is a requirement) and only when they are complete can they be
> renamed and hence replace the original file.  The process is an automated
> one so I don't have the ability to just walk over to the machine and see if
> it is finished by manually comparing the file size at the source and
> destination.
> 
>  
> 
> I have looked at the source code and have an alternative suggestion to the
> way -partial works.  I am looking for comment on if people think this is a
> valid suggestion or am I missing something.
> 
>  
> 
> 1)   In cleanup.c the function _exit_cleanup will call finish_transfer
> regardless of the code the function receives as a parameter. (see errcode.h
> for the codes it could receive)
> 
> 2)   Suggest a new function is written to adjust the name of the file
> the temp file is written to, to be the correct name plus a know extension.
> This achieves the requirement that the original file does not get over
> written if the file is not complete.
> 
> 3)   If the above was implemented you would then just have to adjust the
> scyncing code to check for these temporary files and use these to resume the
> connections.
> 
> 4)   When the file is complete _exit_cleanup gets a code value of 0
> (zero) and this can be used to call the currently written finish_transfer.
> 
>  
> 
> So what do you all think, will it work. 
> 

I can see the usefulness of such a feature, but ...

what if the portion of the source or destination file that was
already transferred (and which was stored in the temporary file) is modified
between subsequent attempts?  (I don't mean the temp file changes, I mean
the actual src or dest.)  ISTM, you'd have to at least _check_ for this
condition.  That means rechecksumming the source file on the sender from the
beginning.  On the receiver side, it means checksumming both the
(possibly modified) destination file and the previously saved temp file.
Then you have 3-way compare:
   sumA = checksum from source file
   sumB = checksum from possibly modified dest
   sumT = checksum from previously saved temp

   if sumA == sumB, do nothing.
   if sumA != sumB && sumA == sumT,
  then no retransmit needed, use block from tempfile
   if sumA != sumB && sumA != sumT, 
  then retransmit anyway, throw away temp block.


Interesting.  I haven't delved into the core rsync algorithm enough to
say for sure that this is possible, but I don't know that it's
_im_possible.  :-)

-chris




 
>  
> 
> I look forward to your responses.
> 
>  
> 
> Cheers
> 
> Jason
> 
>  
> 


Re: Rsync security

2004-06-16 Thread Chris Shoemaker
On Wed, Jun 16, 2004 at 02:37:25PM -0700, Wayne Davison wrote:
> On Wed, Jun 16, 2004 at 12:30:04PM -0400, Chris Shoemaker wrote:
> > Do any "rsync developers" care to confirm/deny?  [...] I've used rsync
> > over NFS with no problems.  
> 
> It has been said many times before that using network-mounted disks is
> suboptimal because rsync is optimizing the data transfer, not the file
> reading -- i.e. if both ends of the transfer are running on the same
> machine, the socket/pipe data that rsync is optimizing is being sent
> locally and the disk-reading is happening via network reads, so rsync
> can't optimize the network use.  However, if you don't mind the

Good point, but rsync can still get all the benefits of skipping
files that are identical, even skipping the file reads, assuming you're
preserving modification times.  Thus, for me, rsyncing several-TB
filesystems over NFS where src and dest are 99% identical still takes
only minutes.
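The "skipping" relied on here is rsync's default quick check: a file is left alone when its size and modification time already match.  A sketch of just that test, assuming a plain size/mtime comparison (real rsync also honors --checksum, --size-only, --modify-window, and so on):

```python
from collections import namedtuple

# Minimal stand-in for the stat fields rsync's quick check looks at.
Stat = namedtuple("Stat", "size mtime")

def quick_check_matches(src: Stat, dst: Stat) -> bool:
    """True when the default quick check would skip the file: same size
    and same modification time (mtimes only match if -t/--times was used)."""
    return src.size == dst.size and int(src.mtime) == int(dst.mtime)

print(quick_check_matches(Stat(1024, 1700000000), Stat(1024, 1700000000)))  # True
```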
-chris

> slowdown, it should work OK (as long as the network filesystem involved
> isn't buggy and doesn't have compatibility issues).
> 
> ..wayne..


Re: Need information about "--stats" output.

2004-06-16 Thread Chris Shoemaker
Wayne,
That's great stuff.  What do you think about a cut-n-paste of this example and
explanation into the documentation somewhere?

-Chris

On Wed, Jun 16, 2004 at 02:02:17PM -0700, Wayne Davison wrote:
> On Wed, Jun 16, 2004 at 11:16:30AM -0400, Collins, Kevin wrote:
> > Number of files: 161530
> > Number of files transferred: 327
> > Total file size: 97936829135 bytes
> > Total transferred file size: 945709165 bytes
> > Literal data: 741315984 bytes
> > Matched data: 204393181 bytes
> > File list size: 3446549
> > Total bytes written: 745547229
> > Total bytes read: 1090478
> > 
> > wrote 745547229 bytes  read 1090478 bytes  87035.93 bytes/sec total size is
> > 97936829135  speedup is 131.17
> 
> > The problem is:  I don't know exactly how much it transferred offsite.
> 
> The amount of "Literal data" (741,315,984 bytes) is the file data that
> we had to send over the socket to the receiver.  The "Matched data"
> (204,393,181 bytes) is the remainder of the file data that was found to
> already exist in the files we updated.  Both of those figures add up to
> the "Total transferred size" (945,709,165 bytes), which is the total
> amount of file data in the 327 files that got updated by the transfer.
> 
> There were 161,530 total files found, and all those files (which
> includes the ones that got updated) totaled 97,936,829,135 bytes.
> 
> The "Total bytes written" (745,547,229 bytes) includes the protocol
> overhead, so if you subtract off the "Literal data" from that you'll
> discover that there was an overhead of 4,231,245 bytes sent in the
> same direction as the transfer (the overhead in the opposite direction
> was 1,090,478 bytes).  Keep in mind that if you were pulling the files
> instead of pushing them that the "Total bytes read" value would be the
> one that contained the "Literal data" value.
> 
> ..wayne..
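Wayne's arithmetic can be verified directly from the --stats figures quoted above (the speedup is total size divided by total bytes written plus read):

```python
# Figures copied from the --stats output in the quoted message.
total_size  = 97_936_829_135   # Total file size
literal     = 741_315_984      # Literal data
matched     = 204_393_181      # Matched data
transferred = 945_709_165      # Total transferred file size
written     = 745_547_229      # Total bytes written
read_back   = 1_090_478        # Total bytes read

# Literal + matched data account for all updated-file data.
assert literal + matched == transferred

# Subtracting literal data from total bytes written leaves protocol overhead.
overhead = written - literal
print(overhead)                                       # 4231245

# "speedup" as reported by rsync: total size / bytes actually on the wire.
print(round(total_size / (written + read_back), 2))   # 131.17
```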


Re: Rsync security

2004-06-16 Thread Chris Shoemaker
On Wed, Jun 16, 2004 at 04:34:46PM +0100, Andrew Smith-MAGAZINES wrote:
> My personal preference was to mount a share from the file server on the client and 
> essentially do the sync all locally on the client but rsync doesn't seem to like 
> doing this very much (apparently this is advised against),
> 
> What doesn't rsync like?  Do you mean something like an rsync between a
> local mount and a locally-mounted NFS export?  That should be fine.
>   -Chris
> 
> >>
> Hi Chris, yes this is exactly what I mean. When I tested this it threw up lots of 
> errors.
> A colleague of mine asked the question of MIT who developed the rsyncx version and 
> they said:
> 
> #This problem has been noted.  The rsync developers recommend only local 
> #mounts for rsyncs, or using the rsync network transports (or ssh) for 
> #remote rsyncs.  As of v1.7d, ssh is the fastest form of rsync to perform.
> 

Hmm, I'm completely unaware of that recommendation.  Do any "rsync
developers" care to confirm/deny?  Perhaps it's an obsolete
recommendation.  I've used rsync over NFS with no problems.  

> 
> If you think it should work ok, then I'll test it again,

Please report success or failure, esp. failure.
-Chris

> 
> thanks for your comments, Andy.
> <<


[OT] CVS update

2004-06-15 Thread Chris Shoemaker

Um, 

I don't remember the exact checkout command line I used, but it was
probably something like the instructions on the web page:

cvs -d :pserver:[EMAIL PROTECTED]:/cvsroot co rsync

Anyway, how do I update?  I tried:

[EMAIL PROTECTED] rsync]$ cvs update -d -P
cvs [update aborted]: connect to pserver.samba.org(66.70.73.150):2401
failed: Connection refused


and login gives the same:

[EMAIL PROTECTED] rsync]$  cvs -d :pserver:[EMAIL PROTECTED]:/cvsroot login
Logging in to :pserver:[EMAIL PROTECTED]:2401/cvsroot
CVS password:
cvs [login aborted]: connect to pserver.samba.org(66.70.73.150):2401
failed: Connection refused


What am I doing wrong?

-Chris


Re: Rsync security

2004-06-15 Thread Chris Shoemaker
On Tue, Jun 15, 2004 at 03:37:21PM +0100, ww m-pubsyssamba wrote:
> Hello list,
> 
>   I have a requirement to script a sync from a server to a UNIX workstation (Mac 
> OS X) users desktop and profile related data at logon and
> logoff. Rsync looks like it may be appropriate, but I am concerned about making a 
> sufficiently secure connection between the server and the
> client (given my sync must be non-interactive).
> Rsh is not an option, so Ssh seems to be the only alternative. Now I'm quite 
> familiar with Ssh, setting up public/private key pairs etc. but I'm
> quite uncomfortable about using this across hundreds of workstations to provide the 
> sync functionality I'm looking for. Specifically my fear is if
> someone gains administrative access to their workstation and can access the ssh 
> private key & ssh server key they will be able to access any
> data they want from the central file server. Plus relying on keypairs is very messy 
> from an administrative point of view.
> I guess other people must have thought about a similar type of requirement in terms 
> of security and was hoping I might get some pointers from
> those who have done this before. My personal preference was to mount a share from 
> the file server on the client and essentially do the sync all locally on the client 
> but rsync doesn't seem to like doing this very much (apparently this is advised 
> against),

What doesn't rsync like?  Do you mean something like an rsync between a
local mount and a locally-mounted NFS export?  That should be fine.
-Chris

> 
>   any help gratefully received, thanks Andy.


Re: how to exclude large files from backup list.

2004-06-15 Thread Chris Shoemaker
On Tue, Jun 15, 2004 at 12:05:09AM -0700, Wayne Davison wrote:
> On Tue, Jun 15, 2004 at 10:34:18AM +0800, Jiang Wensheng wrote:
> > I am using a computer to back up files from another computer
> > automatically. I want to exclude large files from backing up. How can
> > I do that?
> 
> Either create an exclude list manually before running rsync or create a
> transfer list and use --files-from (the "find" utility has options that
> would help you).  Adding an option to make rsync do this wouldn't be
> very hard, but no one has written such an option (that I know of).
> 
  IMHO, that's _because_ "find" already does this, and so it should be.
  I think rsync may already be in danger of becoming that 101-function
  Swiss army knife with so many features that it's unwieldy and goes
  unused.

  OTOH, I did find the --compare-dest=DIR option useful today, and
  that appears spurious at casual glance.
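As a sketch of Wayne's suggestion (build the transfer list yourself and hand it to --files-from), here is one way to generate such a list.  The 100 MB default and the helper's name are my own illustration, not an rsync feature:

```python
import os

def write_transfer_list(root: str, list_path: str,
                        max_bytes: int = 100 * 2**20) -> int:
    """Write paths (relative to root) of files no larger than max_bytes,
    suitable for: rsync -a --files-from=LIST root/ dest/
    Returns the number of paths written."""
    count = 0
    with open(list_path, "w") as out:
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                full = os.path.join(dirpath, name)
                if os.path.getsize(full) <= max_bytes:
                    out.write(os.path.relpath(full, root) + "\n")
                    count += 1
    return count
```

The equivalent find invocation Wayne alludes to would prune on `-size` instead; either way the point is that the selection happens outside rsync.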
  
-Chris

> ..wayne..


Re: what am I doing wrong

2004-06-11 Thread Chris Shoemaker
On Fri, Jun 11, 2004 at 02:53:53PM -0400, Wallace Matthews wrote:
> I am seeing some rather strange behavior with synch of 2 directories on the same 
> system using 2.6.2.
> 
> The older file is the image of a full backup and is 29Gig in size. The new image is 
> a slice of an incremental
> backup and is 101Meg in size.
> 
> the command line is:
> time /home/wally/rsync/rsync-2.6.2 -av --rsh=rsh --backup --stats --block-size= 
> --write-batch=kbup1aaa /test/Kibbutz/Kbup_1.aaa /test/Kibbutz/work 
> 
> What I am observing in /test/Kibbutz/work is a file .Kbup_1.aaa.AZVyuT that is 35 
> Meg in size after an overnight run that has been going on for 14 hours. When I kill 
> the job, I get real 817m10.062s and user 814m45.940s sys 7m23.870s. 
> 
> I have tried this without the --block-size statement and it goes pretty fast but the 
> literal data is 104M with no matches.
> 
> I have tried it for a variety of --block-size= and it always stalls with very 
> high user times.
> 
> If I make the destination fedor://test/Kibbutz with a copy of the 29G file in the 
> destination directory, it takes about 30m of real time and 9m of user time. 
> 
> It seems to be specific to source and destination being on the same system. 
> 
> Would either Wayne or Tim give me some insight into what I am doing to screw up 
> rsync so badly??

Do you observe the same behavior without "write-batch"?
-chris

> 
> I did similar experiments with 2.5.7 in January and didn't see behavior like this, 
> but at that time my full backup images were only 100 Meg or so and my incremental 
> backups were about 10 Meg. 
> 
> I was experimenting with building the deltas locally and distributing them with a 
> download server for expansion of the remote targets.
> 
> wally
> 


Re: Keeping Multiple Rsyncs Separate

2004-05-28 Thread Chris Shoemaker
On Thu, May 27, 2004 at 08:50:11PM -0700, Swarbrick Software wrote:
> I have noticed that if you run two rsyncs at once, they get confused and
> copy the files from the wrong rsync thread. Apparently this is because
> of the "Build List" that is made in RAM. Two build lists stepping on each
> other. Does anyone know how to change the source so that each build list
> in RAM is kept separate?

Can you give the example command lines?  It is very unlikely that there is any
confusion in RAM, but perhaps a complete bug report would shed some
light...

-chris


Re: rsync hangs in cron (not SSH-problem)

2004-05-27 Thread Chris Shoemaker
On Fri, May 21, 2004 at 02:55:37AM +0200, Pascal Nobus wrote:
> This is the case
> 
> - mounted Inetpub's windows-webserver on /mnt/web1 /mnt/web2, etc.
> - rsync this to local dir:
>   rsync -av --delete /mnt/web1 /mass/kuurne/day
>   rsync -av --delete /mnt/web2 /mass/kuurne/day
>   etc..
> 
> - when logged in, everything works (I do see some errors about 
> non-existing files, but rsync won't stop.
> 
> 
> When used this command in cron
> 
> 00 01 * * * rsync -av --delete /mnt/web1 /mass/kuurne/day
> 00 02 * * * rsync -av --delete /mnt/web2 /mass/kuurne/day
> etc..
> 
> 
> Rsync hangs, it doesn't finish!
> 
> root  2036  5.7 10.4 27616 26704 ?   S01:00   5:19 rsync -av 
> --delete /mnt/web1 /mass/kuurne/day
> root  2037  3.4 11.0 29028 28104 ?   S01:00   3:09 rsync -av 
> --delete /mnt/web1 /mass/kuurne/day
> root  2048  3.1 11.0 29060 28132 ?   S01:11   2:36 rsync -av 
> --delete /mnt/web1 /mass/kuurne/day
> root  2062 10.0  7.9 21304 20168 ?   S02:00   3:19 rsync -av 
> --delete /mnt/web4 /mass/kuurne/day
> root  2064  4.9  8.2 22208 21056 ?   S02:00   1:37 rsync -av 
> --delete /mnt/web4 /mass/kuurne/day
> root  2094  5.7  8.2 22252 21096 ?   S02:05   1:34 rsync -av 
> --delete /mnt/web4 /mass/kuurne/day

It looks like you have 3 of each running concurrently.

> 
> 
> The dir's to be copieed are big (about 5-10 GB), but normally run it 
> finishes after 10-20 min.
> 
> And...
> 
> Some little dir's (less then 1 GB) don't give a problem.
> However, got space enough, load = 0.00 and memory isn't full
> Mem:   255152K av,  252816K used,2336K free,   0K shrd,   28832K 
> buff
> Swap:  530104K av,   10912K used,  519192K free   38812K 
> cached
> 
> 
> 
> Ideas??
> 
  Uh, that memory's pretty full.  Once you start paging to disk you may
never finish rsync.  Does the system become unresponsive?  Make sure
there are no other cron jobs that might add memory pressure timed to run
soon before (or concurrent to) the rsync.
  You can also 'ls -al /proc/{pid-of-rsync}/fd' to see what files are 
open.
  Also, just to ensure that the rsyncs don't run at the same time, consider 
making just one cron job that runs a script containing each rsync command 
in sequence (no backgrounding).  They will then always run serially.
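A related safeguard (my suggestion, not something from the original post) is a lock file, so a cron-fired rsync simply exits if the previous run is still going.  A Unix-only sketch using flock; the wrapper name and lock path are hypothetical:

```python
import fcntl
import subprocess

def run_exclusive(lock_path: str, argv: list):
    """Run argv only if no other instance holds lock_path.
    Returns the command's exit status, or None if a previous run
    still holds the lock (i.e. this invocation was skipped)."""
    lock = open(lock_path, "w")
    try:
        try:
            # Non-blocking exclusive lock: fail fast instead of queueing up.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # previous rsync still active; skip this run
        return subprocess.call(argv)
    finally:
        lock.close()  # closing the fd releases the lock
```

Cron would then call something like `run_exclusive("/var/run/web1.lock", ["rsync", "-av", "--delete", "/mnt/web1", "/mass/kuurne/day"])`, so overlapping invocations can never pile up as they did in the process listing above.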

  -chris
> 
> 
> 


? about FLAG_TOP_DIR

2004-05-18 Thread Chris Shoemaker

in send_file_name(), there is:

  if (write_batch) 
file->flags |= FLAG_TOP_DIR;

Can anyone explain this?  It results in the file flags sent to the
batch file differing from the ones sent to the receiver by that one
bit.  But, why?

-chris



Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

2004-05-18 Thread Chris Shoemaker
On Tue, May 18, 2004 at 11:11:51AM -0400, Alberto Accomazzi wrote:
> 
> Wayne Davison wrote:
> >



> >I'm wondering if batch mode should be removed from the main rsync
> >release and relegated to a parallel project?  It seems to me that a
> >better feature for the mainstream utility would be something that
> >optimized away some of the load on the sending system when it is
> >serving lots of users.  So, having the ability to cache a directory
> >tree's information, and the ability to cache checksums for files
> >would be useful (especially if the data was auto-updated as it
> >became stale).  That would make all transfers more optimal,
> >regardless of what files the receiving system started from.
> 
> First of all, I have a feeling that the number of people who have 
> *considered* using batch mode is quite small, and those who actually 
> have used it in the recent past is certainly an even smaller number (I'm 
> thinking zero, actually).  So removing the functionality from the 

/me waves his hand frantically.
One, here.  :-)

> mainstream rsync would not be a problem, in fact I think it would be a 
> good thing.  It doesn't make sense to keep something in the code that is 
> not used and cannot be reliably supported.  Although I applaud Jos's 
> efforts in providing this functionality to rsync, I was surprised to see 

Jos did that?  Good job!

> it included in the main distribution, especially since it underwent 
> virtually no testing as far as I can tell.
> 
> There's no doubt that caching the file list on the server side would 
> indeed be a very useful feature for all those who use rsyncd as a 
> distribution method.  We all know how difficult it can be to reliably 
> rsync a large directory tree because of the memory and I/O costs in 
> keeping a huge filelist in memory.  This may best be done by creating a 
> separate helper application (say rsyncd-cache or such) that can be run 
> on a regular basis to create a cached version of a directory tree 
> corresponding to an rsyncd "module" on the server side.  The trick in 
> getting this right will be to separate out the client-supplied options 
> concerning file selection, checksumming, etc., so that the cache is as 
> general as possible and can be used for a large set of connections so as 
> to minimize the number of times that the actual filesystem is scanned.

What client options are you thinking will be tricky?  Wouldn't the 
helper app just cache _all_ the metadata for the module, and then rsync would 
query only the subset it needed?  It's not like the client can change the 
checksum stride.  [That would hurt.]
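A toy version of that helper-app idea: cache all the per-file metadata for a module once, and let each connection query just the subset it needs.  The dict layout and the whole-file MD5 are illustrative assumptions, not any existing rsync cache format:

```python
import hashlib
import os

def build_cache(root: str) -> dict:
    """Scan a module's tree once, caching size, mtime, and a whole-file
    checksum per relative path (the expensive part, done ahead of time)."""
    cache = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            with open(full, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            cache[os.path.relpath(full, root)] = {
                "size": st.st_size,
                "mtime": int(st.st_mtime),
                "md5": digest,
            }
    return cache

def query(cache: dict, wanted_paths) -> dict:
    """A client asks only for the subset it needs; no filesystem rescan."""
    return {p: cache[p] for p in wanted_paths if p in cache}
```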

-chris

> 
> >Such a new feature would probably best be added to an rsync
> >replacement project, though.
> 
> Hmmm... "replacement"?  why not make this a utility that can be run 
> alongsize an rsync daemon?  Or are you thinking of a design for a "new" 
> rsync?
> 
> 
> -- Alberto
> 
> 
> 
> Alberto Accomazzi  aaccomazzi(at)cfa harvard edu
> NASA Astrophysics Data Systemads.harvard.edu
> Harvard-Smithsonian Center for Astrophysics  www.cfa.harvard.edu
> 60 Garden St, MS 31, Cambridge, MA 02138, USA
> 
> 


Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

2004-05-18 Thread Chris Shoemaker
On Tue, May 18, 2004 at 10:06:52AM -0400, Alberto Accomazzi wrote:
> Chris Shoemaker wrote:
> 
> > Indeed, what you describe seems to have been the design motivation.  
> > I
> >can share what my desired application is: I want to create a mirror of a
> >public server onto my local machine which physically disconnected from the
> >Internet, and keep it current.  So, I intend to first rsync update my own 
> >copy
> >which _is_ networked while creating the batch set.  Then I can sneakernet 
> >the
> >batch set to the unnetworked machine and use rsync --read-batch to update 
> >it. This keeps the batch sets smallish even though the mirror is largish. 
> 
> This was something I looked into a couple of years ago.  Back then I 
> even posted an email to the list 
> (http://lists.samba.org/archive/rsync/2002-August/003433.html) and got 
> no feedback, which led me to conclude that people were not doing any of 

Reading that post was like reading something I could have written just 
last week.  :)  I'm sorry you didn't get any response.  Things must have 
picked up around here.  You, Wayne and Jos have been quite responsive to my 
recent questions and comments.

> this at the time.  To restate the obvious, the batch mode thing is 
> really just a glorified diff/patch operation.  The problem I have with 
> it is that AFAICT it's a very fragile one, since a simple change of one 
> file on either sender or receiver after the batch has been created will 
> invalidate the use of the batch mode.  Contrast this with diff/patch, 
> which has builtin measures to account for fuzzy matches and therefore 
> makes it a much more robust tool.

You're right about the fragility, but under some conditions the 
constraints can be met.

> 
> In the end my motivation for using the rsync-via-sneakernet approach 
> disappeared when I convinced myself that the whole operation would have 
> been far too unreliable, at least for our application where files are 
> updated all the time and there is never really a "freeze" of a release 
> against which a batch file can be created.  I won't go as far as saying 

Well, what did you do instead?

> that the feature is useless, but just caution people that they need to 
> understand the assumptions that this use of rsync is based upon.  Also, 
> I would suggest checking out other diff/patch tools such as rdiff-backup 
> or xdelta.

I looked at these but didn't see how they could help me in my 
situation (same as what you described).  Am I missing something?
> 
> > BTW, there is a work-around.  If you don't mind duplicating the 
> > mirror
> >twice, one solution is to do a regular (no --write-batch) rsync update of 
> >one
> >copy of the mirror, and then do the --write-batch during a local to local
> >rsync update of another copy of the mirror.  Actually, this has some real
> >advantages if your network connection is unreliable. 
> 
> This is really the only circumstance under which I would even consider 
> using batch mode.  There should also be safeguards built into the batch 
> mode operation to guarantee that the source files to which the batch is 
> applied are in the state we expect them to be.  I wouldn't otherwise 
> want rsync to touch my files.

Good point.
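Such a safeguard could be approximated outside rsync today: record a manifest of checksums when the batch is written, and refuse to --read-batch unless the destination is still in the expected state.  A hypothetical wrapper sketch, not an existing rsync feature:

```python
import hashlib
import os

def make_manifest(root: str, paths) -> dict:
    """Record checksums of the files the batch expects to find unchanged."""
    manifest = {}
    for rel in paths:
        with open(os.path.join(root, rel), "rb") as f:
            manifest[rel] = hashlib.sha1(f.read()).hexdigest()
    return manifest

def destination_matches(root: str, manifest: dict) -> bool:
    """Refuse to apply the batch unless every recorded file is still intact."""
    for rel, digest in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.exists(full):
            return False
        with open(full, "rb") as f:
            if hashlib.sha1(f.read()).hexdigest() != digest:
                return False
    return True
```

The wrapper would run `make_manifest` alongside --write-batch, ship the manifest with the batch file, and only invoke --read-batch when `destination_matches` returns True.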

-chris

> 
> > Thanks for your input.
> 
> Likewise.  Good luck...
> 
> -- Alberto
> 
> 
> 
> Alberto Accomazzi  aaccomazzi(at)cfa harvard edu
> NASA Astrophysics Data Systemads.harvard.edu
> Harvard-Smithsonian Center for Astrophysics  www.cfa.harvard.edu
> 60 Garden St, MS 31, Cambridge, MA 02138, USA
> 
> 


Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

2004-05-18 Thread Chris Shoemaker
On Mon, May 17, 2004 at 09:42:18PM -0700, Jos Backus wrote:
> On Mon, May 17, 2004 at 05:18:10PM -0400, Chris Shoemaker wrote:
> > BTW, there is a work-around.  If you don't mind duplicating the mirror
> > twice, one solution is to do a regular (no --write-batch) rsync update of one
> > copy of the mirror, and then do the --write-batch during a local to local
> > rsync update of another copy of the mirror.  Actually, this has some real
> > advantages if your network connection is unreliable. 
> 
> That was in fact the way I envisioned us using it at work, together with
> multicast-based batch file distribution (but the project never got off the
> ground).
> 

Nice to know I'm not alone :-)  Thanks for the report.

-chris

> -- 
> Jos Backus   _/  _/_/_/  Sunnyvale, CA
> _/  _/   _/
>_/  _/_/_/
>   _/  _/  _/_/
> jos at catnook.com_/_/   _/_/_/  require 'std/disclaimer'


Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

2004-05-18 Thread Chris Shoemaker
On Mon, May 17, 2004 at 08:10:57PM -0700, Wayne Davison wrote:
> On Mon, May 17, 2004 at 05:18:10PM -0400, Chris Shoemaker wrote:
> > The "knowledge" or "memory" of that exact state is more likely to
> > reside with the receiver (who just left that state) than with the
> > sender (who may never have been in that state).  Therefore it is more
> > likely to be useful to the receiver than to sender.
> 
> This is only true if you imagine a receiver doing one pull and then
> forwarding the update on to multiple hosts.  For instance, if you
> use a pull to create the batch files and then make them available
> for people to download, which would help to alleviate load from the
> original server.  That said, I think most of the time a receiver is
> going to be a leaf node, so the server tends to be the place where
> a batch is more likely to be useful, IMO.

I can see the "push" pattern for creating batch sets, and I definitely agree
that receiver is likely to be a leaf node, but I'm submitting that on the "big
tree" the expectation of 1) finding another _identical_ leaf and 2) knowing
about that identity, is MUCH better the closer you are to that first leaf node
than _anywhere_ else, server/sender included.

I know there are counter-examples to my proposition -- I just don't think 
they're likely.  If they were, then there would be more people using and 
considering using batch-mode for the sender-side batch-write than people doing 
what I'm doing -- making two local mirrors just so I can be the sender for a 
write-batch.

I suppose there are two theoretical explanations for what's going on.  After all, 
the two receivers are not identical by chance; they were made so, but how?

Case A) The destinations were created by pushing batch-sets from
a server and only ever modified by pushing batch-sets from a server.  The 
receivers are not necessarily "close" to each other with respect to any 
communication path.  The receivers are only "related" through the server.  In 
this scenario, batch-sets should be created by sender.

Case B) The destinations are identical because they are "close" 
with respect to some communications path and were made identical.  E.g. 
one is a copy of the other, both were copied from the same physical source 
media, or they have agreed to synchronize with each other.  In this scenario, 
batch-sets belong with the receiver.

I admit Case A probably really happens sometimes.  (I mean the
information transfer pattern; it sounds like rsync batch-mode maybe isn't
actually used for this purpose.)  But, I think that Case B must be much more
common. 

Of course, I don't have any real usage data to back this theory up, so
I could be full of it.  But after all, isn't it intuitive that the _average_
"communications distance" between two _identical_ copies would be much smaller
than the _average_ "communications distance" between two _similar_ copies that
want to synchronize?  On average.

> 
> In thinking about batch mode, it seems like its restrictions make
> it useful in only a very small set of of circumstances.  Since the
> receiving systems must all have identical starting hierarchies, it
> really does limit how often it can be used.

Well, yes.

> 
> I'm wondering if batch mode should be removed from the main rsync
> release and relegated to a parallel project?  It seems to me that a

I'd be sad to see batch-mode bitrot, but from a purely technical 
viewpoint, perhaps the (simple?) task of capturing an output stream of the 
protocol shouldn't be so strongly coupled to the rsync project.

> better feature for the mainstream utility would be something that
> optimized away some of the load on the sending system when it is
> serving lots of users.  So, having the ability to cache a directory
> tree's information, and the ability to cache checksums for files
> would be useful (especially if the data was auto-updated as it
> became stale).  That would make all transfers more optimal,
> regardless of what files the receiving system started from.

That's a very good idea.  Optimizing the common case makes sense.  
Cache invalidation could be hard, I think.  Something like FAM might be 
expensive.  Recaching on a signal is easy, though.

> 
> Such a new feature would probably best be added to an rsync
> replacement project, though.

I don't know, it could be a simple performance enhancement with no new 
visible features.

-Chris
> 
> ..wayne..


Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

2004-05-17 Thread Chris Shoemaker
On Mon, May 17, 2004 at 10:15:23AM -0400, Alberto Accomazzi wrote:
> 
> Chris,
> 
> to put things in the right prespective, you should read (if you haven't 
> done so already) the original paper describing the design behind batch 
> mode.  The design and implementation of this functionality goes back to 
> a project called the Internet2 Distributed Storage Infrastructure 
> (I2-DSI).  As part of that project, the authors created a modified 
> version of rsync (called rsync+) which had the capability of creating 
> these batch sets for mirroring.  Here are a couple of URLs describing 
> the ideas and motivation behind it:
> http://www.ils.unc.edu/i2dsi/unc_rsync+.html
> http://www.ils.unc.edu/ils/research/reports/TR-1999-01.pdf

Ah, thank you.  I had seen the first, but not the second.  It was an
interesting read, and it explains a lot.  I see now why the write-batch hooks
are in the _sender_ paths.  This seems a reasonable design decision when the
intention is to replicate changes to many remote copies.
I can see some justification for wanting write-batch functionality
with both sender and receiver.  However, several things in the report seem to
confirm my growing opinion that, if it has to be in only one, the receiver is
sufficient, while the sender is not.

> >use.  I figure, there are three cases:
> >
> >   A) If you have access to both source and dest, it doesn't really matter 
> >   too
> >much who writes the batch -- this is like the local copy case.
> >   B) If you have access to the dest but not the source, then you need the
> >client to write the batch -- and it's not far-fetched that you might have
> >other copies of dest to update.
> >   C) However, having access to source but not dest is the only case that
> >_requires_ the sender to write the batch -- now what's the chance that 
> >you'll
> >have another identical dest to apply the batch to?  And if you did, why
> >wouldn't you generate the batch on that dest as in case A, above?
> >
> >   So, it seems to me that it's much more useful to have the 
> >   receiver/client write the batch than sender/client, or receiver/server, or 
> >sender/server.  But, maybe I'm just not appreciating what the potential 
> >uses of batch-mode are.
> >
> >  Survey: so who uses batch-mode and what for?
> 
> I haven't used the feature but back when I read the docs on rsync+ I 
> thought it was a clever way to do multicasting on the cheap.  I think 
> the only scenario where batch mode makes sense is when you need to 
> distribute updates from a particular archive to a (large) number of 
> mirror sites and you have tight control on the state of both client and 
> server (so that you know exactly what needs to be updated on the mirror 
> sites).  This ensures that you can create a set of batch files that 
> contain *all* the changes necessary for updating each mirror site.
> 
> So basically I would use batch mode if I had a situation in which:
> 
> 1) all mirror sites have the same set of files
> 2) rsync is invoked from each mirror site in exactly the same way (i.e. 
> same command-line options) to pull data from a master server
> 
> then instead of having N sites invoke rsync against the same archive, I 
> would invoke it once, make it write out a set of batch files, then 
> transfer the batch files to each client and run rsync locally using the 
> batch set.  The advantage of this is that the server only performs its 
> computations once.  An example of this usage would be using rsync to 
> upgrade a linux distribution, say going from FC 1 to FC 2.  All files 
> from each distribution are frozen, so you should be able to create a 
> single batch which incorporates all the changes and then apply that on 
> each site carrying the distro.

Indeed, what you describe seems to have been the design motivation.  I
can share what my desired application is: I want to create a mirror of a
public server onto my local machine, which is physically disconnected from the
Internet, and keep it current.  So, I intend to first rsync update my own copy
which _is_ networked while creating the batch set.  Then I can sneakernet the
batch set to the unnetworked machine and use rsync --read-batch to update it. 
This keeps the batch sets smallish even though the mirror is largish. 

> 
> The question of whether the batch files should be on the client or 
> server side is not easy to answer and in the end depends on exactly what 
> you're trying to do.  In general, I would say that since the contents of 
> the batch mode depend on the status of both client and server, there is 
> not a "natural" location for it.

While I agree there is some symmetry in the _origin_ of the batch set
that would suggest that there is no natural location for it, I think the
_intended use_ of the batch set strongly suggests that it will usually belong
with the _receiver_ (irrespective of client/server).  Specifically, the batch
set is only useful for other receivers that are identical to the 
