Re: rsync 2.5.6 hanging

2003-11-17 Thread Andrew J. Schorr
Hi Zach,

On Mon, Nov 17, 2003 at 09:24:12AM -0800, Zachary Denison wrote:
 The version I am running on the destination machine is
 also rsync 2.5.6.  The destination machine has 4GB ram
 in it and is running redhat 8.0.  Also it gets stuck
 on all different types of files, small and large.
 sometimes the filesize is 200k and sometimes it's
 several megabytes.

Have you tried the CVS version?  My impression is that many bugs
have been patched.  If you can install the CVS code on both source
and destination, that may be worth a try.

-Andy
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: Feature Request - Recursive Rsync Parameter - Example Script

2003-10-22 Thread Andrew J. Schorr
On Tue, Oct 21, 2003 at 06:46:16PM -0700, jw schultz wrote:
 
 Limiting the depth of recursion is already supported just
 not intuitive.
 
   rsync -r --exclude='/*/*/*/'
 
 Your idea for a shell script to automate picking up the
 lower levels is good and could compose the --exclude
 pattern.  The next step would be to set the job partition
 level based on path count as in find $subtree -print|wc -l.
 

I have used this technique myself to limit the number of files
processed in a single rsync invocation.  If you use find to locate
the files that you need to process, you can then use the --files-from
option to process a certain number of those files at a time.  This
works like a charm.
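A sketch of that approach: batch find's output so each rsync invocation
handles a bounded number of paths via --files-from (the option from the
CVS tree).  The sample tree, batch size, and remote target below are
illustrative only.

```shell
# Build a throwaway sample tree standing in for the real source.
src=$(mktemp -d)
out=$(mktemp -d)
mkdir -p "$src/a" "$src/b"
for i in 1 2 3 4 5; do touch "$src/a/f$i" "$src/b/g$i"; done

# Enumerate once with find, then split into bounded batches.
( cd "$src" && find . -type f -print ) > "$out/all-files"
split -l 4 "$out/all-files" "$out/chunk."   # at most 4 paths per batch

for list in "$out"/chunk.*; do
    # one bounded transfer per batch (echoed here rather than executed)
    echo rsync -lptgoD --files-from="$list" "$src/" remotehost:/backup/
done
```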

-Andy


Re: EOF error at io.c line 165

2003-07-30 Thread Andrew J. Schorr
Hi Thomas,

On Wed, Jul 30, 2003 at 12:33:45AM -0400, Thomas Cort wrote:
 I keep getting rsync: connection unexpectedly closed (1074142 bytes read so far)
 rsync error: error in rsync protocol data stream (code 12) at io.c(165). Does
 anyone have any idea of what could cause this? I've gotten the same error on
 two separate machines. One UltraSparc running Linux and the other a 
 SparcStation running OpenBSD. We have an x86-SMP system running linux on the
 same network, and it works fine.

Are you using the -z flag for compression?  If so, are you using
the latest CVS version?  There was a bug in token.c that was recently
patched that can cause similar problems.  For further info, go to
   http://www.mail-archive.com/[EMAIL PROTECTED]
and search for Masahiko.  If your error is the same, then the patch that
has already been put into CVS should fix the problem.

-Andy


Re: rsync error: error in rsync protocol data stream (Broken pipe)

2003-07-10 Thread Andrew J. Schorr
Just for the record, the patch to token.c for the -z bug that was
discovered by Yasuoka Masahiko and patched by him and Wayne Davison
has fixed the problem that I reported here:

   http://www.mail-archive.com/[EMAIL PROTECTED]/msg07289.html

Thanks guys, this bug has been biting me for the past 6 months...

-Andy


Re: rsync error: error in rsync protocol data stream (Broken pipe)

2003-06-18 Thread Andrew J. Schorr
On Tue, Jun 17, 2003 at 06:35:50AM -0700, jw schultz wrote:
 
 You could try turning on transfer logging i suppose.  If you
 haven't already done so you might want to use the log file
 option in case chroot is getting in the way.  Beyond this i
 have no suggestions; i dont use rsyncd.

I may be having a similar problem.  I'm using rsync version 2.5.6cvs-20030205
every night on Solaris 8/x86 to do a backup from a client system to a 
backup server using rsyncd.  Almost every night I see the following errors
logged on the rsyncd server:

Jun 18 00:48:18 ead62 rsyncd[9632]: [ID 702911 daemon.warning] inflate returned -3 (0 bytes)
Jun 18 00:48:18 ead62 rsyncd[9632]: [ID 702911 daemon.warning] rsync error: error in rsync protocol data stream (code 12) at ../token.c(416)
Jun 18 00:48:18 ead62 rsyncd[9632]: [ID 702911 daemon.warning] rsync: connection unexpectedly closed (197041 bytes read so far)
Jun 18 00:48:18 ead62 rsyncd[9632]: [ID 702911 daemon.warning] rsync error: error in rsync protocol data stream (code 12) at ../io.c(187)

And on the client side, I see the following error:

rsync: writefd_unbuffered failed to write 42 bytes: phase unknown: Broken pipe
rsync error: error in rsync protocol data stream (code 12) at ../io.c(622)

This has been happening to me for months, almost every night.  I find
that I can check the return code from rsync to see whether the
transfer succeeded.  If it failed, I simply try again; it almost
always finishes the backup successfully on the second invocation, but
if not, I keep retrying, up to 10 times.  I think one evening it took
9 attempts, but most of the time it works in 2 tries.  It's a hack,
but it gets the job done, and I was having no luck debugging the problem.
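The retry wrapper can be sketched as below.  do_backup stands in for
the real rsync invocation (e.g. "rsync -az /data/ backupserver::backup/",
hypothetical paths); here it fails once and then succeeds so the loop
can be exercised without a network.

```shell
attempts=0
do_backup() {
    # stand-in for the nightly rsync command; simulate: first try
    # fails, second succeeds
    attempts=$((attempts + 1))
    [ "$attempts" -ge 2 ]
}

try=0
until do_backup; do
    try=$((try + 1))
    if [ "$try" -ge 10 ]; then
        echo "giving up after 10 attempts" >&2
        break
    fi
    echo "backup failed (attempt $try), retrying..." >&2
done
```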

-Andy


Re: rsync 2.5.6 still hangs

2003-03-20 Thread Andrew J. Schorr
Steve,

I have a couple of comments:

1. In rsync 2.5.5, I found that the -vvv flag can cause repeatable
   rsync hangs.  Since I was using it to debug a real problem, it
   was very confusing and misleading.  But in the end, I determined
   that -vvv itself was often the culprit.  So I caution
   you not to trust what you see when you debug with -vvv: the
   results may have nothing to do with the real problem that caused
   you to look more closely with -vvv in the first place.
   (I'm assuming that the same -vvv problems still exist in 2.5.6,
   although I've never tested.)

2. I have a nightly cron job to sync up 2 servers, and rsync gives
   me the following error (or something similar) every night (this is
   the reason I was debugging with -vvv):

rsync: writefd_unbuffered failed to write 174 bytes: phase unknown: Broken pipe
rsync error: error in rsync protocol data stream (code 12) at ../io.c(622)

   This is when connecting to rsync in server mode (started by inetd).
   Previously, I was running rsync over ssh and it would hang instead
   of exiting with an error.

   My solution to the problem has been to test the return code from
   rsync and simply rerun the exact same command when I get an error
   return from rsync.  Every night I find that the first attempt
   fails after a while, but the second one always works.  I have no idea
   why this happens, but this solution works for me.

Good luck,
Andy


Re: rsync and timestamps of local files

2003-03-10 Thread Andrew J. Schorr
On Sat, Mar 08, 2003 at 01:13:17PM -0500, Haisam K. Ido wrote:
 Is there a way to make rsync check the local file system for changes in the files 
 prior to it performing a diff with the remote site?

There is no built-in capability to do this in rsync.   However, you can
implement this yourself.  For example, if you touch a timestamp file
before invoking rsync, then you can use find -newer timestamp to
find files that have changed since the last time you ran rsync.
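The timestamp technique can be sketched as below: touch a marker file
before each run, and next time use find -newer against the previous
marker to list everything modified since.  The sample tree and paths
are illustrative.

```shell
work=$(mktemp -d)
mkdir -p "$work/tree"
touch "$work/tree/old.txt"
touch "$work/stamp.prev"             # marker left by the "previous" run
sleep 1                              # make sure the next write is newer
echo changed > "$work/tree/new.txt"  # modified after the marker

# everything changed since the marker:
( cd "$work" && find tree -type f -newer stamp.prev -print ) > "$work/changed.list"

# changed.list can then feed the --files-from patch (echoed, not run):
echo rsync --files-from="$work/changed.list" "$work/tree/" remotehost:/mirror/tree/
```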

Then, you can use the --files-from patch to feed the specific list
of files to transfer into rsync.  That patch is available here:

   http://www.clari.net/~wayne/rsync-files-from.patch

Please search the archives for files-from to get more info
on what this patch does.

Cheers,
Andy


Re: Filelist caching

2003-02-18 Thread Andrew J. Schorr
Hi Rogier,

On Sat, Feb 15, 2003 at 05:05:16PM +0100, Rogier van Eeten wrote:
 On Wed, Feb 12, 2003 at 05:18:11PM -0500, Andrew J. Schorr wrote:
  On Wed, Feb 12, 2003 at 10:51:19AM -0500, Andrew J. Schorr wrote:
I was wondering... is there a way to cache that filelist? Our mirrors
are updated once, or twice a day, it could speed up downloads when I
create a filelist everytime we've mirrored others.
   
   Please take a look at the --files-from feature that is now in the CVS tree,
   courtesy of Wayne Davison.
 http://www.clari.net/~wayne/rsync-files-from.patch
 
 How does it work? What kind of list does it want? And how do I use it as
 a server?

Please refer to this archived message for more info:

   http://marc.theaimsgroup.com/?l=rsync&m=104286019712633&w=2

As explained in that post, if the filename argument to --files-from has
a host: prefix, then the list will be pulled from the server.
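A hedged illustration of the two forms (module name and paths are made
up): with a plain filename the list is read on the invoking side; with
a "host:" prefix the list is first pulled from the server, as the
archived post explains.  The commands are held in variables for
display rather than executed.

```shell
local_list="rsync --files-from=/tmp/list.txt remotehost::module /dest"
remote_list="rsync --files-from=remotehost:/srv/list.txt remotehost::module /dest"
echo "$local_list"    # list read locally
echo "$remote_list"   # host: prefix -- list fetched from the server first
```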

I hope that helps.

-Andy



Re: Filelist caching

2003-02-12 Thread Andrew J. Schorr
Hi Rogier,

 I've noticed every time someone does an rsync-request on my ftp-site
 (which also provides rsync as a mirror method), rsyncd creates a filelist.
 This is quite an IO- and CPU-intensive procedure, especially for
 mirrors like FreeBSD with lots of little files.
 
 I was wondering... is there a way to cache that filelist?  Our mirrors
 are updated once or twice a day, so it could speed up downloads if I
 create a filelist every time we've mirrored others.

Please take a look at the --files-from feature that is now in the CVS tree,
courtesy of Wayne Davison.  That should do what you want.  It allows you to
create a set list of files and save rsync the work of scanning the directory
tree each time.  Of course, rsync still creates a filelist, but it doesn't
have to recurse over the directory tree, so it should be much faster.
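The caching idea can be sketched as below: regenerate the file list
once per mirror update (e.g. from cron) and let every client transfer
reuse it via --files-from instead of re-walking the tree.  All names
are illustrative.

```shell
# A throwaway sample mirror standing in for the real one.
mirror=$(mktemp -d)
mkdir -p "$mirror/pub/FreeBSD"
touch "$mirror/pub/FreeBSD/README" "$mirror/pub/FreeBSD/ports.tgz"

# Run this step once after each mirror update (the list lives outside
# the tree so find does not pick it up):
cache=$(mktemp)
( cd "$mirror" && find . -type f -print ) > "$cache"

# Each download can then skip the directory walk (echoed, not run):
echo rsync --files-from="$cache" "$mirror/" client:/dest/
```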

-Andy



Re: Filelist caching

2003-02-12 Thread Andrew J. Schorr
On Wed, Feb 12, 2003 at 10:51:19AM -0500, Andrew J. Schorr wrote:
  I've noticed every time someone does an rsync-request on my ftp-site
  (which also provides rsync as a mirror method), rsyncd creates a filelist.
  This is quite an IO- and CPU-intensive procedure, especially for
  mirrors like FreeBSD with lots of little files.
  
  I was wondering... is there a way to cache that filelist?  Our mirrors
  are updated once or twice a day, so it could speed up downloads if I
  create a filelist every time we've mirrored others.
 
 Please take a look at the --files-from feature that is now in the CVS tree,
 courtesy of Wayne Davison.  That should do what you want.  It allows you to
 create a set list of files and save rsync the work of scanning the directory
 tree each time.  Of course, rsync still creates a filelist, but it doesn't
 have to recurse over the directory tree, so it should be much faster.

Oops, you're right.  I guess I applied the patch to my CVS tree and then
forgot that I had.  Sorry about that.  You can grab the files-from patch
from here:

   http://www.clari.net/~wayne/rsync-files-from.patch

It seems to apply cleanly to the current CVS tree.

Good luck,
Andy



Re: specifying a list of files to transfer

2003-01-17 Thread Andrew J. Schorr
On Thu, Jan 16, 2003 at 01:58:49PM -0800, jw schultz wrote:
 On Thu, Jan 16, 2003 at 02:52:38PM -0600, Dave Dykstra wrote:
  Also, if the transfer is being sent from the remote side, the file names
  are all getting sent over to the remote side first for --files-from and
  then sent back as part of the normal protocol, right?  I had hoped we'd
  be able to avoid that round trip because the list could get long.
 
 I don't think we can avoid the round trip without changing
 rsync drastically.  Just consider that this saves on sending
 voluminous --include lists or invoking rsync hundreds of
 times.

Perhaps I'm misunderstanding this completely, but is there a possible
scenario where the remote (sender) might have a list of files on
his side?  I might be mistaken, but it seems that the current patch
supports the case where the --files-from file list is located on
the local side, and it must be transmitted to the sender in the
case where the sender is remote.  Is there a possible case where
the --files-from file list lives on the remote (sender) side?

This is not relevant to me, but I'm just wondering.  One might
imagine a scenario in which a bunch of sites are mirroring from
a server, and the server runs a job to create a list of modified
files that the remote mirrors should pull for updating...

Come to think of it, if the data lives on the remote server, where
would a local files-from list come from?  How would it be generated?

-Andy



Re: specifying a list of files to transfer

2003-01-17 Thread Andrew J. Schorr
On Thu, Jan 16, 2003 at 05:07:13PM -0800, Wayne Davison wrote:
 (I assume you're talking about when using -R, which is not currently on
 by default.)  I believe that we do need an auto-creation mode and also a
 way to optimize the transfer to avoid this (since it results in a lot of
 extra directory checks that can slow things down when we know that they
 aren't needed).  Which one is the default is the current question.  I'm
 currently leaning toward going back to sending the implied dirs by
 default, and having an option for people to optimize the transfer (which
 would allow it to be used with the normal -R mode even when --files-from
 was not used).

This is why my patch included the --no-implicit-dirs and --send-dirs
options.  These allow you to specify precisely what you want without
having to remember which behavior is implied by various other options...

Although I suppose that those two options could be combined.  The basic
idea is that if the user is taking the responsibility for specifying
the directories himself (--send-dirs), then rsync doesn't need to
worry about automatically sending all the parent directories
(--no-implicit-dirs).

So perhaps an --explicit-dirs option (combining the two meanings) could
be used to indicate that the user is taking all responsibility for
sending directories and rsync doesn't need to worry about it.  And
to be safe, a --no-explicit-dirs to turn off this behavior.

-Andy



Re: specifying a list of files to transfer

2003-01-17 Thread Andrew J. Schorr
On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote:
 I know i'm not talking about when -R is used.  I am talking
 about creating implied intermediate directories without -R.
 I'm talking about being able to take the output of
 find -name '*.jpg' and have it create (if necessary) any
 intermediate directories while maintaining the equivalency
 of src and dest.  If that means also behaving as though
 those directories were already in the list that would be OK
 as long as -r weren't specified.
 
   find . -name '*.jpg' | rsync -a --files-from=- .  remote:
 should when it hits
   ./deltapics/031CGMUa.jpg
   ./deltapics/031CGNga.jpg
   ./deltapics/031CGOHa.jpg
   ./deltapics/031CGPOa.jpg
   ./deltapics/031CGPba.jpg
 create the deltapics directory if it doesn't exist.  The
 permissions and ownership should be derived from the source.
 so effectively it should be as though
   ./deltapics
 where in the file list.  It needn't be updated if it
 does exist but if easier to implement it that way i wouldn't
 object.  In such a case even if -r is
 allowed and specified the implied directory should not defeat
 the file list by transferring any files not in the list.
 
 No errors, no need to do a run to find the missing
 directories and add them and no need to add a filter to the
 stream adding entries for directories that are missing.

There are performance issues associated with sending all the
parent directories automatically.  Consider the situation where
running find test -name '*.jpg' -print gives the following results
(and yes, this does happen, at least for me on solaris 8 where the
output of find seems to depend on the order in which the directory
entries were created):

   test/foo.jpg
   test/bar.jpg
   test/sub/foo.jpg
   test/zeke.jpg

If I run rsync in such a way that parent directories are sent automatically,
it will send the following files (based on -vvv output):

   make_file(4,test)
   make_file(4,test/foo.jpg)
   make_file(4,test/bar.jpg)
   make_file(4,test)
   make_file(4,test/sub)
   make_file(4,test/sub/foo.jpg)
   make_file(4,test)
   make_file(4,test/zeke.jpg)

Note that the test directory is sent 3 times in this case.  This is
because the code that checks whether to send the directory just compares
to the last one sent in an attempt to eliminate duplicates.  But this
is not a reliable way of preventing duplicates, as the above example
demonstrates.  So there is a danger of sending lots of duplicate
directory entries when the automatic directory transmission feature
is enabled.

This could probably be fixed by keeping a hash table of all the directory
entries that have already been transmitted instead of just comparing
against the last one sent.
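The hash-table idea can be demonstrated in userland with an awk
"seen" set (associative array): emit each parent directory exactly
once, however the input files are ordered, instead of comparing
against only the last directory sent.  This is a sketch of the
deduplication logic, not rsync's actual code.

```shell
out=$(mktemp)
printf '%s\n' test/foo.jpg test/bar.jpg test/sub/foo.jpg test/zeke.jpg |
awk -F/ '{
    d = ""
    for (i = 1; i < NF; i++) {     # every parent of this file
        d = d $i "/"
        if (!seen[d]++) print d    # first occurrence only
    }
    print $0                       # the file itself
}' > "$out"
cat "$out"
```

Against the example above, "test/" now appears once instead of three
times.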

In any case, I think it's important to be able to turn off the
automatic directory sending feature so that situations that don't
require this can avoid the performance hit.

-Andy



Re: Latest --files-from patch

2003-01-16 Thread Andrew J. Schorr
Hi Wayne,

I ran a simple test of your patch at

  http://www.clari.net/~wayne/rsync-files-from.patch

and it worked fine for me.  The performance was just about the
same as for my --source-list patch.

Thanks,
Andy



Re: rsync hanging with openssh-2.9.9p2 as the transport

2003-01-15 Thread Andrew J. Schorr
On Wed, Jan 15, 2003 at 09:27:37AM -0600, Dave Dykstra wrote:
 Unfortunately I don't think anybody is going to be able to tell you.
 I've not heard of anybody lately posting a similar problem.  In the
 past hanging problems have been traced to many different sources.
 Rsync stresses network (and filesystem) implementations greatly, and
 combining it with ssh stresses things that much more.  I think it's
 worth a try to use openssh 3.1 or 3.2 (I've not been happy with versions
 after 3.2).  What's the network between the local and remote machines?
 Does the name /nfs/mirror imply that the files are not directly mounted
 on remote but are instead on an NFS server?  That has been the cause
 of many problems in the past.

Actually, it turns out that my testing on this issue has been invalid
since I've been using the -vvv flag to rsync to get a grip on where
it's hanging.  But I just discovered that the -vvv flag itself seems
to cause rsync to hang.  So all the testing I've done so far is useless,
and I'm not sure whether this is related to openssh, the slow connection,
or something else.  I know that the problem exists in 2.5.5, but I need
to do more testing without the -vvv flag to figure out what's causing it.
Is it well-known that the -vvv flag can cause rsync to hang?

Thanks,
Andy



specifying a list of files to transfer

2003-01-14 Thread Andrew J. Schorr
Hi,

I don't want to start another --files-from war, but I am attaching
an updated version of my patch to allow you to specify a list
of files to transfer.  The normal rsync syntax allows you to specify
a list of SRC files to transfer on the command line.  This patch
adds some new options to allow you to instead supply a file that
contains a list of files to transfer.

The previous version of the patch was against rsync-2.4.6; this version
works for rsync-2.5.5.  The only real changes relate to the use of
the popt option parsing library in 2.5 (not used in 2.4).  This had
the minor effect of removing the possibility of using - to indicate
stdin since the new library seems to interpret this as an option and
barfs.  So instead I allow the use of /dev/stdin.

By the way, this patch should also work against rsync-2.5.6pre1 except
for a couple of changes relating to white space and comments.  So a couple
of patch hunks are rejected but are easy to fix by hand.  If there is a
need, I can post an updated patch.

Last time we discussed this, Dave Dykstra objected to this patch
for two reasons:

   1. This patch only works in a single direction: when sending from a
      local system to a remote system.  It does not handle the case
      where you are receiving from a remote system to a local system.

   2. This capability is possible to achieve by specifying a list of
      files with --include-from and then adding --exclude '*' to
      ignore other files.  While this is true, it turns out to be
      much slower.  I have finally run a performance test to
      demonstrate this.  Results are below.

The basic idea of the patch is to handle the case where you already know
a list of files that might need to be updated and don't want to use
rsync's recursive directory tree scanning logic to enumerate all files.
The patch adds the following options:

 --source-list       SRC arg will be a (local) file name containing
                     a list of files, or /dev/stdin
 --null              used with --source-list to indicate that the
                     file names will be separated by null (zero) bytes
                     instead of linefeed characters; useful with
                     gfind -print0
 --send-dirs         send directory entries even though not in
                     recursive mode
 --no-implicit-dirs  do not send implicit directories (parents of
                     the file being sent)

The --source-list option allows you to supply an explicit list of filenames
to transport without using the --recursive feature and without playing
around with include and exclude files.  As discussed below, the same
thing can be done by combining --recursive with --include-from and --exclude,
but it's significantly slower and more arcane to do it that way.

The --null flag allows you to handle files with embedded linefeeds.  This
is in the style of gnu find's -print0 operator.
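What --null buys you can be sketched as below: GNU find's -print0
emits a NUL-separated list that survives filenames with embedded
linefeeds.  --source-list and --null are options added by this patch,
not stock rsync, so the rsync command is only echoed.

```shell
src=$(mktemp -d)
touch "$src/plain.txt"
touch "$src/with
newline.txt"                    # a filename containing a linefeed

# count entries in the NUL-separated list (one NUL per path)
nuls=$(find "$src" -type f -print0 | tr -dc '\000' | wc -c)
echo "entries in NUL-separated list: $nuls"

# the patched invocation would then be (echoed, not run):
echo rsync --null --source-list /dev/stdin remotehost:/backup/
```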

The --send-dirs option overcomes a problem where rsync refuses to send directories
unless it's in recursive mode.  One needs this to make sure that even
empty directories get mirrored.

And the --no-implicit-dirs option turns off the default behavior in which
all the parent directories of a file are transmitted before sending the
file.  That default behavior is very inefficient in my scenario where I
am taking the responsibility for sending those directories myself.

And now for a performance test:

I have a directory tree containing 128219 files of which 16064 are
directories.

To start the test, I made a list of files that had changed in the
past day:

   find . -mtime -1 -print > /tmp/changed

(normally my list of candidate files is generated by some other means;
this is just a test example).  There were 5059 entries in /tmp/changed.

I used my new options to sync up these files to another host
as follows:

  time rsync -RlHptgoD --numeric-ids --source-list \
--send-dirs --no-implicit-dirs -xz --stats /dev/stdin \
remotehost:/extra_disk/tmp/tree1 < /tmp/changed

Here were the reported statistics:

 Number of files: 5059
 Number of files transferred: 5056
 Total file size: 355514100 bytes
 Total transferred file size: 355514100 bytes
 Literal data: 355514100 bytes
 Matched data: 0 bytes
 File list size: 139687
 Total bytes written: 154858363
 Total bytes read: 80916

 wrote 154858363 bytes  read 80916 bytes  364992.41 bytes/sec
 total size is 355514100  speedup is 2.29

And the time statistics:

  112.53u 8.82s 7:03.92 28.6%

I then ran the same command again (in which case there was nothing to
transfer).  Here's how long it took:

0.54u 0.62s 0:08.61 13.4%

Now to compare with the recursive method using --include-from.  First, we
must create the list of files.  In the case of include-from, we need to
include all the parent directories as include patterns.  The following
gawk seems to do the job:

  gawk '$0 != "./" {sub(/^\.\//,"")} {while ((length > 0) && !($0 in already))
{print "/" $0; already[$0] = 1; 
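The gawk one-liner above is cut off in the archive.  The idea (turn
each changed path into include patterns for itself and every parent
directory, each emitted once) can be sketched with plain awk as below;
the input paths are illustrative.

```shell
out=$(mktemp)
printf '%s\n' ./a/b/f1 ./a/c/f2 |
awk '{
    sub(/^\.\//, "")                    # strip the leading "./"
    n = split($0, part, "/")
    path = ""
    for (i = 1; i <= n; i++) {
        path = path (i > 1 ? "/" : "") part[i]
        if (!(path in seen)) { seen[path] = 1; print "/" path }
    }
}' > "$out"
cat "$out"
```

The resulting patterns are what the recursive alternative needs,
roughly: rsync -r --include-from="$out" --exclude='*' ...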

Re: specifying a list of files to transfer

2003-01-14 Thread Andrew J. Schorr
On Tue, Jan 14, 2003 at 03:32:41PM -0600, Dave Dykstra wrote:
 I haven't looked at the implementation, but comments on the user
 interface:
 1. Yes it should take a filename or - as a parameter.
 2. I don't like the idea of skipping the SRC spec.  Paths should be
   relative to the SRC.  If somebody wants to use full paths they
   can always have a SRC of /.
 3. It should be called --files-from.
 4. --send-dirs and --no-implicit-dirs shouldn't be separate options,
   they should be automatically turned on with the --files-from option.

Those comments all sound reasonable to me.  The only reason I broke
out the --send-dirs and --no-implicit-dirs options was because they
were orthogonal to what I was doing and could potentially also apply
to situations where the user was specifying various SRC filenames
on the command line.  But it's certainly fine to have --files-from turn
those on automatically.

-Andy



rsync hanging with openssh-2.9.9p2 as the transport

2003-01-14 Thread Andrew J. Schorr
Hi,

This is perhaps a stupid question.  I apologize in advance if it's already
been covered, but I'm stumped...

I'm using rsync to backup some file systems to a remote host.
The transport is openssh-2.9.9p2.

The problem does not occur when the transport is rsh.

I'm invoking rsync as follows:

  rsync -RlHptgoD --numeric-ids -e ssh -rzx --stats . remotehost:/nfs/mirror

But it hangs when there are large numbers of files in the . filesystem.
If I run with -vvv, I see lots of make_file entries as send_file_list
starts sending the list of files to the remote host, but then it freezes
after around 4500 to 6500 make_file messages.

I see the same problem with versions 2.4.6, 2.5.5, and 2.5.6pre1.
And I see it on linux 2.4.18 (red hat 8.0) and solaris 8/x86.

I can get it to work by breaking it up into chunks of 1000 files or
so.  It might work with more; I haven't tested exhaustively.

I have tried using the --blocking-io and --no-blocking-io options, but
neither one solves the problem.

I could provide more info about where it hangs, but I thought somebody
might know the answer, since this is clearly related to the
interaction with openssh (there's no problem when the transport is rsh).

Is there a trick to make rsync work nicely with openssh?  I searched the
archives and haven't found anything...  Does upgrading to openssh 3 solve
the problem?

Thanks,
Andy



Re: patch to enable faster mirroring of large filesystems

2001-11-26 Thread Andrew J. Schorr

On Sun, Nov 25, 2001 at 03:21:51AM +1100, Martin Pool wrote:
 On 20 Nov 2001, Dave Dykstra [EMAIL PROTECTED] wrote:
 
   And, by the way, even if the batch stuff accomplishes the same performance
   gains, I would still argue that the --files-from type of behavior
   that I implemented is a nice transparent interface that people might
   like to have.  The ability to pipe in output from gfind -print0 opens
   up some possibilities.
  
  Yes, many people have argued that a --files-from type of option is
  desirable and I agree.
 
 I agree too.  I think there would certainly be no argument with taking
 a patch that did only that.  (Except that it makes the option list
 even more ridiculously bloated.)
 
 I think a better fix is to transfer directories one by one, pipelined,
 rather than walking the whole tree upfront.  This will take a protocol
 change and a fair amount of work, but I think it is possible.  If we
 can get it to just work faster transparently that is better.

I understand your point of view, but I think it is a mistake to
hold rsync's algorithm hostage to the directory tree traversal logic
built into the program.  IMHO, the basic file transfer algorithm of
rsync is terrific, but the program wrapped around it is a bit out of
control.

The spirit of my patch is to expose the low-level rsync algorithm and
to allow people to build up their customized infrastructure outside
of the program instead of having to build it in.  I think this is in
the spirit of Unix tools.  I think if rsync were to expose some of its
low-level capabilities, then we would not have a need for xdelta and rdiff,
projects which are springing up because of rsync's opaqueness.

Anyway, you may not like the way my patch is implemented, but I still argue
that it serves a useful purpose, and it gets the job done for me.

Cheers,
Andy




patch to enable faster mirroring of large filesystems

2001-11-19 Thread Andrew J. Schorr

I have attached a patch that adds 4 options to rsync that have helped
me to speed up my mirroring.  I hope this is useful to someone else,
but I fear that my relative inexperience with rsync has caused me to
miss a way to do what I want without having to patch the code.  So please
let me know if I'm all wet.

Here's my story: I have a large filesystem (around 20 gigabytes of data)
that I'm mirroring over a T1 link to a backup site.  Each night, 
about 600 megabytes of data needs to be transferred to the backup site.
Much of this data has been appended to the end of various existing files,
so a tool like rsync that sends partial updates instead of the whole
file is appropriate.

Normally, one could just use rsync with the --recursive and --delete features
to do this.  However, this takes a lot more time than necessary, basically
because rsync spends a lot of time walking through the directory tree
(which contains over 300,000 files).

One can speed this up by caching a listing of the directory tree.  I maintain
an additional state file at the backup site that contains a listing
of the state of the tree after the last backup operation.  This is essentially
equivalent to saving the output of find . -ls in a file.

Then, the next night, one generates the updated directory tree for the source
file system and does a diff with the directory listing on the backup file
system to find out what has changed.  This seems to be much faster than
using rsync's recursive and delete features.

I have my own script and programs to delete any files that have been removed,
and then I just need to update the files that have been added or changed.
One could use cpio for this, but it's too slow when only partial files
have changed.
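The state-file comparison can be sketched as below.  The post uses a
"find . -ls"-style listing; a size+path listing is used here so the
field positions are predictable (that substitution, and all paths,
are assumptions for the sketch).  Diffing the saved listing against a
fresh one yields the new or changed files.

```shell
work=$(mktemp -d)
mkdir -p "$work/tree"
echo one > "$work/tree/a.txt"
echo two > "$work/tree/b.txt"

# one listing line per file: "<size> <path>", sorted for comm(1)
list() { ( cd "$work" && find tree -type f -exec wc -c {} \; ) | sort; }

list > "$work/state.old"                     # saved after last backup
echo "one more line" >> "$work/tree/a.txt"   # overnight change
list > "$work/state.new"                     # tonight's listing

# lines present only in the new listing = new or changed files
comm -13 "$work/state.old" "$work/state.new" |
    awk '{print $2}' > "$work/changed.list"
```

changed.list is then what gets fed to the partial-update transfer.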

So I added the following options to rsync:

 --source-list       SRC arg will be a (local) file name containing
                     a list of files, or - to read file names from
                     stdin
 --null              used with --source-list to indicate that the
                     file names will be separated by null (zero) bytes
                     instead of linefeed characters; useful with
                     gfind -print0
 --send-dirs         send directory entries even though not in
                     recursive mode
 --no-implicit-dirs  do not send implicit directories (parents of
                     the file being sent)

The --source-list option allows me to supply an explicit list of filenames
to transport without using the --recursive feature and without playing
around with include and exclude files.  I'm not really clear on whether
the include and exclude files could have gotten me the same place, but it
seems to me that they work hand-in-hand with the --recursive feature that
I don't want to use.

The --null flag allows me to handle files with embedded linefeeds.  This
is in the style of gnu find's -print0 operator.

The --send-dirs option overcomes a problem where rsync refuses to send directories
unless it's in recursive mode.  One needs this to make sure that even
empty directories get mirrored.

And the --no-implicit-dirs option turns off the default behavior in which
all the parent directories of a file are transmitted before sending the
file.  That default behavior is very inefficient in my scenario where I
am taking the responsibility for sending those directories myself.

So, the patch is attached.  If you think it's an abomination, please let
me know what the better solution is.  If you would like some elaboration
on how this stuff really works, please let me know.

Cheers,
Andy


--- flist.c.orig	Tue Sep  5 22:46:43 2000
+++ flist.c	Fri Nov  9 12:01:56 2001
@@ -30,6 +30,7 @@
 extern int cvs_exclude;
 
 extern int recurse;
+extern int send_dirs;
 
 extern int one_file_system;
 extern int make_backups;
@@ -501,8 +502,8 @@
/* we use noexcludes from backup.c */
if (noexcludes) goto skip_excludes;
 
-   if (S_ISDIR(st.st_mode) && !recurse) {
-   rprintf(FINFO,"skipping directory %s\n",fname);
+   if (S_ISDIR(st.st_mode) && !recurse && !send_dirs) {
+   rprintf(FINFO,"make_file: skipping directory %s\n",fname);
return NULL;
}

@@ -689,14 +690,16 @@
 }
 
 
-struct file_list *send_file_list(int f,int argc,char *argv[])
+static struct file_list *send_file_list_proc(int f,char *(*ffunc)(), void *opq)
 {
-   int i,l;
+   int l;
STRUCT_STAT st;
char *p,*dir,*olddir;
char lastpath[MAXPATHLEN]="";
struct file_list *flist;
int64 start_write;
+   char *in_fn;
+   extern int implicit_dirs;
 
if (verbose && recurse && !am_server && f != -1) {
rprintf(FINFO,"building file list ... ");
@@ -711,10 +714,10 @@
io_start_buffering(f);
}
 
-   for (i=0;i<argc;i++) {
+   while ((in_fn = (*ffunc)(opq)) != NULL) {
char *fname = topsrcname;
 
-   strlcpy(fname,argv[i],MAXPATHLEN);
+   strlcpy(fname,in_fn,MAXPATHLEN);
 
l = strlen(fname);
if (l != 1 && fname[l-1]