Re: rsync 2.5.6 hanging
Hi Zach,

On Mon, Nov 17, 2003 at 09:24:12AM -0800, Zachary Denison wrote: The version I am running on the destination machine is also rsync 2.5.6. The destination machine has 4GB RAM and is running Red Hat 8.0. Also, it gets stuck on all different types of files, small and large: sometimes the file size is 200k and sometimes it's several megabytes.

Have you tried the CVS version? My impression is that many bugs have been patched. If you can install the CVS code on both source and destination, that may be worth a try.

-Andy

-- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: Feature Request - Recursive Rsync Parameter - Example Script
On Tue, Oct 21, 2003 at 06:46:16PM -0700, jw schultz wrote: Limiting the depth of recursion is already supported, just not intuitively: rsync -r --exclude='/*/*/*/'. Your idea for a shell script to automate picking up the lower levels is good and could compose the --exclude pattern. The next step would be to set the job partition level based on path count, as in find $subtree -print | wc -l.

I have used this technique myself to limit the number of files processed in a single rsync invocation. If you use find to locate the files that you need to process, you can then use the --files-from option to process a certain number of those files at a time. This works like a charm.

-Andy
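A quick sketch of the path-count idea above (not from the original post): build a throwaway tree, size each subtree with find | wc -l, and decide which subtrees deserve their own rsync run. The depth-limited rsync line is shown commented out with a made-up destination.

```shell
# Size each subtree by path count, as suggested with "find $subtree -print | wc -l".
# All paths below are demo scratch paths, not real data.
set -e
rm -rf /tmp/depth-demo
mkdir -p /tmp/depth-demo/big /tmp/depth-demo/small
for i in 1 2 3 4 5 6; do touch "/tmp/depth-demo/big/f$i"; done
touch /tmp/depth-demo/small/f1

# Report the path count per top-level subtree; big subtrees could then be
# handed to their own rsync invocation.
for sub in /tmp/depth-demo/*/; do
    echo "$sub: $(find "$sub" | wc -l) paths"
done

# Depth-limited pass over just the top levels (dry run, hypothetical dest):
# rsync -rn --exclude='/*/*/*/' /tmp/depth-demo/ remotehost:/dest/
```

The exclude pattern prunes everything below the third directory level, which is what makes the "pick up the lower levels separately" script workable.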
Re: EOF error at io.c line 165
Hi Thomas,

On Wed, Jul 30, 2003 at 12:33:45AM -0400, Thomas Cort wrote: I keep getting

rsync: connection unexpectedly closed (1074142 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(165)

Does anyone have any idea of what could cause this? I've gotten the same error on two separate machines, one UltraSparc running Linux and the other a SparcStation running OpenBSD. We have an x86-SMP system running Linux on the same network, and it works fine.

Are you using the -z flag for compression? If so, are you using the latest CVS version? There was a bug in token.c that was recently patched that can cause similar problems. For further info, go to http://www.mail-archive.com/[EMAIL PROTECTED] and search for Masahiko. If your error is the same, then the patch that has already been put into CVS should fix the problem.

-Andy
Re: rsync error: error in rsync protocol data stream (Broken pipe)
Just for the record, the patch to token.c for the -z bug that was discovered by Yasuoka Masahiko and patched by him and Wayne Davison has fixed the problem that I reported here: http://www.mail-archive.com/[EMAIL PROTECTED]/msg07289.html Thanks guys, this bug has been biting me for the past 6 months... -Andy
Re: rsync error: error in rsync protocol data stream (Broken pipe)
On Tue, Jun 17, 2003 at 06:35:50AM -0700, jw schultz wrote: You could try turning on transfer logging, I suppose. If you haven't already done so, you might want to use the log file option in case chroot is getting in the way. Beyond this I have no suggestions; I don't use rsyncd.

I may be having a similar problem. I'm using rsync version 2.5.6cvs-20030205 every night on Solaris 8/x86 to do a backup from a client system to a backup server using rsyncd. Almost every night I see the following errors logged on the rsyncd server:

Jun 18 00:48:18 ead62 rsyncd[9632]: [ID 702911 daemon.warning] inflate returned -3 (0 bytes)
Jun 18 00:48:18 ead62 rsyncd[9632]: [ID 702911 daemon.warning] rsync error: error in rsync protocol data stream (code 12) at ../token.c(416)
Jun 18 00:48:18 ead62 rsyncd[9632]: [ID 702911 daemon.warning] rsync: connection unexpectedly closed (197041 bytes read so far)
Jun 18 00:48:18 ead62 rsyncd[9632]: [ID 702911 daemon.warning] rsync error: error in rsync protocol data stream (code 12) at ../io.c(187)

And on the client side, I see the following error:

rsync: writefd_unbuffered failed to write 42 bytes: phase unknown: Broken pipe
rsync error: error in rsync protocol data stream (code 12) at ../io.c(622)

This has been happening to me for months, almost every night. I find that I can check the return code from rsync to see whether the transfer succeeded. If it failed, I simply try again, and it almost always finishes the backup successfully on the second invocation; if not, I keep retrying, up to 10 times. I think one evening it took 9 attempts, but most of the time it works in 2 tries. It's a hack, but it gets the job done, and I was having no luck debugging the problem.

-Andy
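The check-the-return-code-and-retry hack described above can be sketched as a small wrapper. Here a counter-file function stands in for an rsync that fails twice before succeeding; in real use you would replace flaky_rsync with the actual rsync command line.

```shell
# Retry wrapper: rerun the same command until it exits 0, up to a fixed
# number of attempts.  flaky_rsync is a deterministic stand-in for a
# flaky transfer (fails on attempts 1 and 2, succeeds on 3); substitute
# the real invocation, e.g.:  rsync -az /data/ backuphost::backup
flaky_rsync() {
    n=$(cat /tmp/retry-demo.count 2>/dev/null || echo 0)
    echo $((n + 1)) > /tmp/retry-demo.count
    [ "$n" -ge 2 ]
}

rm -f /tmp/retry-demo.count
attempt=1
max=10
while ! flaky_rsync; do
    attempt=$((attempt + 1))
    if [ "$attempt" -gt "$max" ]; then
        echo "giving up after $max attempts" >&2
        exit 1
    fi
done
echo "succeeded on attempt $attempt"
```

With the stand-in above, the loop runs the command three times and then reports success, which matches the "almost always finishes on the second invocation" experience: the wrapper costs nothing when the first try works.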
Re: rsync 2.5.6 still hangs
Steve, I have a couple of comments:

1. In rsync 2.5.5, I found that the -vvv flag can cause repeatable rsync hangs. Since I was using it to debug a real problem, it was very confusing and misleading. But in the end, I determined that -vvv itself was often the culprit. So I caution you not to trust what you see when you debug with -vvv: the results may have nothing to do with the real problem that caused you to look more closely with -vvv in the first place. (I'm assuming that the same -vvv problems still exist in 2.5.6, although I've never tested.)

2. I have a nightly cron job to sync up 2 servers, and rsync gives me the following error (or something similar) every night (this is the reason I was debugging with -vvv):

rsync: writefd_unbuffered failed to write 174 bytes: phase unknown: Broken pipe
rsync error: error in rsync protocol data stream (code 12) at ../io.c(622)

This is when connecting to rsync in server mode (started by inetd). Previously, I was running rsync over ssh, and it would hang instead of exiting with an error. My solution to the problem has been to test the return code from rsync and simply rerun the exact same command when I get an error return. Every night I find that the first attempt fails after a while, but the second one always works. I have no idea why this happens, but this solution works for me.

Good luck, Andy
Re: rsync and timestamps of local files
On Sat, Mar 08, 2003 at 01:13:17PM -0500, Haisam K. Ido wrote: Is there a way to make rsync check the local file system for changes in the files prior to it performing a diff with the remote site?

There is no built-in capability to do this in rsync. However, you can implement this yourself. For example, if you touch a timestamp file before invoking rsync, then you can use find -newer timestamp to find files that have changed since the last time you ran rsync. Then, you can use the --files-from patch to feed the specific list of files to transfer into rsync. That patch is available here: http://www.clari.net/~wayne/rsync-files-from.patch

Please search the archives for files-from to get more info on what this patch does.

Cheers, Andy
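The timestamp-file technique above can be sketched like this (all paths are demo scratch paths; the rsync line, which would need the --files-from patch, is shown commented out):

```shell
# Track changes between runs with a timestamp file: files newer than the
# stamp are exactly those modified since the previous sync.
set -e
rm -rf /tmp/stamp-demo
mkdir -p /tmp/stamp-demo/src
cd /tmp/stamp-demo/src

touch old-file                       # present before the previous "sync"
touch /tmp/stamp-demo/last-run       # normally touched just before each rsync run
sleep 1
touch new-file                       # modified after the stamp

# Only files strictly newer than the stamp are listed:
find . -type f -newer /tmp/stamp-demo/last-run -print > /tmp/stamp-demo/changed
cat /tmp/stamp-demo/changed

# Feed that list to rsync (requires the --files-from patch / CVS rsync):
# rsync -lptgoD --files-from=/tmp/stamp-demo/changed . remotehost:/dest
```

Touching the stamp *before* invoking rsync (not after it finishes) is the safe ordering: a file modified while the transfer runs will then still be newer than the stamp and get picked up next time.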
Re: Filelist caching
Hi Rogier,

On Sat, Feb 15, 2003 at 05:05:16PM +0100, Rogier van Eeten wrote: On Wed, Feb 12, 2003 at 05:18:11PM -0500, Andrew J. Schorr wrote: On Wed, Feb 12, 2003 at 10:51:19AM -0500, Andrew J. Schorr wrote: I was wondering... is there a way to cache that filelist? Our mirrors are updated once or twice a day; it could speed up downloads if I create a filelist every time we've mirrored others.

Please take a look at the --files-from feature that is now in the CVS tree, courtesy of Wayne Davison. http://www.clari.net/~wayne/rsync-files-from.patch

How does it work? What kind of list does it want? And how do I use it as a server?

Please refer to this archived message for more info: http://marc.theaimsgroup.com/?l=rsync&m=104286019712633&w=2

As explained in that post, if the filename argument to --files-from has a host: prefix, then the list will be pulled from the server. I hope that helps.

-Andy
Re: Filelist caching
Hi Rogier,

I've noticed every time someone does an rsync request on my ftp site (which also provides rsync as a mirror method), rsyncd creates a filelist. This is quite an I/O- and CPU-intensive procedure, especially for mirrors like FreeBSD with lots of little files. I was wondering... is there a way to cache that filelist? Our mirrors are updated once or twice a day; it could speed up downloads if I create a filelist every time we've mirrored others.

Please take a look at the --files-from feature that is now in the CVS tree, courtesy of Wayne Davison. That should do what you want. It allows you to create a set list of files and save rsync the work of scanning the directory tree each time. Of course, rsync still creates a filelist, but it doesn't have to recurse over the directory tree, so it should be much faster.

-Andy
Re: Filelist caching
On Wed, Feb 12, 2003 at 10:51:19AM -0500, Andrew J. Schorr wrote: I've noticed every time someone does an rsync request on my ftp site (which also provides rsync as a mirror method), rsyncd creates a filelist. This is quite an I/O- and CPU-intensive procedure, especially for mirrors like FreeBSD with lots of little files. I was wondering... is there a way to cache that filelist? Our mirrors are updated once or twice a day; it could speed up downloads if I create a filelist every time we've mirrored others.

Please take a look at the --files-from feature that is now in the CVS tree, courtesy of Wayne Davison. That should do what you want. It allows you to create a set list of files and save rsync the work of scanning the directory tree each time. Of course, rsync still creates a filelist, but it doesn't have to recurse over the directory tree, so it should be much faster.

Oops, you're right. I guess I applied the patch to my CVS tree and then forgot that I had. Sorry about that. You can grab the files-from patch from here: http://www.clari.net/~wayne/rsync-files-from.patch It seems to apply cleanly to the current CVS tree.

Good luck, Andy
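The caching idea in this thread amounts to: regenerate one flat file list per mirror update instead of letting rsyncd walk the tree for every client. A minimal sketch, with made-up paths (the commented rsync line assumes the --files-from patch; the exact remote-list syntax is described in the archived message cited above):

```shell
# Rebuild the cached file list once after each mirror run.
set -e
rm -rf /tmp/mirror-demo
mkdir -p /tmp/mirror-demo/pub/dir1 /tmp/mirror-demo/pub/dir2
touch /tmp/mirror-demo/pub/dir1/a /tmp/mirror-demo/pub/dir2/b

cd /tmp/mirror-demo/pub
# One entry per path: ".", two dirs, two files = 5 lines.
# (Written outside the tree so the listing doesn't include itself.)
find . -print | sort > ../filelist

# Clients can then skip the server-side tree walk by supplying the
# precomputed list (hypothetical module name and paths):
# rsync --files-from=/tmp/mirror-demo/filelist server::pub /local/mirror
```

The list only needs rebuilding when the mirror content changes, so the directory scan happens once or twice a day instead of once per client download.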
Re: specifying a list of files to transfer
On Thu, Jan 16, 2003 at 01:58:49PM -0800, jw schultz wrote: On Thu, Jan 16, 2003 at 02:52:38PM -0600, Dave Dykstra wrote: Also, if the transfer is being sent from the remote side, the file names are all getting sent over to the remote side first for --files-from and then sent back as part of the normal protocol, right? I had hoped we'd be able to avoid that round trip because the list could get long.

I don't think we can avoid the round trip without changing rsync drastically. Just consider that this saves on sending voluminous --include lists or invoking rsync hundreds of times.

Perhaps I'm misunderstanding this completely, but is there a possible scenario where the remote (sender) might have a list of files on his side? I might be mistaken, but it seems that the current patch supports the case where the --files-from file list is located on the local side, and it must be transmitted to the sender in the case where the sender is remote. Is there a possible case where the --files-from file list lives on the remote (sender) side? This is not relevant to me, but I'm just wondering. One might imagine a scenario in which a bunch of sites are mirroring from a server, and the server runs a job to create a list of modified files that the remote mirrors should pull for updating... Come to think of it, if the data lives on the remote server, where would a local files-from list come from? How would it be generated?

-Andy
Re: specifying a list of files to transfer
On Thu, Jan 16, 2003 at 05:07:13PM -0800, Wayne Davison wrote: (I assume you're talking about when using -R, which is not currently on by default.) I believe that we do need an auto-creation mode and also a way to optimize the transfer to avoid this (since it results in a lot of extra directory checks that can slow things down when we know that they aren't needed). Which one is the default is the current question. I'm currently leaning toward going back to sending the implied dirs by default, and having an option for people to optimize the transfer (which would allow it to be used with the normal -R mode even when --files-from was not used).

This is why my patch included the --no-implicit-dirs and --send-dirs options. These allow you to specify precisely what you want without having to remember which behavior is implied by various other options... Although I suppose that those two options could be combined. The basic idea is that if the user is taking the responsibility for specifying the directories himself (--send-dirs), then rsync doesn't need to worry about automatically sending all the parent directories (--no-implicit-dirs). So perhaps an --explicit-dirs option (combining the two meanings) could be used to indicate that the user is taking all responsibility for sending directories and rsync doesn't need to worry about it. And, to be safe, a --no-explicit-dirs to turn off this behavior.

-Andy
Re: specifying a list of files to transfer
On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote: I know I'm not talking about when -R is used. I am talking about creating implied intermediate directories without -R. I'm talking about being able to take the output of find -name '*.jpg' and have it create (if necessary) any intermediate directories while maintaining the equivalency of src and dest. If that means also behaving as though those directories were already in the list, that would be OK as long as -r weren't specified.

find . -name '*.jpg' | rsync -a --files-from=- . remote:

should, when it hits

./deltapics/031CGMUa.jpg
./deltapics/031CGNga.jpg
./deltapics/031CGOHa.jpg
./deltapics/031CGPOa.jpg
./deltapics/031CGPba.jpg

create the deltapics directory if it doesn't exist. The permissions and ownership should be derived from the source, so effectively it should be as though ./deltapics were in the file list. It needn't be updated if it does exist, but if it were easier to implement it that way I wouldn't object. In such a case, even if -r is allowed and specified, the implied directory should not defeat the file list by transferring any files not in the list. No errors, no need to do a run to find the missing directories and add them, and no need to add a filter to the stream adding entries for directories that are missing.

There are performance issues associated with sending all the parent directories automatically.
Consider the situation where running find test -name '*.jpg' -print gives the following results (and yes, this does happen, at least for me on Solaris 8, where the output of find seems to depend on the order in which the directory entries were created):

test/foo.jpg
test/bar.jpg
test/sub/foo.jpg
test/zeke.jpg

If I run rsync in such a way that parent directories are sent automatically, it will send the following files (based on -vvv output):

make_file(4,test)
make_file(4,test/foo.jpg)
make_file(4,test/bar.jpg)
make_file(4,test)
make_file(4,test/sub)
make_file(4,test/sub/foo.jpg)
make_file(4,test)
make_file(4,test/zeke.jpg)

Note that the test directory is sent 3 times in this case. This is because the code that checks whether to send the directory just compares against the last one sent in an attempt to eliminate duplicates. But this is not a reliable way of preventing duplicates, as the above example demonstrates. So there is a danger of sending lots of duplicate directory entries when the automatic directory transmission feature is enabled. This could probably be fixed by keeping a hash table of all the directory entries that have already been transmitted instead of just comparing against the last one sent. In any case, I think it's important to be able to turn off the automatic directory sending feature so that situations that don't require it can avoid the performance hit.

-Andy
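The duplicate-directory effect above is easy to reproduce at the shell: comparing only against the *last* parent directory seen (uniq, which collapses adjacent duplicates only) lets repeats through exactly as make_file does, while deduplicating the full set (sort -u, standing in for the proposed hash table) sends each parent once. This is an illustration, not rsync's code:

```shell
# Reproduce the interleaved find output from the post as a file list.
set -e
rm -f /tmp/dup-demo.list
printf '%s\n' \
    test/foo.jpg test/bar.jpg test/sub/foo.jpg test/zeke.jpg \
    > /tmp/dup-demo.list

# "Last one sent" dedup: adjacent-only, so "test" slips through twice
# (output: test, test/sub, test -- three lines).
xargs -n1 dirname < /tmp/dup-demo.list | uniq

# Full dedup over all parents seen so far: each directory exactly once
# (output: test, test/sub -- two lines).
xargs -n1 dirname < /tmp/dup-demo.list | sort -u
```

A real in-process fix would keep a hash table keyed on the directory path, as the post suggests, rather than sorting; sort -u just makes the set semantics visible in two commands.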
Re: Latest --files-from patch
Hi Wayne, I ran a simple test of your patch at http://www.clari.net/~wayne/rsync-files-from.patch and it worked fine for me. The performance was just about the same as for my --source-list patch. Thanks, Andy
Re: rsync hanging with openssh-2.9.9p2 as the transport
On Wed, Jan 15, 2003 at 09:27:37AM -0600, Dave Dykstra wrote: Unfortunately I don't think anybody is going to be able to tell you. I've not heard of anybody lately posting a similar problem. In the past, hanging problems have been traced to many different sources. Rsync stresses network (and filesystem) implementations greatly, and combining it with ssh stresses things that much more. I think it's worth a try to use openssh 3.1 or 3.2 (I've not been happy with versions after 3.2). What's the network between the local and remote machines? Does the name /nfs/mirror imply that the files are not directly mounted on remote but are instead on an NFS server? That has been the cause of many problems in the past.

Actually, it turns out that my testing on this issue has been invalid, since I've been using the -vvv flag to rsync to get a grip on where it's hanging. But I just discovered that the -vvv flag itself seems to cause rsync to hang. So all the testing I've done so far is useless, and I'm not sure whether this is related to openssh or the slow connection or what. I know that the problem exists in 2.5.5, but I need to do more testing without the -vvv flag to figure out what's causing it. Is it well-known that the -vvv flag can cause rsync to hang?

Thanks, Andy
specifying a list of files to transfer
Hi, I don't want to start another --files-from war, but I am attaching an updated version of my patch to allow you to specify a list of files to transfer. The normal rsync syntax allows you to specify a list of SRC files to transfer on the command line. This patch adds some new options to allow you to instead supply a file that contains a list of files to transfer.

The previous version of the patch was against rsync-2.4.6; this version works for rsync-2.5.5. The only real changes relate to the use of the popt option parsing library in 2.5 (not used in 2.4). This had the minor effect of removing the possibility of using - to indicate stdin, since the new library seems to interpret this as an option and barfs. So instead I allow the use of /dev/stdin. By the way, this patch should also work against rsync-2.5.6pre1 except for a couple of changes relating to white space and comments. So a couple of patch hunks are rejected but are easy to fix by hand. If there is a need, I can post an updated patch.

Last time we discussed this, Dave Dykstra objected to this patch for two reasons:

1. This patch only works in a single direction: when sending from a local system to a remote system. It does not handle the case where you are receiving from a remote system to a local system.

2. This capability is possible to achieve by specifying a list of files with --include-from and then adding --exclude '*' to ignore other files. While this is true, it turns out to be much slower. I have finally run a performance test to demonstrate this. Results are below.

The basic idea of the patch is to handle the case where you already know a list of files that might need to be updated and don't want to use rsync's recursive directory tree scanning logic to enumerate all files.
The patch adds the following options:

--source-list        the SRC arg will be a (local) file name containing a list of files, or /dev/stdin
--null               used with --source-list to indicate that the file names will be separated by null (zero) bytes instead of linefeed characters; useful with gfind -print0
--send-dirs          send directory entries even though not in recursive mode
--no-implicit-dirs   do not send implicit directories (parents of the file being sent)

The --source-list option allows you to supply an explicit list of filenames to transport without using the --recursive feature and without playing around with include and exclude files. As discussed below, the same thing can be done by combining --recursive with --include-from and --exclude, but it's significantly slower and more arcane to do it that way. The --null flag allows you to handle files with embedded linefeeds. This is in the style of gnu find's -print0 operator. The --send-dirs option overcomes a problem where rsync refuses to send directories unless it's in recursive mode. One needs this to make sure that even empty directories get mirrored. And the --no-implicit-dirs option turns off the default behavior in which all the parent directories of a file are transmitted before sending the file. That default behavior is very inefficient in my scenario, where I am taking the responsibility for sending those directories myself.

And now for a performance test: I have a directory tree containing 128219 files, of which 16064 are directories. To start the test, I made a list of files that had changed in the past day:

find . -mtime -1 -print > /tmp/changed

(Normally, my list of candidate files is generated by some other means; this is just a test example.) There were 5059 entries in /tmp/changed.
I used my new options to sync up these files to another host as follows:

time rsync -RlHptgoD --numeric-ids --source-list \
    --send-dirs --no-implicit-dirs -xz --stats /dev/stdin \
    remotehost:/extra_disk/tmp/tree1 < /tmp/changed

Here were the reported statistics:

Number of files: 5059
Number of files transferred: 5056
Total file size: 355514100 bytes
Total transferred file size: 355514100 bytes
Literal data: 355514100 bytes
Matched data: 0 bytes
File list size: 139687
Total bytes written: 154858363
Total bytes read: 80916

wrote 154858363 bytes  read 80916 bytes  364992.41 bytes/sec
total size is 355514100  speedup is 2.29

And the time statistics: 112.53u 8.82s 7:03.92 28.6%

I then ran the same command again (in which case there was nothing to transfer). Here's how long it took: 0.54u 0.62s 0:08.61 13.4%

Now to compare with the recursive method using --include-from. First, we must create the list of files. In the case of include-from, we need to include all the parent directories as include patterns. The following gawk seems to do the job:

gawk '$0 != "./" {sub(/^\.\//,"")} {while ((length > 0) && !($0 in already)) {print "/" $0; already[$0] = 1;
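The --null workflow described in this post can be sketched without a remote host: build a null-delimited list with find -print0 (the equivalent of gfind -print0) and verify that a filename containing an embedded newline survives intact, which a newline-delimited list cannot guarantee. The rsync invocation using the patch's options is shown commented out with a made-up destination:

```shell
# Build a NUL-delimited file list; entries with embedded newlines are safe.
set -e
rm -rf /tmp/null-demo /tmp/null-demo.list
mkdir -p /tmp/null-demo
nl='
'
touch "/tmp/null-demo/plain.jpg" "/tmp/null-demo/with${nl}newline.jpg"

find /tmp/null-demo -type f -print0 > /tmp/null-demo.list

# Count entries by counting NUL separators (2 files => 2 NULs):
tr -cd '\0' < /tmp/null-demo.list | wc -c

# With the patch's options (hypothetical destination):
# rsync -RlHptgoD --numeric-ids --source-list --null \
#       --send-dirs --no-implicit-dirs /dev/stdin \
#       remotehost:/dest < /tmp/null-demo.list
```

A plain newline-delimited list would split the second filename into two bogus entries, which is exactly the failure mode --null (and find's -print0) exists to avoid.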
Re: specifying a list of files to transfer
On Tue, Jan 14, 2003 at 03:32:41PM -0600, Dave Dykstra wrote: I haven't looked at the implementation, but here are comments on the user interface:

1. Yes, it should take a filename or - as a parameter.
2. I don't like the idea of skipping the SRC spec. Paths should be relative to the SRC. If somebody wants to use full paths, they can always have a SRC of /.
3. It should be called --files-from.
4. --send-dirs and --no-implicit-dirs shouldn't be separate options; they should be automatically turned on with the --files-from option.

Those comments all sound reasonable to me. The only reason I broke out the --send-dirs and --no-implicit-dirs options was because they were orthogonal to what I was doing and could potentially also apply to situations where the user was specifying various SRC filenames on the command line. But it's certainly fine to have --files-from turn those on automatically.

-Andy
rsync hanging with openssh-2.9.9p2 as the transport
Hi, This is perhaps a stupid question. I apologize in advance if it's already been covered, but I'm stumped... I'm using rsync to back up some file systems to a remote host. The transport is openssh-2.9.9p2. The problem does not occur when the transport is rsh. I'm invoking rsync as follows:

rsync -RlHptgoD --numeric-ids -e ssh -rzx --stats . remotehost:/nfs/mirror

But it hangs when there are large numbers of files in the . filesystem. If I run with -vvv, I see lots of make_file entries as send_file_list starts sending the list of files to the remote host, but then it freezes after around 4500 to 6500 make_file messages. I see the same problem with versions 2.4.6, 2.5.5, and 2.5.6pre1. And I see it on Linux 2.4.18 (Red Hat 8.0) and Solaris 8/x86.

I can get it to work by breaking it up into chunks of 1000 files or so. It might work with more; I haven't tested exhaustively. I have tried using the --blocking-io and --no-blocking-io options, but neither one solves the problem. I could provide more info about where it hangs, but I thought somebody might know the answer, since this is clearly related to the interaction with openssh (there's no problem with rsh). Is there a trick to make rsync work nicely with openssh? I searched the archives and haven't found anything... Does upgrading to openssh 3 solve the problem?

Thanks, Andy
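The "chunks of 1000 files" workaround mentioned above can be sketched as: build the full list once, split it, and run one rsync per chunk. The list-generation and splitting below is real and runnable; the per-chunk rsync line (which assumes a --files-from-capable rsync, discussed elsewhere in this digest) is commented out with placeholder paths:

```shell
# Split a large file list into chunks of at most 1000 entries, so no
# single rsync run has to push a huge file list over the ssh transport.
set -e
rm -f /tmp/hang-demo.all /tmp/hang-demo.part.*

# Stand-in for "find . -print" over the real tree: 2500 fake names.
awk 'BEGIN { for (i = 1; i <= 2500; i++) print "file" i }' > /tmp/hang-demo.all

split -l 1000 /tmp/hang-demo.all /tmp/hang-demo.part.

# One transfer per chunk (hypothetical destination):
for list in /tmp/hang-demo.part.*; do
    : # rsync -RlHptgoD --numeric-ids -e ssh --files-from="$list" . remotehost:/nfs/mirror
done
```

2500 names at 1000 per chunk yields three chunk files; each stays safely under the 4500-6500 file-list size at which the hang was observed.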
Re: patch to enable faster mirroring of large filesystems
On Sun, Nov 25, 2001 at 03:21:51AM +1100, Martin Pool wrote: On 20 Nov 2001, Dave Dykstra [EMAIL PROTECTED] wrote: And, by the way, even if the batch stuff accomplishes the same performance gains, I would still argue that the --files-from type of behavior that I implemented is a nice transparent interface that people might like to have. The ability to pipe in output from gfind -print0 opens up some possibilities. Yes, many people have argued that a --files-from type of option is desirable, and I agree.

I agree too. I think there would certainly be no argument with taking a patch that did only that. (Except that it makes the option list even more ridiculously bloated.) I think a better fix is to transfer directories one by one, pipelined, rather than walking the whole tree upfront. This will take a protocol change and a fair amount of work, but I think it is possible. If we can get it to just work faster transparently, that is better.

I understand your point of view, but I think it is a mistake to hold rsync's algorithm hostage to the directory tree traversal logic built into the program. IMHO, the basic file transfer algorithm of rsync is terrific, but the program wrapped around it is a bit out of control. The spirit of my patch is to expose the low-level rsync algorithm and to allow people to build up their customized infrastructure outside of the program instead of having to build it in. I think this is in the spirit of Unix tools. I think if rsync were to expose some of its low-level capabilities, then we would not have a need for xdelta and rdiff, projects which are springing up because of rsync's opaqueness. Anyway, you may not like the way my patch is implemented, but I still argue that it serves a useful purpose, and it gets the job done for me.

Cheers, Andy
patch to enable faster mirroring of large filesystems
I have attached a patch that adds 4 options to rsync that have helped me to speed up my mirroring. I hope this is useful to someone else, but I fear that my relative inexperience with rsync has caused me to miss a way to do what I want without having to patch the code. So please let me know if I'm all wet. Here's my story:

I have a large filesystem (around 20 gigabytes of data) that I'm mirroring over a T1 link to a backup site. Each night, about 600 megabytes of data needs to be transferred to the backup site. Much of this data has been appended to the end of various existing files, so a tool like rsync that sends partial updates instead of the whole file is appropriate. Normally, one could just use rsync with the --recursive and --delete features to do this. However, this takes a lot more time than necessary, basically because rsync spends a lot of time walking through the directory tree (which contains over 300,000 files).

One can speed this up by caching a listing of the directory tree. I maintain an additional state file at the backup site that contains a listing of the state of the tree after the last backup operation. This is essentially equivalent to saving the output of find . -ls in a file. Then, the next night, one generates the updated directory tree listing for the source file system and diffs it against the directory listing on the backup file system to find out what has changed. This seems to be much faster than using rsync's recursive and delete features. I have my own script and programs to delete any files that have been removed, and then I just need to update the files that have been added or changed. One could use cpio for this, but it's too slow when only partial files have changed.
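The cached-listing approach above can be sketched with a tiny tree standing in for the real 20GB one (the real version would use find . -ls so that size and mtime changes are caught too; this sketch diffs plain path listings to show the mechanics):

```shell
# Night N: save a state listing.  Night N+1: diff a fresh listing
# against it to derive additions and deletions without rsync walking
# the whole tree.
set -e
rm -rf /tmp/state-demo
mkdir -p /tmp/state-demo/src
cd /tmp/state-demo/src

echo one > keep
echo two > doomed
find . -type f | sort > /tmp/state-demo/state.old   # "last night's" state

rm doomed                                           # deleted since then
echo three > added                                  # created since then
find . -type f | sort > /tmp/state-demo/state.new

# comm on the two sorted listings splits the changes:
comm -13 /tmp/state-demo/state.old /tmp/state-demo/state.new   # -> ./added   (new files to transfer)
comm -23 /tmp/state-demo/state.old /tmp/state-demo/state.new   # -> ./doomed  (files to delete remotely)
```

The addition list is what gets fed to rsync (via a --source-list/--files-from style option) for partial-file transfer; the deletion list goes to whatever script removes files at the backup site.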
So I added the following options to rsync:

--source-list        the SRC arg will be a (local) file name containing a list of files, or - to read file names from stdin
--null               used with --source-list to indicate that the file names will be separated by null (zero) bytes instead of linefeed characters; useful with gfind -print0
--send-dirs          send directory entries even though not in recursive mode
--no-implicit-dirs   do not send implicit directories (parents of the file being sent)

The --source-list option allows me to supply an explicit list of filenames to transport without using the --recursive feature and without playing around with include and exclude files. I'm not really clear on whether the include and exclude files could have gotten me to the same place, but it seems to me that they work hand-in-hand with the --recursive feature that I don't want to use. The --null flag allows me to handle files with embedded linefeeds. This is in the style of gnu find's -print0 operator. The --send-dirs option overcomes a problem where rsync refuses to send directories unless it's in recursive mode. One needs this to make sure that even empty directories get mirrored. And the --no-implicit-dirs option turns off the default behavior in which all the parent directories of a file are transmitted before sending the file. That default behavior is very inefficient in my scenario, where I am taking the responsibility for sending those directories myself.

So, the patch is attached. If you think it's an abomination, please let me know what the better solution is. If you would like some elaboration on how this stuff really works, please let me know.
Cheers, Andy

--- flist.c.orig	Tue Sep  5 22:46:43 2000
+++ flist.c	Fri Nov  9 12:01:56 2001
@@ -30,6 +30,7 @@
 extern int cvs_exclude;
 extern int recurse;
+extern int send_dirs;
 extern int one_file_system;
 extern int make_backups;
@@ -501,8 +502,8 @@
 	/* we use noexcludes from backup.c */
 	if (noexcludes) goto skip_excludes;
-	if (S_ISDIR(st.st_mode) && !recurse) {
-		rprintf(FINFO,"skipping directory %s\n",fname);
+	if (S_ISDIR(st.st_mode) && !recurse && !send_dirs) {
+		rprintf(FINFO,"make_file: skipping directory %s\n",fname);
 		return NULL;
 	}
@@ -689,14 +690,16 @@
 }
 
-struct file_list *send_file_list(int f,int argc,char *argv[])
+static struct file_list *send_file_list_proc(int f,char *(*ffunc)(), void *opq)
 {
-	int i,l;
+	int l;
 	STRUCT_STAT st;
 	char *p,*dir,*olddir;
 	char lastpath[MAXPATHLEN]="";
 	struct file_list *flist;
 	int64 start_write;
+	char *in_fn;
+	extern int implicit_dirs;
 
 	if (verbose && recurse && !am_server && f != -1) {
 		rprintf(FINFO,"building file list ... ");
@@ -711,10 +714,10 @@
 		io_start_buffering(f);
 	}
 
-	for (i=0;i<argc;i++) {
+	while ((in_fn = (*ffunc)(opq)) != NULL) {
 		char *fname = topsrcname;
 
-		strlcpy(fname,argv[i],MAXPATHLEN);
+		strlcpy(fname,in_fn,MAXPATHLEN);
 
 		l = strlen(fname);
 		if (l != 1 && fname[l-1]