Re: news of the rproxy world
In message [EMAIL PROTECTED], Martin Pool writes: I hope this can eventually replace the coding functions in rsync, although at the moment Rusty is going ahead on rsync 3.0 with a much simpler and less flexible library. This is the first time I hear of rsync 3.0 -- could you (or Rusty) comment on the planned features and timeline? Presumably this will include the incremental directory tree creation and transfer that Tridge was talking about way back when? Or maybe file list caching? Thanks, -- Alberto

Alberto Accomazzi mailto:[EMAIL PROTECTED]
NASA Astrophysics Data System http://adsabs.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfawww.harvard.edu
60 Garden Street, MS 83, Cambridge, MA 02138 USA
Re: news of the rproxy world
In message [EMAIL PROTECTED], Dan Phoenix writes: where does the file list caching go? Err... nowhere, at the moment. The list that rsync builds in memory containing the file names to be transferred and their signatures is currently built from scratch for every request. For a site which mirrors the same set of data to a lot of clients, it makes sense to allow caching of the file list so that this is done just once every so often. If you search the mailing list archives you'll find a few messages about the issue. -- Alberto
Re: are redhat updates rsyncable
In message [EMAIL PROTECTED], Harry Putnam writes: Sorry to reprint this request for information but I guess I want more handholding here. [...] "Michael H. Warfield" [EMAIL PROTECTED] writes: rsync ftp.wtfo.com:: Getting this far ... works as advertised. But beyond that point, how to actually get to the files and collect them? If you use the latest sources from the CVS tree, an rsync on an "rsync URL" lists its contents if the URL ends with a trailing slash:

adsone-465: ./rsync-current/rsync rsync://ftp.wtfo.com/
WTFO Mirror FTP Site
Please report any problems immediately to [EMAIL PROTECTED].
ftp        Complete wtfo FTP Site
rh70       RedHat 7.0 complete
rh70-iso   RedHat 7.0 ISO Images
[...]
adsone-466: ./rsync-current/rsync rsync://ftp.wtfo.com/rh70/
WTFO Mirror FTP Site
Please report any problems immediately to [EMAIL PROTECTED].
drwxr-xr-x       4096 2000/11/23 18:58:17 .
drwxr-xr-x       4096 2000/10/12 09:10:11 .nfs_dontpush
[...]

And so on. Previous versions of rsync did not handle things as gracefully and would display the "client: nothing to do" message instead. In general, though, I imagine that you'd want to rsync on a whole "module" (i.e. top-level element of the rsync URL), as in:

adsone-467: ./rsync-current/rsync -avz rsync://ftp.wtfo.com/rh70 /mirror/

You can get the current version of rsync from rsync://rsync.samba.org/ftp/unpacked/rsync/. On a related note, it looks to me like running rsync in list mode as shown above causes the daemon to create a recursive listing of all files under the top-level directory, which is completely unnecessary. Martin, you may want to have a look at that. Hope this helps, -- Alberto
Re: exclude list and patterns
In message [EMAIL PROTECTED], Dave Dykstra writes: It is possible to build your own complete list of files to copy and give them all to rsync, by building a --include list and doing '--exclude *' at the end. Currently you need to also either have --include '*/' or explicitly list all parent directories above the files you want included, or else the exclude '*' will exclude the whole directories. There's been talk of adding a --files-from option which would remove this last restriction, and I even offered to implement it, but I'm still waiting for anybody to give performance measurements (using rsync 2.3.2, which had an include optimization that did something similar if there were no wildcards) to show what the performance impact would be. Dave, I see you've now mentioned a few times what the performance impact of this proposed patch would be, and I can't quite understand what you're getting at. My suggestion of --files-from came from the obvious (at least to me) realization that the current include/exclude mechanism is confusing to many users, and had nothing to do with performance (at least in my mind). I thought (and still think) that it would provide a cleaner interface for performing fine-grained synchronization of part of a filesystem, and as such was a desirable feature. So while I understand the argument of not wanting to clutter rsync with a lot of unnecessary features, I thought this one makes sense regardless of performance or compatibility issues. In fact, I think it makes sense to have it as a separate option as opposed to kludging the equivalent functionality into the include/exclude syntax, to avoid the proliferation of confusing options and special cases. Anyway, just wanted to make this point. As I have mentioned, I don't personally *need* this option at the moment, but I think that if enough people wanted to see it in rsync it should be implemented regardless of what the change in performance may be.
-- Alberto
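The workaround Dave describes (an --include rule per wanted file, includes for every parent directory, and a trailing --exclude '*') can be generated mechanically. A minimal sketch; emit_includes is a hypothetical helper script, not an rsync option:

```shell
# Hypothetical helper: given relative paths of wanted files, print an
# --include rule for each file and for every parent directory, then
# the trailing --exclude that drops everything else.
emit_includes() {
    for f in "$@"; do
        d=$f
        # Walk up the path so each parent directory is included too;
        # without these rules, --exclude '*' would prune the parents
        # before rsync ever reaches the files inside them.
        while d=$(dirname "$d"); [ "$d" != "." ] && [ "$d" != "/" ]; do
            printf '%s\n' "--include=$d/"
        done
        printf '%s\n' "--include=$f"
    done
    printf '%s\n' "--exclude=*"
}

# e.g.: rsync -av $(emit_includes etc/hosts etc/rc.d/init.d/httpd) host:/ /mirror/
```

The duplicate parent rules you get when several files share a directory are harmless, since rsync stops at the first matching rule.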
Re: exclude list and patterns
In message [EMAIL PROTECTED], Dave Dykstra writes: Well, the easier syntax only motivates me 90% to personally take the time to implement the option. If somebody can show a performance improvement, that will be enough to clinch it for me. My initial motivation for implementing the optimization that was taken out in 2.4.0 was performance (which I hadn't measured), and when Tridge took it out he asked me to show him a performance gain to justify leaving it in. I did some measurements then and couldn't persuade myself. All I'm asking is for somebody to put a little effort into showing a modest performance difference. Well, not to be pedantic here, but how do we measure the performance of a feature that isn't available yet? I guess my point is that Tridge's objection to the optimization does not apply here, since this is simply a new option rather than a rewrite of code that already works. And the new option is there to make the program more user-friendly rather than to increase performance. -- Alberto
Re: unlimited backup revisions?
In message [EMAIL PROTECTED], "Sean J. Schluntz" writes: That's what I figured. Well, I need it for a project so I guess you all won't mind if I code it and submit a patch ;) How does --revisions=XXX sound? --revisions=0 would be unlimited, any other number would be the limit for the number of revisions. And when it reaches that number, do you want it to delete old revisions, or stop making new revisions? You would delete the old one as you continue rolling down. Perhaps something like --backup=numeric would be a better name. In the long term it might be better to handle this with scripting. I would suggest not reinventing the wheel and doing this the way GNU cp does it:

-V, --version-control=WORD   override the usual version control

The backup suffix is ~, unless set with SIMPLE_BACKUP_SUFFIX. The version control may be set with VERSION_CONTROL; values are:

t, numbered      make numbered backups
nil, existing    numbered if numbered backups exist, simple otherwise
never, simple    always make simple backups

That is, unless there is some overwhelming reason not to follow this scheme. -- Alberto
Re: Need for an --operating-window feature ???
In message A103903D0DE0D1119EE800805FD69AC402D08B04@XCGNY005, "Allen, John L." writes: I'm new to rsync, so be gentle... I have need to use rsync, but want to have it operate only off-hours when the network is lightly loaded. I did not see any option for making rsync obey an "operating time window" so that it would basically cease copying data if the time-of-day falls outside a specified window. I thus thought it might be a good idea to have a --operating-window option where you could specify an allowed time of operation by indicating two endpoints, perhaps like this: --operating-window 22:00-05:00, where the times are given in HH:MM 24-hour military time. You could obviously extend this to allow for multiple disjoint windows, but I don't think there's much point. I've done something like this using a shell script. Essentially the code goes like this:

if in_operating_window ; then
    echo "kill -HUP $$" | at $end_operating_window_time
    exec rsync "$@"
else
    echo $0 "$@" | at $start_operating_window_time
fi

As you can see, the script uses at(1) to resubmit itself if it's not running during the operating window; otherwise it sets up an at job that will send the script a SIGHUP (causing the running rsync to exit) at the end of the operating window. -- Alberto
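The pseudo-code above leaves in_operating_window undefined; a minimal sketch follows. The 22:00-05:00 window is just the example from John's message, and the optional argument (so a time can be passed in explicitly) is my own addition:

```shell
# Window endpoints in HHMM form; a window may wrap around midnight.
START=2200
END=0500

# Succeeds when the given (or current) HHMM time falls in the window.
in_operating_window() {
    now=${1:-$(date +%H%M)}
    if [ "$START" -le "$END" ]; then
        # Window contained within a single day, e.g. 09:00-17:00.
        [ "$now" -ge "$START" ] && [ "$now" -le "$END" ]
    else
        # Window wraps midnight, e.g. 22:00-05:00.
        [ "$now" -ge "$START" ] || [ "$now" -le "$END" ]
    fi
}
```

The midnight-wrapping case is the one that usually trips people up: for 22:00-05:00 the test must accept times on either side of 00:00, hence the OR in the else branch.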
Re: Need for an --operating-window feature ???
FYI, rsync has a good cleanup mechanism that kicks in when you send it a SIGHUP. It removes/renames temporary files as appropriate, sends a signal to its child process, and exits. I use this all the time to gracefully stop transfers (see the pseudo-code in my previous message). -- Alberto In message [EMAIL PROTECTED], Sean Berry writes: It can do that in one of two ways: finish the current file, or back out the current file. Finishing the current file may leave it running until the next time rsync runs (assuming it'll run out of cron). Backing out the current file is probably what you want. Would it make more sense (and I don't know whether rsync currently supports this in the way I think of) for rsync to back out the current file and exit gracefully if it received a signal 15? This might be functionality useful outside of the environment you have in mind. On Tue, 3 Apr 2001, Allen, John L. wrote: Date: Tue, 3 Apr 2001 08:39:17 -0400 From: "Allen, John L." [EMAIL PROTECTED] To: 'Dirk Markwardt' [EMAIL PROTECTED] Cc: "'[EMAIL PROTECTED]'" [EMAIL PROTECTED] Subject: RE: Need for an --operating-window feature ??? A cron job is fine for starting it, but I want it to stop on its own if it finds itself running outside its allowed window. (Obviously this is only really needed when there is a huge amount of data to sync, or when the network is really slow.) John. -----Original Message----- From: Dirk Markwardt [mailto:[EMAIL PROTECTED]] Sent: Tuesday, April 03, 2001 03:38 To: Allen, John L. Cc: '[EMAIL PROTECTED]' Subject: Re: Need for an --operating-window feature ??? Hello John, AJL to have a --operating-window option where you could specify an AJL allowed time of operation by indicating two endpoints, perhaps like this AJL --operating-window 22:00-05:00 AJL where the times are given in HH:MM 24-hour military time. What about a cron-job? at 22:00: chmod 755 /usr/bin/rsync at 05:00: chmod 644 /usr/bin/rsync Greetings Dirk -- Dirk Markwardt Besselstr.
7 38114 Braunschweig [EMAIL PROTECTED] -- Sean Berry works with many flavors of UNIX, but especially Solaris/SPARC and NetBSD. His hobbies include graphics and raytracing. He drinks coke mostly. His opinions are not necessarily those of his employers.
Re: temp files during copy
In message [EMAIL PROTECTED], Jim Ogilvie writes: Hi, I know rsync creates temp files in the destination directory and then at some point renames them to the original file name. Therefore the destination directories need to be larger than the source directories. I'm trying to find a way to calculate how much larger the destination directories need to be. How does rsync decide when to rename them? Is it by directory? rsync will transfer files one at a time, so you need to have at least as much disk space as the largest file in a directory being synchronized. If your files are large enough that this becomes a problem, I suggest you use the -T option, which makes rsync use a separate temporary directory. -- Alberto
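A quick way to apply that sizing rule: find the biggest file on the source side and compare it with the free space where rsync will write its temporary files. The helper below is hypothetical (not part of rsync) and assumes standard find/du/sort:

```shell
# Print the size in kilobytes of the largest regular file under the
# given directory (disk usage as reported by du).
largest_file_kb() {
    find "$1" -type f -exec du -k {} + | sort -n | tail -n 1 | cut -f1
}

# If that exceeds the free space on the destination filesystem,
# point rsync's temporary files at a roomier one with -T, e.g.:
#   rsync -av -T /bigtmp /source/ host:/dest/
```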
Re: unexpected EOF in read_timeout
The unexpected EOF in read_timeout can happen for a variety of reasons, but it typically shows up when you have a deadlock (no bytes sent across the wire, which could happen as a result of ssh blocking, for instance) or a very large filesystem being synchronized (which could happen if the receiving side is spending a long time creating the file list and a timeout has been specified). Under both circumstances, one of the two rsync processes on the receiving side notices that no I/O has happened in the last N seconds, so it sends a signal to the other process and quits. (Now that I think of it, it may be that the sender quits first and the receiver then gives up -- you'll have to go through the code if you want to find out for sure.) If the problem that causes this is due to the timeout, the simple solution is to increase it (--timeout option). If it's due to the client or server running out of memory trying to generate the file list, the solution would be to break up the transfer into smaller chunks. If it's due to deadlock, you'll have to find another transport for the connection. -- Alberto In message [EMAIL PROTECTED], Phil Howard writes: Randy Kramer wrote: I'm a novice at rsync, so I can't tell you all the details, but, in general, the unexpected EOF in read_timeout (usually) means that something not so good happened on the server side. On the server I was connecting to, I believe that the preprocessing by the server made some watchdog on the server side decide that the process was dead -- it then killed it, and then I got the error message. I never proved this completely. I posted a while back with this problem and someone answered that the message existed in ssh and not in rsync. I never verified it. But I do know that it started happening when I upgraded to 2.4.6. But I also upgraded ssh around that time, so this was believable.
At the time, because I was new to using rsync, I was using the -c option to force a full checksum comparison of the two files (because I thought that the files were not updating because the dates and times matched). I stopped using the -c option and just made sure the dates and times did not match, and that cured my problem with the unexpected EOF -- I believe because the server (and client) spent less time calculating checksums before starting to exchange data. Unfortunately, I'm not using -c and I do get these problems. The thing is, they occur randomly. I run some mirroring scripts and have coded the scripts to just repeat until a good status comes back, like:

while ! rsync ; do echo oops, let's try that again; done

-- Phil Howard - KA9WGN | Dallas, Texas, USA | http://linuxhomepage.com/ | [EMAIL PROTECTED] | http://phil.ipal.org/
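Phil's loop above retries forever and retries immediately. A slightly more defensive sketch caps the attempts and pauses between them; the retry function and RETRY_DELAY variable are my own hypothetical names, not rsync features:

```shell
# Run the given command, retrying up to $1 times with a pause
# between attempts; give up once the cap is reached.
retry() {
    max=$1
    shift
    n=0
    until "$@"; do
        n=$((n + 1))
        if [ "$n" -ge "$max" ]; then
            echo "giving up after $n failed attempts" >&2
            return 1
        fi
        echo "attempt $n failed, retrying" >&2
        sleep "${RETRY_DELAY:-60}"
    done
}

# e.g.: retry 5 rsync -avz rsync://host/module /mirror/
```

The pause matters when the failures come from a flaky network or a busy server: hammering the daemon in a tight loop tends to make such failures worse, not better.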
Re: use rsync as a diff tool only
In message [EMAIL PROTECTED], Lucy Hou writes: Hi all, I am wondering if I can use rsync as a diff tool only, not really copying files over to the destination. I tried the -n option; it doesn't seem to have done any content comparison on the file(s), it merely lists the file names. Lucy, check out the rdiff utility, distributed with librsync, which is available at http://rproxy.sourceforge.net/download.html. It does exactly what you describe for local files (i.e. no network transport is built in). -- Alberto
Re: rsync+ patch
In message [EMAIL PROTECTED], Martin Pool writes: I'm inclined to apply this: at the very least, it doesn't look like it could damage anything else. Any other opinions? Yes please! I would personally love to see that functionality supported in the stock rsync distributions. Since the patch implements a feature that is periodically requested by users, it seems to me there is a good reason for inclusion. -- Alberto
Re: Rsync: Re: patch to enable faster mirroring of large filesystems
filename. I still want to write a --files-from option sometime, and I'm still waiting for somebody who has an application that could use it to do some performance measurements with rsync 2.3.2. I agree that --files-from has value on its own without performance implications, but somebody has to want it badly enough to put in a little effort if they'd like me to implement it. -- Alberto
Re: Rsync: Re: patch to enable faster mirroring of large filesystems
It seems to me the new options --read-batch and --write-batch should go a long way towards reducing any time spent in creation of checksums and file lists, so you should definitely give 2.4.7pre4 a try. This is just a guess since I haven't actually used those options myself, but it seems worth looking into. BTW, could we please have some real documentation for these options? What's in the man page doesn't come anywhere close to telling what is cached and how to make use of it. Some examples of how people are using this option may be illuminating for those of us who don't have the time or inclination to figure it out from the code. -- Alberto In message [EMAIL PROTECTED], Keating, Tim writes: I was at first, but then removed it. The results were still insufficiently fast. Were you using the -c option of rsync? It sounds like you were, and it's extremely slow. I knew somebody who once went to extraordinary lengths to avoid the overhead of -c, making a big patch to rsync to cache checksums, when all he had to do was not use -c.
Re: Rsync: Re: patch to enable faster mirroring of large filesystems
In message [EMAIL PROTECTED], Dave Dykstra writes: On Thu, Nov 29, 2001 at 11:02:07AM -0500, Alberto Accomazzi wrote: ... These numbers show that reading the filenames this way rather than using the code in place to deal with the include/exclude list cuts the startup time down to 0 (from 1hr). The actual sending of the filenames is down from 2h 15m to 1h 40m. The reason this isn't better is due to the fact that turning buffering on only helps the client, while the server still has to do unbuffered reads because of the way the list is sent across. Are you sure about that? I don't see any unbuffered reads. Actually I'm not sure that the code intends to do unbuffered reads, but that's what's happening for sure from the trussing I've done on the server side. I'm not sure how the buffering should take place since the include/exclude file names are sent over the wire one at a time rather than as a chunk of data, but maybe buffering is done at a higher level. 2.3.2 did have the read_check() hack which was there to prevent SSH pipes from getting stuck, maybe that's what you're seeing. That was taken out in 2.4.0 so maybe that would greatly speed it up. Possible. Another reason why I don't think it's worth spending any more time patching 2.3.2 anyways... As far as I can tell there is no way to get around the buffering without a protocol change or a different approach to sending this list. Given the data above, I think implementing --files-from this way would be the wrong way to go, for a number of reasons: I've been starting to think along those lines too. It should be a protocol change to just send the files and not treat it like excludes. In fact, the file list is normally sent from the sender to the receiver, but if the client is the receiver maybe we could figure out a way to have --files-from only send the list in the other direction. Right. 
The point is that when Tridge wrote the code he was obviously envisioning a client sending a short exclude list to the server and then the server sending a massive list back to the client. Therefore neither optimization nor compression has ever been included to ensure the fast transfer of the exclude list, so patching things this way goes against the original design of the protocol. So probably the best thing to do is stick the file list right after the exclude list, turning on compression if -z has been selected, and bump up the protocol version so that we can be backwards compatible. At least that's my take. -- Alberto
Re: [patch] module options with SSH
The discussion about syntax for remote file specification and the exchange between Martin and Wayne about configure options for rsh make me wonder if we should push some alternative syntax for specifying the transport protocol to be used by rsync. I, for one, always stick to the rsync://host/module syntax when pulling from an rsync server, and have often wished that the same syntax were available when doing a push. I find the URL-style syntax easy to remember and understand, while the :: seems much less intuitive (it actually looks perlish to me because of the way modules are specified in perl). Among other things, I notice that Sun uses the URL syntax in its man pages describing NFS (they use nfs://host[:port]/pathname). So what came to mind is to have rsync recognize and use, both for push and pull, remote specifications of the form: rsync://host/module/file, ssh://[username@]host/dir/file, rsh://[username@]host/dir/file. I'm not crazy about the last two, but thought of them while reading messages about ssh/rsh issues. Hmm... one problem that this wouldn't solve is the use of ssh-over-rsyncd that somebody has proposed, though. Also I'm not sure how I would handle the passing of additional options to the external transport program (what we do now with -e 'shell [OPTIONS]'). Ok, so maybe this is not so hot, but rsync:// is cool, IMHO. -- Alberto In message [EMAIL PROTECTED], Dave Dykstra writes: Am I understanding you correctly when you say ssh and --daemon are not working together when you use the :: syntax, or are you saying that they just don't, period, regardless of : or ::? : syntax uses rsh (or ssh if you use -e ssh) to run another copy of the rsync program on the remote side. :: syntax skips that completely, ignores -e, and instead connects to a daemon separately started to listen on port 873 on the remote host.
In the future, when JD Paul's patch is accepted, the expectation will be that if you use :: and -e ssh together it will still use ssh to connect, but it will run rsync --daemon interactively so it can honor your rsyncd.conf. Does that make it clear? Because, I do not have RSH, only SSH on my server, and it does work for me. I do have to use SSH version 2 as I wasn't able to do it with version 1, and I use DSA not RSA. That doesn't matter; :: syntax bypasses both RSH and SSH.
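To make the proposed syntax concrete, here is a sketch of how the URL forms above could map onto today's invocations. Nothing in rsync parses these URLs for push/pull transport selection; url_to_cmd is purely illustrative:

```shell
# Hypothetical translation from transport-prefixed URLs to the
# equivalent rsync command lines (pull direction shown).
url_to_cmd() {
    case $1 in
        rsync://*)
            # Daemon transport: rsync already accepts this form.
            printf '%s\n' "rsync $1" ;;
        ssh://*)
            # ssh://[user@]host/path -> -e ssh with host:path syntax.
            rest=${1#ssh://}
            printf '%s\n' "rsync -e ssh ${rest%%/*}:/${rest#*/}" ;;
        rsh://*)
            rest=${1#rsh://}
            printf '%s\n' "rsync -e rsh ${rest%%/*}:/${rest#*/}" ;;
        *)
            # Anything else is passed through untouched.
            printf '%s\n' "rsync $1" ;;
    esac
}
```

Note this mapping says nothing about extra options to the transport program, which is exactly the open problem mentioned above.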
Re: List of rsync output / error messages
Joseph, check out the header file errcode.h in the rsync distribution. That file and the structure found in log.c map the exit codes to the error messages you refer to, so the best way to programmatically catch errors is simply to check the exit status returned by rsync. -- Alberto In message [EMAIL PROTECTED], Joseph Annino writes: Is there a good place to get information about the list of all possible output and error messages rsync generates? Or should I just muck around the source code (which I haven't looked at yet) and find them? I am doing something where I would like to parse rsync's output using Perl into a set of data structures. I already have something that works under normal conditions. Eventually I'd like to use that data as part of building a Perl/Tk interface. Of course, to parse the output successfully I need to know all the possibilities, as surprises can throw things out of whack. And this is just an idea, but ways to make rsync's output more easily parseable, and more verbose in terms of reporting information that would be useful for, say, making a progress bar, would be nice to discuss. Thanks. -- Joseph Annino Consulting - Perl, PHP, MySQL, Oracle, etc. [EMAIL PROTECTED] - http://www.jannino.com
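A minimal sketch of the exit-status approach, mapping a few of the codes from errcode.h to messages. The report_status helper is hypothetical, and only a handful of codes are shown; check errcode.h in your rsync version for the full list:

```shell
# Map an rsync exit status to a short human-readable message.
# The code numbers follow errcode.h in the rsync source tree.
report_status() {
    case $1 in
        0)  echo "success" ;;
        12) echo "error in rsync protocol data stream" ;;
        23) echo "partial transfer" ;;
        *)  echo "failed with exit code $1" ;;
    esac
}

# Usage: rsync -avz src/ host:/dest/; report_status $?
```

This is far more robust than scraping stderr: the messages can change between versions (and between locales, eventually), while the exit codes are part of the program's interface.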
Re: Incremental Diffs?
In message [EMAIL PROTECTED], Kim Scarborough writes: I'm using it to back up files from one computer to another, and it works exactly as I thought it would, except that it seems to be copying entire files over when they've changed rather than the differences. What specifically leads you to that conclusion? I have it set to extra verbose, and I've been watching the files transfer over. When I append 2K to a 100MB text file and re-rsync, it's pretty obvious it's transferring 100MB, not 2K + whatever overhead the diff takes up. I wouldn't be so sure. Add the option --stats to the rsync command line and see what it says. AFAIK those numbers are correct. -- Alberto -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
timeout error in rsync-2.5.5
Dear all, I've been trying to track down a problem with timeouts when pulling data from an rsync daemon and I have now run out of any useful ideas. The problem manifests itself when I try to transfer a large directory tree on a slow client machine. What happens then is that the client rsync process successfully receives the list of files from the server, then begins checking the local directory tree, taking its sweet time. Since I know that the process is quite slow, I invoke rsync with a timeout of 5 hours to avoid dropping the connection. However, after a little over 1 hour (usually 66 minutes or so), the server process simply gives up. I have verified the problem under rsync versions 2.3.2, and 2.4.6 and up (including 2.5.5), testing a few different combinations of client/server versions (although the client is always a linux box and the server always a solaris box). It looks to me as if something kicks the server out of the select() call at line 202 of io.c (read_timeout) despite the timeout being correctly set to 18000 seconds. Can anybody think of what the problem may be? See all the details below. Thanks, -- Alberto

CLIENT:
[ads@ads-pc ~]$ rsync --version
rsync version 2.5.5 protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others http://rsync.samba.org/
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles, IPv6, 64-bit system inums, 64-bit internal inums
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence for details.

[ads@ads-pc ~]$ rsync -ptv --compress --suffix .old --timeout 18000 -r --delete rsync://adsfore.harvard.edu:1873/text-4097/. /mnt/fwhd0/abstracts/phy/text/
receiving file list ...
done
rsync: read error: Connection reset by peer
rsync error: error in rsync protocol data stream (code 12) at io.c(162)
rsync: connection unexpectedly closed (17798963 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(150)

SERVER:
adsfore-15: /proj/ads/soft/utils/src/rsync-2.5.5/rsync --version
rsync version 2.5.5 protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others http://rsync.samba.org/
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles, no IPv6, 64-bit system inums, 64-bit internal inums
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence for details.

from the log file:
2002/04/16 08:52:48 [18996] rsyncd version 2.5.5 starting, listening on port 1873
2002/04/16 09:39:01 [988] rsync on text-4097/. from ads-pc (131.142.43.117)
2002/04/16 10:51:36 [988] rsync: read error: Connection timed out
2002/04/16 10:51:36 [988] rsync error: error in rsync protocol data stream (code 12) at io.c(162)

from a truss:
adsfore-14: truss -d -p 988
Base time stamp: 1018964639.2848 [ Tue Apr 16 09:43:59 EDT 2002 ]
poll(0xFFBE4E90, 1, 1800) (sleeping...)
4057.4093 poll(0xFFBE4E90, 1, 1800)                  = 1
4057.4098 read(3, 0xFFBE5500, 4)                     Err#145 ETIMEDOUT
4057.4103 time()                                     = 1018968696
4057.4106 getpid()                                   = 988 [18996]
4057.4229 write(4, " 2 0 0 2 / 0 4 / 1 6   1".., 66) = 66
4057.4345 sigaction(SIGUSR1, 0xFFBE4D20, 0xFFBE4DA0) = 0
4057.4347 sigaction(SIGUSR2, 0xFFBE4D20, 0xFFBE4DA0) = 0
4057.4349 time()                                     = 1018968696
4057.4350 getpid()                                   = 988 [18996]
4057.4352 write(4, " 2 0 0 2 / 0 4 / 1 6   1".., 98) = 98
4057.4357 llseek(0, 0, SEEK_CUR)                     = 0
4057.4359 _exit(12)
Re: timeout error in rsync-2.5.5
Dave, I understand how the timeout works. The problem here is that traversing the directory tree on the client side does indeed take more than 1 hour, during which no bytes are exchanged on the wire between client and server. So I do know for a fact that the read call on the server side needs a very long timeout, because the client has to traverse a directory tree of 1.5 million files stored on a slow external drive. The question is why the server process gives up after 66 minutes or so even though the timeout has been set to 5 hours (see the system call). The client machine is now behind a firewall, which I guess complicates things a bit, but from what I can tell there is nothing on the LAN that forces the connection to be dropped; I have checked the firewall settings to no avail. Any other ideas?

-- Alberto

P.S. Sorry, but neither one of the machines can be accessed from the outside.

In message [EMAIL PROTECTED], Dave Dykstra writes:

You shouldn't need to have such a long timeout. The timeout is not over the whole length of the run, only the time since the last data was transferred. It's a mystery to me why it quits after 66 minutes rather than 5 hours, but the real question is why it stops transferring data for so long. Perhaps something went wrong with the network. I can't connect to that server to try it; perhaps it is behind a firewall.

- Dave Dykstra

On Tue, Apr 16, 2002 at 12:36:03PM -0400, Alberto Accomazzi wrote:

Dear all, I've been trying to track down a problem with timeouts when pulling data from an rsync daemon, and I have now run out of useful ideas. The problem manifests itself when I try to transfer a large directory tree on a slow client machine. What happens then is that the client rsync process successfully receives the list of files from the server, then begins checking the local directory tree, taking its sweet time.
Since I know that the process is quite slow, I invoke rsync with a timeout of 5 hours to avoid dropping the connection. However, after a little over 1 hour (usually 66 minutes or so), the server process simply gives up. I have verified the problem under rsync versions 2.3.2, and 2.4.6 and up (including 2.5.5), testing a few different combinations of client/server versions (although the client is always a linux box and the server always a solaris box). It looks to me as if something kicks the server out of the select() call at line 202 of io.c (read_timeout) despite the timeout being correctly set to 18000 seconds. Can anybody think of what the problem may be? See all the details below.

Thanks,

-- Alberto

CLIENT:

[ads@ads-pc ~]$ rsync --version
rsync version 2.5.5 protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others http://rsync.samba.org/
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles, IPv6, 64-bit system inums, 64-bit internal inums
rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence for details.

[ads@ads-pc ~]$ rsync -ptv --compress --suffix .old --timeout 18000 -r --delete rsync://adsfore.harvard.edu:1873/text-4097/. /mnt/fwhd0/abstracts/phy/text/
receiving file list ... done
rsync: read error: Connection reset by peer
rsync error: error in rsync protocol data stream (code 12) at io.c(162)
rsync: connection unexpectedly closed (17798963 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(150)

SERVER:

adsfore-15: /proj/ads/soft/utils/src/rsync-2.5.5/rsync --version
rsync version 2.5.5 protocol version 26
Copyright (C) 1996-2002 by Andrew Tridgell and others http://rsync.samba.org/
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles, no IPv6, 64-bit system inums, 64-bit internal inums
rsync comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it under certain conditions. See the GNU General Public Licence for details.

from the log file:

2002/04/16 08:52:48 [18996] rsyncd version 2.5.5 starting, listening on port 1873
2002/04/16 09:39:01 [988] rsync on text-4097/. from ads-pc (131.142.43.117)
2002/04/16 10:51:36 [988] rsync: read error: Connection timed out
2002/04/16 10:51:36 [988] rsync error: error in rsync protocol data stream (code 12) at io.c(162)

from a truss:

adsfore-14: truss -d -p 988
Base time stamp: 1018964639.2848 [ Tue Apr 16 09:43:59 EDT 2002 ]
poll(0xFFBE4E90, 1, 1800) (sleeping...)
4057.4093 poll(0xFFBE4E90, 1, 1800) = 1
4057.4098 read(3, 0xFFBE5500, 4) Err#145 ETIMEDOUT
4057.4103 time() = 1018968696
4057.4106
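[Editor's note] For readers unfamiliar with how rsync applies --timeout, the mechanism under discussion (read_timeout() in io.c) waits on the socket and gives up only when no data has arrived within the configured window. A minimal Python sketch of that idea, not rsync's actual C code, looks like this:

```python
import select
import socket

def read_with_timeout(sock, nbytes, timeout):
    # Wait until the peer sends something, or `timeout` seconds pass with
    # no data at all. The timeout covers silence on the wire, not the
    # total length of the transfer -- the same semantics Dave describes.
    ready, _, _ = select.select([sock], [], [], timeout)
    if not ready:
        raise TimeoutError("no data within %r seconds" % timeout)
    return sock.recv(nbytes)
```

Because the timeout counts from the last successful I/O, a long tree scan that produces no traffic can trip it even though the transfer as a whole is making progress, which is exactly the scenario reported above.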
rsyncd listing of directories
I just took a look at the 2.5.5 codebase to see how easy it would be to write a little driver script that downloads a big directory tree from an rsync daemon the chunky way (get a list of a module's subdirectories and do the transfer by subdirectory). The reason for doing this is obvious when you have large directory trees, as is the case for many of us. Unfortunately the way list_only is currently implemented makes the whole idea useless, since it forces the daemon to recurse the target directory tree anyway. Here's the code:

in options.c:

    /* this is a complete hack - blame Rusty
       this is a hack to make the list_only (remote file list)
       more useful */
    if (list_only && !recurse)
            argstr[x++] = 'r';

in exclude.c:

    /* This is a complete hack - blame Rusty.
     *
     * FIXME: This pattern shows up in the output of
     * report_exclude_result(), which is not ideal. */
    if (list_only && !recurse) {
            add_exclude("/*/*", 0);
    }

So I'm going to bite and blame Rusty at this point and ask the question: why was this implemented this way? I can't think of a good reason. I'm happy to try and work on a patch if there is a consensus that this is desirable.

-- Alberto

Alberto Accomazzi mailto:[EMAIL PROTECTED]
NASA Astrophysics Data System http://adsabs.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfawww.harvard.edu
60 Garden Street, MS 83, Cambridge, MA 02138 USA
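[Editor's note] To make the "chunky" driver-script idea concrete: the client would first list the module, pick out the top-level subdirectories, and then transfer each one in its own rsync run. A hypothetical Python helper for the listing-parsing step, assuming the usual five-column daemon listing (permissions, size, date, time, name), might look like:

```python
def module_subdirs(listing):
    # Extract top-level subdirectory names from the output of
    # `rsync rsync://host/module/`, so each can be fetched separately.
    # Hypothetical helper, not part of rsync; assumes five columns.
    dirs = []
    for line in listing.splitlines():
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[0].startswith("d") and fields[4] != ".":
            dirs.append(fields[4])
    return dirs
```

A wrapper script would then loop over the returned names and run one rsync per subdirectory, keeping each file list small.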
Re: Rsync'ing lists of files
Just so that we don't forget the lessons from the past, let me point out that we had discussion and testing done on this subject back in November, with mixed results (i.e. YMMV): http://lists.samba.org/pipermail/rsync/2001-November/005398.html

I think the consensus from that experiment was that implementing the option using the include/exclude mechanism was not the way to go (correct me if I'm wrong, Dave). Andrew Schorr's patch does this differently, but from what I can tell it would only work when uploading files to a server (which is the opposite of what my experiments with --files-from were): http://lists.samba.org/pipermail/rsync/2001-November/005272.html

Since it seems that different people want this option for different purposes, we need to make sure that it gets implemented in a sensible way, with some testing being done to ensure that we still have decent performance and that it works in all cases (sending/receiving files). So I think our strategy should be to bug Dave Dykstra until he gives up and writes the patch :-)

-- Alberto

Stephane Paltani wrote:

Dave Dykstra wrote:

Sigh, another request for the --files-from I promised to write over 6 months ago, but I've been so overloaded at work lately that I don't know if I'm ever going to get to it. Perhaps someone else will have to do it.

He he, happy to see a general consensus for this feature!

It turns out that back in rsync 2.3.2 and earlier there was an optimization (which I wrote, and which actually was the primary reason that I volunteered to be maintainer of rsync for a while) that kicked in when there was a list of includes with no wildcards followed by an --exclude '*', and there was no --delete. Instead of recursing through the files and doing comparisons, it would just directly open the files in the include list. It only had to be on the sending side, so you might want to try 2.3.2 on your sending side to see if you get a significant performance boost.
Andrew Tridgell took it out in 2.4.0 because he didn't like how it changed the usual semantics of requiring all parent directories to be explicitly listed in the include list.

Whoops! That did the trick for me! It took me 6 minutes to transfer 250 GB! Too bad it has been turned down. I have the impression it would satisfy most, if not all, --files-from lobbyists.

Alberto Accomazzi mailto:[EMAIL PROTECTED]
NASA Astrophysics Data System http://adsabs.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfawww.harvard.edu
60 Garden Street, MS 83, Cambridge, MA 02138 USA
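[Editor's note] The 2.3.2 fast path Dave describes amounts to a simple predicate: recursion can be skipped only when every include is a literal name, the include chain is closed off with --exclude '*', and --delete is off. A conceptual sketch of that decision (not the original C code):

```python
def can_skip_recursion(includes, excludes, delete):
    # --delete needs a full tree walk, and there must be something listed.
    if delete or not includes:
        return False
    # the include chain must end with a lone --exclude '*'
    if excludes != ["*"]:
        return False
    # every include must be a literal path: no glob characters allowed
    return not any(ch in name for name in includes for ch in "*?[")
```

When the predicate holds, the sender can open the listed files directly instead of recursing and comparing, which is why the speedup Stephane saw was so dramatic.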
rsync via floppy
Our project is considering supporting a mirror site which is going to be off the network (essentially a stand-alone mirror for a local LAN in a place without internet connectivity). So I am in the (unfortunate) position of having to decide how to do this (or if this can be done at all). The current plan is to set up a PC running linux with a 120GB drive and a DVD reader on the remote site and ship periodic updates to our dataset that can be used to patch the local distribution, then run some updating procedures to make the new database live.

I can think of a well-defined plan to carry out the updates, but I'm wary about the lack of feedback about the actual updating procedures (what if a filesystem fills up or a command fails for whatever reason?). I also don't have a lot of time to build a customized system for doing this rsync-on-a-floppy myself, so I'm hoping that somebody on the list has some suggestions or tools that can be useful.

BTW, I think that given the nature of our dataset, file patching a la rsync is not strictly necessary, since we can probably fit a fresh copy of all files that have changed on a DVD. The problem I'm mostly worried about is keeping enough metadata on both ends to reliably figure out the updating strategy.

Thanks,

-- Alberto

Alberto Accomazzi http://cfa-www.harvard.edu/~alberto
NASA Astrophysics Data System http://adswww.harvard.edu
Harvard-Smithsonian Center for Astrophysics [EMAIL PROTECTED]
60 Garden Street, MS 83, Cambridge, MA 02138 USA
Re: rsync 2.6.0 - suspected memory leak bug
On Wed, Jan 21, 2004 at 03:35:37PM +, Kelly Garrett wrote:

Does anyone know how to build a version of the kernel that either does no disk caching (we have very fast RAID processors and SCSI disks on the machine) or limits the amount of cache that the system will allocate for disk?

Kelly, we have a similar setup here (RH8 with the latest bigmem kernel on a machine with 4GB ram and a 1.4TB fs) and see similar behaviour. However, I long ago accepted this as a feature of the linux kernel, and I've yet to find that it causes any performance issues. It's true that if your file access is truly random, caching the filesystem in RAM doesn't help, but I can't imagine that this is a significant performance hit. The system simply releases the cache as needed, so that even if at any time you're seeing 100% memory usage, when a new process needs memory the RAM cache will give way.

I'm sure that there are ways to override this behaviour (see for instance the linux kernel hacking howto for hints), but I doubt that this is worth the effort unless you need to squeeze every last bit of performance out of your box. And if you do, I would suggest looking at installing a 2.6.x kernel instead.

-- Alberto

P.S. I found that in our case one thing that actually helped quite a bit with overall performance was tuning some kernel parameters for the particular raid controller we have (3ware IDE raid). I mention this because if you start going down the performance tuning path there are a number of things that you should look at.

Alberto Accomazzi, NASA Astrophysics Data System http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfa-www.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA [EMAIL PROTECTED]
--link-dest not working with rsync daemon?
I am puzzled as to why I can't get the option --link-dest to work properly. When I use this option when both source and destination are on a local filesystem, the hard-linking of the target against the link-dest directory does work, but when the source is a remote directory (via ssh or rsync server) hard links are not created. I suspect it has something to do with setting the correct timestamp on the files, since the server and client machines have clocks with a large offset, but why would that be the case? The version of rsync I'm using is:

rsync version 2.6.0 protocol version 27
Copyright (C) 1996-2004 by Andrew Tridgell and others http://rsync.samba.org/
Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles, IPv6, 64-bit system inums, 64-bit internal inums

The distribution is fedora core 1 with kernel 2.4.22-1.2140.nptl. Below you can see the results of my test. Any help is appreciated.

Thanks,

-- Alberto

[EMAIL PROTECTED] ~/rsynctest]$ rsync rsync://adswon.cfa.harvard.edu/ast/load/test/
drwxr-xr-x        4096 2004/01/26 15:17:47 .
-rw-r--r--           4 2004/01/26 15:17:47 foo
[EMAIL PROTECTED] ~/rsynctest]$ date
Mon Jan 26 14:59:16 EST 2004
[EMAIL PROTECTED] ~/rsynctest]$ cat ./runtest.sh
#!/bin/sh
/bin/rm -rf orig new
#
echo getting original copy
$HOME/mirror/bin/i486-linux/rsync-2.6.0 --timeout 1800 --delete -az \
    rsync://adswon.cfa.harvard.edu/ast/load/test/ \
    `pwd`/orig/
#
echo getting second copy
$HOME/mirror/bin/i486-linux/rsync-2.6.0 --timeout 1800 --delete -az $@ \
    --link-dest=`pwd`/orig/ \
    rsync://adswon.cfa.harvard.edu/ast/load/test/ \
    `pwd`/new/
/bin/ls -l `pwd`/orig `pwd`/new
[EMAIL PROTECTED] ~/rsynctest]$ ./runtest.sh -vvv
opening tcp connection to adswon.cfa.harvard.edu port 873
receiving file list ...
recv_file_name(.)
recv_file_name(foo)
received 2 names
done
recv_file_list done
get_local_name count=2 /home/ads/rsynctest/new/
created directory /home/ads/rsynctest/new
make_file(.,*,2)
expand file_list to 4000 bytes, did move
send_file_list done
deleting in .
recv_files(2) starting
generator starting pid=1001 count=2
delta transmission enabled
recv_generator(.,0)
set modtime of . to (1075148267) Mon Jan 26 15:17:47 2004
./
recv_generator(foo,1)
generating and sending sums for 1
count=1 rem=4 blength=700 s2length=2 flength=4
generate_files phase=1
recv_files(foo)
recv mapped foo of size 4
foo
got file_sum
renaming .foo.cJC1Ra to foo
set modtime of foo to (1075148267) Mon Jan 26 15:17:47 2004
recv_files phase=1
generate_files phase=2
recv_generator(.,0)
set modtime of . to (1075148267) Mon Jan 26 15:17:47 2004
recv_files finished
wrote 127 bytes read 139 bytes 532.00 bytes/sec
total size is 4 speedup is 0.02
_exit_cleanup(code=0, file=main.c, line=1064): about to call exit(0)

/home/ads/rsynctest/new:
total 4
-rw-r--r--    1 ads    ads    4 Jan 26 2004 foo

/home/ads/rsynctest/orig:
total 4
-rw-r--r--    1 ads    ads    4 Jan 26 2004 foo

Alberto Accomazzi, NASA Astrophysics Data System http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfa-www.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA [EMAIL PROTECTED]
Re: --link-dest not working with rsync daemon?
Wayne Davison wrote:

On Mon, Jan 26, 2004 at 04:14:14PM -0500, Alberto Accomazzi wrote:

I am puzzled as to why I can't get the option --link-dest to work properly. When I use this option when both source and destinations are on a local filesystem the hard-linking of the target against the link-dest directory does work, but when the source is a remote directory (via ssh or rsync server) hard links are not created.

This is something that is fixed in the CVS version. You can work around the problem in 2.6.0 by not specifying (or implying) the -o (--owner) option (when running as non-root). So, change your -a option into the options -rlptgD, and it should work fine.

Wayne, indeed you're correct on this one. Guess I should have tried CVS before starting to whine ;-)

Let me take this opportunity to thank you personally for taking on the task of pushing out the latest rsync release and for your and jw's continuing work on this. I know a lot of people have contributed patches and ideas to rsync, but it does take a few good men to pull it all together. I think you guys are doing a top-notch job. So I think I speak for all the readers on the list when I say we are all *very* grateful for all you've done and are doing.

And while I'm still on the line: do you have any ideas about upcoming releases? There was some discussion prior to releasing 2.6.0 about what should be tackled next, but I'm not sure there was consensus beyond what became 2.6.0.

Thanks,

-- Alberto
Re: --link-dest not working with rsync daemon?
Wayne Davison wrote:

I was just mentioning to J.W. that I thought we had so much good stuff already in CVS that we should try to button things up in the next month or so, really hammer on it a while, and then crank out a new release. There have been a lot of optimizations that make it use less CPU and less memory, which I like a lot. I do think the changes have been pretty significant, so we obviously need to do some testing to make sure that we haven't busted something important. However, it's looking really stable to me so far -- I run it for all my rsyncing.

Wayne, ok, I'm on the hammering-CVS bandwagon, so I just downloaded and compiled the latest nightly snapshot on a redhat 7.3 box and I have a small problem to report. I configured rsync using ./configure --with-included-popt and got a bunch of warnings (see below). Is this worth worrying about? I'm not sure how many people use the built-in popt these days...

-- Alberto

[EMAIL PROTECTED] rsync-HEAD-20040127-1010GMT]$ uname -a
Linux adsfife 2.4.20-24.7smp #1 SMP Mon Dec 1 13:03:45 EST 2003 i686 unknown
[EMAIL PROTECTED] rsync-HEAD-20040127-1010GMT]$ gcc --version
2.96
[EMAIL PROTECTED] rsync-HEAD-20040127-1010GMT]$ make
[...]
gcc -I. -I. -g -O2 -DHAVE_CONFIG_H -Wall -W -I./popt -c popt/popt.c -o popt/popt.o
popt/popt.c: In function `poptAddAlias':
popt/popt.c:1058: warning: unused parameter `flags'
gcc -I. -I. -g -O2 -DHAVE_CONFIG_H -Wall -W -I./popt -c popt/poptconfig.c -o popt/poptconfig.o
popt/poptconfig.c: In function `poptReadDefaultConfig':
popt/poptconfig.c:162: warning: unused parameter `useEnv'
gcc -I. -I. -g -O2 -DHAVE_CONFIG_H -Wall -W -I./popt -c popt/popthelp.c -o popt/popthelp.o
popt/popthelp.c: In function `displayArgs':
popt/popthelp.c:20: warning: unused parameter `foo'
popt/popthelp.c:22: warning: unused parameter `arg'
popt/popthelp.c:22: warning: unused parameter `data'
popt/popthelp.c: In function `getArgDescrip':
popt/popthelp.c:87: warning: unused parameter `translation_domain'
popt/popthelp.c: In function `singleOptionDefaultValue':
popt/popthelp.c:118: warning: unused parameter `translation_domain'
popt/popthelp.c: In function `poptPrintHelp':
popt/popthelp.c:478: warning: unused parameter `flags'
popt/popthelp.c: In function `poptPrintUsage':
popt/popthelp.c:637: warning: unused parameter `flags'

Alberto Accomazzi, NASA Astrophysics Data System http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics http://cfa-www.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA [EMAIL PROTECTED]
file has vanished bug [rsync-HEAD-20040127-1010GMT]
Just ran into this bug when running the latest snapshot from CVS: when rsyncing from two source directories into a third one, rsync gets confused about which source file is from which directory, resulting in a "file has vanished" error. See test script below.

Also, is there any consensus on whether using multiple source modules when pulling from an rsync daemon is going to be ok? I recall some discussion on escaping spaces or quoting them in the past, but I'm not sure if anything was decided. What I'm referring to is this case:

rsync -av rsync://server/'module1 module2 module3' dest/

Right now the latest CVS still supports this.

Thanks,

-- Alberto

-
#!/bin/sh
[ -d target ] && /bin/rm -rf target
if [ ! -d one ] ; then
    mkdir one
    touch one/foo
    touch one/zoo
fi
if [ ! -d two ] ; then
    mkdir two
    touch two/bar
fi
./rsync-2.6.1 -avv one/ two/ target/
/bin/ls -l one two target

[EMAIL PROTECTED] ~/tmp]$ ./runtest.sh
building file list ... done
created directory target
./
bar
file has vanished: /home/ads/tmp/two/foo
file has vanished: /home/ads/tmp/two/zoo
wrote 150 bytes read 80 bytes 460.00 bytes/sec
total size is 0 speedup is 0.00
rsync warning: some files vanished before they could be transfered (code 24) at main.c(628)

one:
total 0
-rw-rw-r--    1 ads    ads    0 Jan 27 16:22 foo
-rw-rw-r--    1 ads    ads    0 Jan 27 16:22 zoo

target:
total 0
-rw-rw-r--    1 ads    ads    0 Jan 27 16:22 bar

two:
total 0
-rw-rw-r--    1 ads    ads    0 Jan 27 16:22 bar
Change in reporting for --dry-run in 2.6.x
I just noticed that there is an extra blank line in the output generated by rsync when the --dry-run (-n) flag is used. This seems to have started with 2.6.0. Is this desired? The reason why I'm asking is that I use scripts that parse the output from rsync, and little modifications in verbosity can make or break things easily.

Thanks,

-- Alberto

[EMAIL PROTECTED] bin]$ rsync-2.5.7 -an rsync://adswon.cfa.harvard.edu/pre/load/current/ /home/ads/abstracts/pre/latest/
receiving file list ... done
wrote 78 bytes read 1204 bytes 2564.00 bytes/sec
total size is 457622413 speedup is 356959.76
[EMAIL PROTECTED] bin]$ rsync-2.6.1 -an rsync://adswon.cfa.harvard.edu/pre/load/current/ /home/ads/abstracts/pre/latest/
receiving file list ... done
wrote 78 bytes read 1204 bytes 2564.00 bytes/sec
total size is 457622413 speedup is 356959.76
Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]
Chris, to put things in the right perspective, you should read (if you haven't done so already) the original paper describing the design behind batch mode. The design and implementation of this functionality go back to a project called the Internet2 Distributed Storage Infrastructure (I2-DSI). As part of that project, the authors created a modified version of rsync (called rsync+) which had the capability of creating these batch sets for mirroring. Here are a couple of URLs describing the ideas and motivation behind it:

http://www.ils.unc.edu/i2dsi/unc_rsync+.html
http://www.ils.unc.edu/ils/research/reports/TR-1999-01.pdf

Chris Shoemaker wrote:

Yes, I think you're right about the original design. And I guess we'd want to preserve that capability. Or would we? I'm having a little trouble seeing why this was the intended use. I figure, there are three cases:

A) If you have access to both source and dest, it doesn't really matter too much who writes the batch -- this is like the local copy case.

B) If you have access to the dest but not the source, then you need the client to write the batch -- and it's not far-fetched that you might have other copies of dest to update.

C) However, having access to source but not dest is the only case that _requires_ the sender to write the batch -- now what's the chance that you'll have another identical dest to apply the batch to? And if you did, why wouldn't you generate the batch on that dest as in case A, above?

So, it seems to me that it's much more useful to have the receiver/client write the batch than sender/client, or receiver/server, or sender/server. But maybe I'm just not appreciating what the potential uses of batch mode are. Survey: so who uses batch mode, and what for?

I haven't used the feature, but back when I read the docs on rsync+ I thought it was a clever way to do multicasting on the cheap.
I think the only scenario where batch mode makes sense is when you need to distribute updates from a particular archive to a (large) number of mirror sites and you have tight control on the state of both client and server (so that you know exactly what needs to be updated on the mirror sites). This ensures that you can create a set of batch files that contain *all* the changes necessary for updating each mirror site. So basically I would use batch mode if I had a situation in which:

1) all mirror sites have the same set of files

2) rsync is invoked from each mirror site in exactly the same way (i.e. same command-line options) to pull data from a master server

then instead of having N sites invoke rsync against the same archive, I would invoke it once, make it write out a set of batch files, then transfer the batch files to each client and run rsync locally using the batch set. The advantage of this is that the server only performs its computations once.

An example of this usage would be using rsync to upgrade a linux distribution, say going from FC 1 to FC 2. All files from each distribution are frozen, so you should be able to create a single batch which incorporates all the changes and then apply that on each site carrying the distro.

The question of whether the batch files should be on the client or server side is not easy to answer, and in the end depends on exactly what you're trying to do. In general, I would say that since the contents of the batch mode depend on the status of both client and server, there is not a natural location for it.

-- Alberto

Alberto Accomazzi aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA
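[Editor's note] The compute-once/apply-many idea above can be modeled in a few lines: treat each file tree as a mapping from path to content, compute the batch once against the common starting state, and then apply it to every mirror. This is only a conceptual model of --write-batch/--read-batch, not rsync's file format, and it makes the same assumption the email stresses: every mirror must start from the identical state.

```python
def make_batch(server, mirror):
    # One "batch": files whose content differs on the server, plus files
    # the server no longer has. Computed once, against one mirror's state.
    changed = {p: c for p, c in server.items() if mirror.get(p) != c}
    deleted = [p for p in mirror if p not in server]
    return changed, deleted

def apply_batch(mirror, batch):
    # Apply the batch to any mirror that started in the same state as the
    # one the batch was computed against; otherwise the result is wrong.
    changed, deleted = batch
    out = dict(mirror)
    out.update(changed)
    for p in deleted:
        out.pop(p, None)
    return out
```

With N identical mirrors, make_batch runs once on the server and apply_batch runs locally on each site, which is exactly the load-shifting argument made above.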
Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]
Chris Shoemaker wrote:

Indeed, what you describe seems to have been the design motivation. I can share what my desired application is: I want to create a mirror of a public server onto my local machine which is physically disconnected from the Internet, and keep it current. So, I intend to first rsync-update my own copy which _is_ networked while creating the batch set. Then I can sneakernet the batch set to the unnetworked machine and use rsync --read-batch to update it. This keeps the batch sets smallish even though the mirror is largish.

This was something I looked into a couple of years ago. Back then I even posted an email to the list (http://lists.samba.org/archive/rsync/2002-August/003433.html) and got no feedback, which led me to conclude that people were not doing any of this at the time.

To restate the obvious, the batch mode thing is really just a glorified diff/patch operation. The problem I have with it is that AFAICT it's a very fragile one, since a simple change of one file on either sender or receiver after the batch has been created will invalidate the use of the batch mode. Contrast this with diff/patch, which has built-in measures to account for fuzzy matches and is therefore a much more robust tool.

In the end my motivation for using the rsync-via-sneakernet approach disappeared when I convinced myself that the whole operation would have been far too unreliable, at least for our application, where files are updated all the time and there is never really a freeze of a release against which a batch file can be created. I won't go as far as saying that the feature is useless, but I'd just caution people that they need to understand the assumptions that this use of rsync is based upon. Also, I would suggest checking out other diff/patch tools such as rdiff-backup or xdelta.

BTW, there is a work-around.
If you don't mind duplicating the mirror twice, one solution is to do a regular (no --write-batch) rsync update of one copy of the mirror, and then do the --write-batch during a local-to-local rsync update of another copy of the mirror. Actually, this has some real advantages if your network connection is unreliable.

This is really the only circumstance under which I would even consider using batch mode. There should also be safeguards built into the batch mode operation to guarantee that the source files to which the batch is applied are in the state we expect them to be in. I wouldn't otherwise want rsync to touch my files.

Thanks for your input.

Likewise. Good luck...

-- Alberto

Alberto Accomazzi aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA
Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]
Wayne Davison wrote:

The knowledge or memory of that exact state is more likely to reside with the receiver (who just left that state) than with the sender (who may never have been in that state). Therefore it is more likely to be useful to the receiver than to the sender.

This is only true if you imagine a receiver doing one pull and then forwarding the update on to multiple hosts. For instance, if you use a pull to create the batch files and then make them available for people to download, that would help to alleviate load from the original server. That said, I think most of the time a receiver is going to be a leaf node, so the server tends to be the place where a batch is more likely to be useful, IMO.

In thinking about batch mode, it seems like its restrictions make it useful in only a very small set of circumstances. Since the receiving systems must all have identical starting hierarchies, it really does limit how often it can be used.

I completely agree with Wayne's assessment here. But just to make things clear, let's restate what batch mode provides:

1. a (partial) set of metadata about the state of the sender
2. a (partial) set of metadata about the state of the receiver
3. an rsync-style patch for files that differ in 1. and 2.

so while 1+2+3 may be too restrictive to be useful in mirroring datasets, having the capability to create and cache just 1 or 2 may be a big win for busy servers.

I'm wondering if batch mode should be removed from the main rsync release and relegated to a parallel project? It seems to me that a better feature for the mainstream utility would be something that optimized away some of the load on the sending system when it is serving lots of users. So, having the ability to cache a directory tree's information, and the ability to cache checksums for files, would be useful (especially if the data was auto-updated as it became stale). That would make all transfers more optimal, regardless of what files the receiving system started from.
First of all, I have a feeling that the number of people who have *considered* using batch mode is quite small, and the number who have actually used it in the recent past is certainly even smaller (I'm thinking zero, actually). So removing the functionality from the mainstream rsync would not be a problem; in fact I think it would be a good thing. It doesn't make sense to keep something in the code that is not used and cannot be reliably supported. Although I applaud Jos's efforts in providing this functionality to rsync, I was surprised to see it included in the main distribution, especially since it underwent virtually no testing as far as I can tell.

There's no doubt that caching the file list on the server side would indeed be a very useful feature for all those who use rsyncd as a distribution method. We all know how difficult it can be to reliably rsync a large directory tree because of the memory and I/O costs of keeping a huge filelist in memory. This may best be done by creating a separate helper application (say rsyncd-cache or such) that can be run on a regular basis to create a cached version of a directory tree corresponding to an rsyncd module on the server side. The trick in getting this right will be to separate out the client-supplied options concerning file selection, checksumming, etc., so that the cache is as general as possible and can be used for a large set of connections, so as to minimize the number of times that the actual filesystem is scanned.

Such a new feature would probably best be added to an rsync replacement project, though.

Hmmm... replacement? Why not make this a utility that can be run alongside an rsync daemon? Or are you thinking of a design for a new rsync?
-- Alberto

Alberto Accomazzi                            aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System                           ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics         www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA

-- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]
Chris Shoemaker wrote: There's no doubt that caching the file list on the server side would indeed be a very useful feature for all those who use rsyncd as a distribution method. We all know how difficult it can be to reliably rsync a large directory tree because of the memory and I/O costs of keeping a huge file list in memory. This may best be done by creating a separate helper application (say rsyncd-cache or such) that can be run on a regular basis to create a cached version of a directory tree corresponding to an rsyncd module on the server side. The trick in getting this right will be to separate out the client-supplied options concerning file selection, checksumming, etc., so that the cache is as general as possible and can be used for a large set of connections so as to minimize the number of times that the actual filesystem is scanned.

What client options are you thinking will be tricky? Wouldn't the helper app just cache _all_ the metadata for the module, and then rsync would query only the subset it needed? It's not like the client can change the checksum stride. [That would hurt.]

What I'm referring to are those options that a client passes to the server which influence file selection, checksum and block generation. I haven't looked at the rsync source code in quite a while, but off the top of my head here are the issues to look at when considering caching a filesystem scan:

1. Exclude/include patterns:

    -C, --cvs-exclude        auto ignore files in the same way CVS does
        --exclude=PATTERN    exclude files matching PATTERN
        --exclude-from=FILE  exclude patterns listed in FILE
        --include=PATTERN    don't exclude files matching PATTERN
        --include-from=FILE  don't exclude patterns listed in FILE
        --files-from=FILE    read FILE for list of source-file names

These should be easy to deal with: I would simply have the cache creator ignore any --exclude options passed by the client (but probably honor the ones defined in a daemon config file).

2. Other file selection options:

    -x, --one-file-system    don't cross filesystem boundaries
    -S, --sparse             handle sparse files efficiently
    -l, --links              copy symlinks as symlinks
    -L, --copy-links         copy the referent of all symlinks
        --copy-unsafe-links  copy the referent of unsafe symlinks
        --safe-links         ignore unsafe symlinks

It's possible that these can also be dealt with easily, but I'm not so sure. Clearly -x influences what gets scanned, so how do you decide what to cache? The other options are probably easier to deal with.

3. File checksums:

    -c, --checksum           always checksum

Should the caching operation always checksum so that the checksums are readily available when a client sets -c? This can lead to a lot of computation and disk I/O which may be unnecessary if the clients do not use this option.

4. Block checksums:

    -B, --block-size=SIZE    checksum blocking size (default 700)

It would be great if we could cache the rolling block checksums as they are computed, but this may be even harder (or impossible) to deal with. And it looks like soon we'll have a new checksum-seed option which will further complicate the issue (in fact I admit I have no idea how all of this works beyond versions 2.5.x; maybe somebody with more knowledge on the subject will chime in).

So I'm just pointing out that in order to create a cache with a high hit probability you have to make assumptions and choices that may be non-trivial. Probably the best solution is to reduce the scope of the cache so that it contains only the initial file list generated under default settings, or maybe you want to have a set of different caches created with different options. I, for one, have consistently been using the --checksum option when distributing some sensitive data to our mirror sites, so I would want that to be included in a cache.
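Just to make the cache-creation idea concrete, here is a rough sketch of what a single pre-scan run might produce (the helper name "rsyncd-cache" and the cache format are purely hypothetical, and a temp directory stands in for a real rsyncd module; this only shows one filesystem scan yielding a reusable file list plus whole-file checksums for -c clients):

```shell
set -e
# Hypothetical "rsyncd-cache" run: scan a module once so that later
# connections can be served from cached metadata instead of the disk.
module=$(mktemp -d)          # stand-in for an rsyncd module root
cache=$(mktemp)
mkdir -p "$module/sub"
echo data > "$module/sub/a.txt"

# One scan: cache each file's relative path, size and mtime
( cd "$module" && find . -type f -printf '%P\t%s\t%T@\n' | sort ) > "$cache"

# Optionally pre-compute whole-file checksums for --checksum clients;
# this is the expensive step that may be wasted if no client uses -c
( cd "$module" && find . -type f -exec md5sum {} + ) > "$cache.md5"

cat "$cache"
```

Note how the checksum pass is separate from the metadata pass: that mirrors the trade-off discussed above, where pre-computing checksums only pays off if enough clients actually request them.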
-- Alberto

Alberto Accomazzi                            aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System                           ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics         www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA
Re: Potential new option: --delete-during
Wayne,

I haven't (yet) given this a try but it sounds like a very reasonable thing to do. --delete-before is still useful in those cases where you may be tight on disk space, but the default behaviour should be the most efficient one, so I agree with making this patch be the default --delete.

On a related note, don't you think it's time to start making candidate releases for 2.6.4? It's been a while...

-- Alberto

There is a new patch named delete-during.diff in the CVS patches dir. This patch adds the ability for rsync to incrementally delete files in each directory during the normal course of the transfer, rather than doing a separate delete scan either before or after the transfer. The patch renames the current --delete option into --delete-before and makes --delete behave in the delete-during style. I'm debating whether we actually need a --delete-during option -- I'm currently leaning towards leaving it out, so it's not documented as existing at the moment. I've done some simple testing (including both with and without the --relative option) and it seems to work fine so far. Comments? How do people feel about making the --delete-during behavior the default --delete algorithm? I think it will be much more efficient (and less prone to timeouts), so having it as the default is the best choice. The patch applies to (and comes with) the CVS version, and is present in the latest nightly tar file (available from the web site).

..wayne..

Alberto Accomazzi                            aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System                           ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics         www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA
Re: feedback on rsync-HEAD-20050125-1221GMT
Hi Chris,

Chris Shoemaker wrote: On Fri, Jan 28, 2005 at 03:42:25PM -0500, Alberto Accomazzi wrote: Chris Shoemaker wrote: If I understand Wayne's design, it would be possible to invent a (per-directory) hook rule, whose value is executed, and whose stdout is parsed as an [in|ex]clude file list. E.g.: -R cat .rsync-my-includes or -R find . -ctime 1 -a ! -fstype nfs -a ! -empty -o -iname 'foo*'

This is certainly a very powerful mechanism, but it definitely should not be the only way we implement file filtering. Two problems: 1. Sprinkling rule files like these across directories would mean executing external programs all the time for each file to be considered.

No, only one execution per specified rule. Most users of this feature would specify one rule at the root directory. But if a user wanted to change the rules for every directory, they would have to specify a rule in each directory. Then, yes, one execution per directory. Presumably they would do this because they actually need to. Never one execution per file.

Ok, I guess I had misunderstood your original suggestion. One execution per directory is presumably not so bad, although it's hard to make assumptions about how one's data hierarchy is structured.

This would presumably slow down rsync's execution by an order of magnitude or so and suck the life out of a system doing a big backup job.

If you're referring to process spawning overhead, it's no big deal. If you're referring to the actual work required to return the file list, what makes you think that rsync can do it more efficiently than 'cat' or 'find', or whatever tool the user chose?

I was referring to the overhead of spawning a process per file being considered. But I think we all agree that this is neither desirable nor necessary.

2. Who actually needs such a powerful yet hard-to-handle mechanism?
Most of rsync's users are not programmers, and even the few of us who are apparently still get confused by rsync's include/exclude logic, forget about even more complicated approaches.

Do you mean the include/exclude mechanism or the filtering mechanism? Well, IMO, parsing a file list is *less* complicated than rsync's custom pattern specification and include/exclude chaining. Actually, I think rsync patterns are /crazy/ complicated and fully deserve the pages upon pages of documentation, explanation and examples that they get in the man page. But complexity is somewhat subjective, so I won't argue (much) about it. In practice, /familiarity/ is far more important than complexity in a case like this. Someone who looks at rsync for the first time has a _zero_ chance of having seen something like rsync's patterns before, because there is nothing else like them.

I agree that exclude/include patterns can be tricky, and you have a good point about familiarity versus complexity. I think what makes them hard to handle is the fact that we are dealing with filename (and directory name) matching and recursion. So matching only a subset of a file tree, while simple as a concept, is non-trivial once you sit down and realize that you need a well-defined syntax for it. Can you write a find expression that is simpler or more familiar to the average user than rsync's include/exclude?

(The allusion to GNU tar's --exclude option which takes only a filename, not a pattern, isn't really helpful in understanding rsync's --exclude option.)

Uh? Tar does take patterns for exclusion, and has its own quirky way of dealing with wildcards, directory matching and filename anchoring: http://www.gnu.org/software/tar/manual/html_node/tar_100.html

It's not that pattern matching for file selection isn't complex -- it's just that it's such a well-defined, conceptually simple, common task that other tools (like 'find' and 'bash') handle better than rsync ever will.
And that's the way it should be: it's the unix way.

I agree that this is something we should be striving for as much as possible: pipeline and offload tasks rather than bloating applications.

If you really need complete freedom maybe the way to go is to do your file selection first and use --files-from.

Yes, --files-from is nice, and honestly, almost completely sufficient. But in some dynamic cases, you can't keep the list updated.

Well, maybe we should go back and see if the solution to all problems isn't making --files-from sufficient. What exactly is missing from it right now? The capability to delete files which are not in the files-from list? Or the remote execution of a command that can generate the files-from list for an rsync server? Maybe we ought to really figure out what things cannot be achieved with the current functionality before coming up with something new. The challenge is making this powerful without making it too complicated, because in that case nobody will use it.

You see --filter as less complicated than --include/exclude, then? It's certainly more powerful. Since --filter can support a superset