Re: rsync exclude/include
On Tue, 13 Nov 2001, Thomas Schweikle wrote:
> I am calling rsync using
>
> rsync -avz --include-from="include" --exclude-from="exclude"
> ftp3.sourceforge.net::/netbsd/iso iso/

Looks like you didn't copy that command exactly, because rsync would fail with a syntax error due to the '/' before the netbsd module name. Also, you're creating an iso dir inside your local iso dir, which is probably not what you want.

With the include/exclude file Dave gave you, you'd need to run this command (changing "iso/" into "."):

    rsync -avz --include-from=foo ftp3.sourceforge.net::netbsd/iso .

However, I'd suggest a slightly simpler one: add a trailing slash to the root directory you're requesting and you can leave off the references to it (and put the data wherever you like, even if the directory isn't named "iso"). You would run this command:

    rsync -avz --include-from=foo ftp3.sourceforge.net::netbsd/iso/ myiso

And put this into "foo":

    + /1.5.*/
    + /1.5.*/i386*
    - *

You'll note I also used a trailing slash for the directory include since I don't want any files that match to be included (there are none here, but it's a good general principle).

..wayne..
Re: rsync copy speed.
On Wed, 10 Oct 2001, Andre Pang wrote:
> ssh is your problem

I believe Hans said that he only uses ssh to get the samba-using process going, and then transfers all files "locally" with rsync. So, the problem is that samba is doing all the data transfer over the network instead of rsync.

So Hans, if you're updating files on the destination drive (as opposed to copying them whole onto an empty drive), a much better solution would be to start up an rsync server (in read-only mode) on the Win98 machine for the duration of the backup. This allows rsync to optimize the data transfer.

If you're copying files into an empty destination drive, you might try using a recursive ftp grab or maybe using something like this:

    cd /path
    tar cf - . | gzip | ssh backup '(cd /backup/path; gunzip | tar xpf -)'

I know there are tar and gzip utilities available for Win98.

..wayne..
Moving files revisited
I'd like to revisit the topic of moving files from system to system using rsync. I've just updated my patch from its 2.5.0 version to 2.5.1, and I'm curious what people think about getting it integrated into rsync.

The patch comes in two parts. The first eliminates a potential hang condition that can happen if the data channel from the receiver to the generator gets clogged up. Since my move-files patch is using this channel to communicate when a file gets successfully written to disk (from the receiver to the sender via the generator), it needs to ensure that this hang cannot happen. The fix is rather complicated (because the generator is doing a lot of reading and writing of other data), but I've been using this patch in production conditions for quite a few months now and haven't encountered any problems yet. Here's the nohang patch:

    http://www.clari.net/~wayne/rsync-nohang.patch

The second part of the equation actually adds the --move-files option, the communication from the receiver back to the sender of which file was successfully finished, and the actual unlinking of the source file:

    http://www.clari.net/~wayne/rsync-move-files.patch

Comments?

..wayne..
Re: Rsync 2.5.2 -v too verbose?
On Wed, 30 Jan 2002, Dave Dykstra wrote:
> Martin has put in the below feature in rsync 2.5.2 for using a shell. I've
> already had one user complain about it. I think it would be better at the
> -vv level.

Yes, I agree that -vv would be better. People use -v primarily to see what files are getting transferred, and seeing what behind-the-scenes ssh connection is happening is better reserved for a more verbose output level.

..wayne..
Re: Moving files revisited
On Wed, 23 Jan 2002, Wayne Davison wrote:
> I'd like to revisit the topic of moving files from system to system
> using rsync.

I'm sad that nobody wanted to talk about --move-files yet, but maybe this will help things along. I've adapted the patch files to be based on the latest CVS source:

    http://www.clari.net/~wayne/rsync-nohang.patch
    http://www.clari.net/~wayne/rsync-move-files.patch

The version for 2.5.1 was renamed:

    http://www.clari.net/~wayne/rsync-2.5.1-nohang.patch
    http://www.clari.net/~wayne/rsync-2.5.1-move-files.patch

If anyone has any questions, let me know.

..wayne..
Tweak for add_exclude() -vvv output
Here's an improved version of an old patch that I submitted. It improves the -vvv output when using --exclude and --include options:

Index: rsync/exclude.c
--- rsync/exclude.c	23 Jan 2002 04:57:18 -	1.39
+++ rsync/exclude.c	30 Jan 2002 18:35:46 -
@@ -201,9 +201,11 @@
 	if (!*list || !((*list)[len] = make_exclude(pattern, include)))
 		out_of_memory("add_exclude");
 
-	if (verbose > 2)
-		rprintf(FINFO,"add_exclude(%s)\n",pattern);
-
+	if (verbose > 2) {
+		rprintf(FINFO,"add_exclude(%s,%s)\n",pattern,
+			include ? "include" : "exclude");
+	}
+
 	(*list)[len+1] = NULL;
 }

The old output is confusing because an include and an exclude generated the same text. This change causes includes to be output with ",include" and excludes to be output with ",exclude".

..wayne..
configure --with-rsh=CMD and default blocking-IO support
A while back I argued for adding a --with-rsh=CMD option to configure and got some general agreement that it would be a good thing (especially for systems that don't have rsh at all). However, the changes were never integrated into rsync.

This patch adds the --with-rsh=CMD option to configure and modifies main.c to improve the blocking-IO setting code. The old code would set blocking_io to '1' if the string matched either "rsh" or "remsh" (whichever one was configured into rsync). The new code has a slightly modified version of this check (that still works even if RSYNC_RSH isn't defined to be "rsh"), but it also adds a way to force the blocking-IO setting (both at configure time and via the RSYNC_RSH environment variable).

The idiom I chose to use was to prefix the value with '@' to indicate that blocking-IO should be used, and to prefix it with "@@" to indicate that blocking-IO should not be used. This allows the installer to specify --with-rsh=@@ssh to explicitly request non-blocking-IO for ssh (for the paranoid); it lets the user specify RSYNC_RSH=@/local/bin/rsh to get blocking-IO when using a path to rsh (a case where the old code would force the user to specify the --blocking-io option); and it also makes it possible to specify --with-rsh=@@rsh to get a non-blocking-IO rsh by default (which is impossible with the old code without specifying a path).

I've appended the patch to the end. Don't forget to run autoconf after applying it.

..wayne..
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: rsync/config.h.in
--- rsync/config.h.in	15 Jan 2002 09:53:29 -	1.68
+++ rsync/config.h.in	30 Jan 2002 18:45:18 -
@@ -303,6 +303,9 @@
 #undef RETSIGTYPE
 
 /* */
+#undef RSYNC_RSH
+
+/* */
 #undef RSYNC_PATH
 
 /* rsync release version */
Index: rsync/configure.in
--- rsync/configure.in	25 Jan 2002 23:19:21 -	1.130
+++ rsync/configure.in	30 Jan 2002 18:45:19 -
@@ -78,6 +78,10 @@
 AC_ARG_WITH(included-popt,
 	[ --with-included-popt use bundled popt library, not from system])
 
+AC_ARG_WITH(rsh,
+	[ --with-rsh=CMD set rsh command to CMD (default: \"remsh\" or \"rsh\")],
+	[ AC_DEFINE_UNQUOTED(RSYNC_RSH, "$with_rsh", [ ]) ])
+
 AC_ARG_WITH(rsync-path,
 	[ --with-rsync-path=PATH set default --rsync-path to PATH (default: \"rsync\")],
 	[ RSYNC_PATH="$with_rsync_path" ],
Index: rsync/main.c
--- rsync/main.c	25 Jan 2002 10:07:41 -	1.138
+++ rsync/main.c	30 Jan 2002 18:45:22 -
@@ -209,8 +209,19 @@
 
 		server_options(args,&argc);
 
-
-		if (strcmp(cmd, RSYNC_RSH) == 0) blocking_io = 1;
+		if (*cmd == '@') {
+			if (*++cmd == '@') {
+				cmd++;
+				blocking_io = 0;
+			} else
+				blocking_io = 1;
+			args[0] = cmd;
+		} else if (strcmp(cmd, "rsh") == 0
+#if HAVE_REMSH
+		    || strcmp(cmd, "remsh") == 0
+#endif
+		    )
+			blocking_io = 1;
 	}
 
 	args[argc++] = ".";
Index: rsync/rsync.h
--- rsync/rsync.h	25 Jan 2002 23:00:21 -	1.121
+++ rsync/rsync.h	30 Jan 2002 18:45:29 -
@@ -85,10 +85,12 @@
 
 #include "config.h"
 
+#ifndef RSYNC_RSH
 #if HAVE_REMSH
 #define RSYNC_RSH "remsh"
 #else
 #define RSYNC_RSH "rsh"
+#endif
 #endif
 
 #include 
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Re: Moving files revisited
On Thu, 31 Jan 2002, Dave Dykstra wrote:
> It's up to Martin to decide, but I'm sorry to tell you that I'm opposed to
> a --move-files option. I think that if somebody wants to do that they
> should do it with an external program after rsync returns a clean exit
> code. It seems to me that it goes against the purpose of rsync because
> after the files are removed from the sending side there's nothing left to
> sync later.

I use rsync instead of scp to copy all my files from system to system, even when I'm not going to synchronize anything. The reason is that it does so many things the right way that scp doesn't support (e.g. scp opens a new ssh connection for every file, it has no option to write data to a temp file outside of the destination dir and move it in place when complete, it has no include/exclude options, a non-recursive copy doesn't handle directories as nicely as rsync, etc.). So, I understand where you're coming from, but I look at rsync as a general file-copying tool (that is also very efficient at updating files) rather than just as a tool that keeps files in sync. I know that the line has to be drawn somewhere, though, when deciding how much is too much.

> I see that Tridge liked the idea in general but had some problems with your
> implementation:
>
> http://lists.samba.org/pipermail/rsync/2001-May/004282.html
>
> Have you addressed his concern?

That was my earliest patch from back before I understood the data flow between all the rsync modules. It was my work on the move-files option that prompted me to do all the no-hang work, including the patch that is required by the move-files option. You'll find later discussion on the list where Tridge (I believe) also objected to a buffer that could grow dynamically. I then changed my implementation to use a fixed-size buffer, which is in the current patch.

Here's an overview of what the no-hang patch does, with some move-file comments as well.
When the receiver process is created, it forks off a generator process on the same machine with two pipes between them (both flowing from the receiver to the generator). The first is an error channel (that is also used for verbose output) and the second is a redo channel that sends the numbers of the files that need to be reprocessed.

In the generator, the first channel is constantly checked for content, even when we're reading the redo channel or writing out data to the sender (this is necessary to keep the receiver from blocking trying to send the generator data while the generator is trying to do something else). However, the "redo" pipe is not currently kept clear. It is assumed that the number of redo items will fit within a pipe's data buffer. This assumption is usually right, but for really large numbers of files it might fill up and cause rsync to hang. (Also, my move-files patch uses this channel, so it is imperative that the redo channel be kept clear for --move-files to work.)

My no-hang patch adds an array of flag ints, one for each item in the list of files that are being sent. The read process in io.c is then extended to allow the redo pipe to be monitored, flagging all redo items that show up into the flag array. This keeps the channel clear, and provides a way to regenerate the list of redo items for the generator. (The move-files patch extends this to flag which items are complete and can be deleted.)

The only complicating factor is what happens when we actually read the redo channel's fd in a blocking manner at the end of the run (when we're waiting for the -1 EOF flag). While doing this, we need to be reading the error channel and also continuing to flush the write channel to the sender.
If we break away from the redo channel work to read or write something else, that read function might actually read data from the redo channel as a side-effect of its primary work (and we can't disable this, since we need to keep the redo channel from blocking while doing other read/write work). My solution makes the read process aware of when it is reading the redo channel, and has it return a -3 when some side-effect work has already put data into the flag array (instead of trying to read even more data that may not be there). Since the function calling the redo-read knows to look in the array, this results in all the data being processed properly and in the correct order (note that the function also keeps track of how many EOF -1 items it has seen, which is vital to it working properly).

Once the redo channel has been made non-blocking, it is a very simple matter to add move-files support. The receiver sends the numbers of all the files that have been successfully written over to the generator process, which forwards them back to the sender via the normal (combo) data channel, and the sender reads these safe-to-delete messages and unlinks the corresponding file for each one it gets.

..wayne..
Re: Moving files revisited
On Thu, 31 Jan 2002, Dave Dykstra wrote:
> Ouch, is that another byte for every file? Are there no bits free in
> the "flags" field already in file_struct?

Yes, it is an extra byte per file. An earlier patch of mine did use bits in the existing flag word in the current per-file structure, but since that structure is created before the receiver forks off the generator, I was thinking that any bit-twiddling of the existing flags would cause a lot of shared memory between the two processes to cease being shared (on systems that support copy-on-write forks, such as Linux). Thus, I think it would be more memory-intensive to use the existing data structure's flags (but I haven't verified this with actual memory-size testing). One solution to this would be to use actual shared memory for the file structure shared by the receiver & generator.

As for the move-files option, I was thinking that I could write a perl script that would parse the output of rsync -v and delete files that were successfully transferred by rsync when they show up in the verbose output. If I can make that work, the need for my nohang patch isn't as great, and I could probably come up with a simpler way to keep the redo channel from filling up (perhaps using a buffer in the receiver process or looking into how to do some portable shared memory). Hmm, something to consider.

..wayne..
Re: rsync-2.5.2 possible buglets
On Fri, 1 Feb 2002, Steve G wrote:
> I don't know if this amounts to much, but did you intend to use a &
> rather than a && at line 739 of flist.c?

Fortunately both items in the "&" expression can only have the value of 1 or 0, so the effect is the same as "&&". It looks like a typo to me, though.

..wayne..
Re: Moving files revisited
On Thu, 31 Jan 2002, Wayne Davison wrote:
> As for the move-files option, I was thinking that I could write a perl
> script that would parse the output of rsync -v and delete files that
> were successfully transferred by rsync when they show up in the verbose
> output.

I've been meaning to comment on this idea I had. This perl script idea only works when running rsync from the sending system, not when pulling files, so I'd still prefer to have a real --move-files option.

Anyone have any comments on my --move-files implementation? The current patch sends a message back from the receiver to the sender, letting it know when it is OK to delete a file. An alternate implementation might be to add a delete pass to the sender to delete all the files en masse after the whole process completes successfully. I personally prefer the more incremental approach (especially for moving larger numbers of files).

..wayne..
Re: configure --with-rsh=CMD and default blocking-IO support
On Wed, 6 Feb 2002, Martin Pool wrote:
> OK, I agree --with-rsh should go in, but I think putting magic
> characters into it is needlessly confusing. I would feel much better
> about a separate configure option to set the default O_NONBLOCK mode.

The complicating factor then becomes: how does the RSYNC_RSH environment variable interact with this default O_NONBLOCK mode, and how can the default blocking be changed via the environment? I came up with the magic character idea in order to try to keep things simple (using only one environment variable instead of trying to keep two different ones in sync). I admit that it's quirky, though.

So the obvious alternative is something like this:

    export RSYNC_RSH=ssh
    export RSYNC_BLOCKING_IO=1

Perhaps a better idiom might be to allow RSYNC_RSH to begin with a command-line option? If the string begins with "--blocking-io " we strip it off and twiddle that command-line flag. If we want to make this orthogonal we could also add support for the --non-blocking-io command-line option and allow this string to appear at the start of the RSYNC_RSH value. What do you think of something like this?

    export RSYNC_RSH='--blocking-io /usr/bin/ssh -l username'
    export RSYNC_RSH='--non-blocking-io rsh'

Or can you think of a better way to go?

..wayne..
Re: configure --with-rsh=CMD and default blocking-IO support
On Wed, 6 Feb 2002, Dave Dykstra wrote:
> Of the proposed alternatives, I like this latter the best, changing
> --non-blocking-io to --no-blocking-io.

Cool. I like that one as well. Here's an implementation.

This patch adds the configure option --with(out)-blocking-io and defines a new variable that gets put into config.h: DEFAULT_BLOCKING_IO. The default for configure is just as before: remsh or rsh gets used with blocking IO on by default. If the user specifies --with-rsh=CMD then the default is --without-blocking-io unless the user also specifies the --with-blocking-io configure option.

The code in main.c now uses the DEFAULT_BLOCKING_IO value, but only when we use the default RSYNC_RSH (internal) value. If the user specifies an RSYNC_RSH environment variable (or a remote shell via the command-line), the default is to use non-blocking IO. (This is a slight change in behavior if the user had set RSYNC_RSH=rsh in their environment -- is this acceptable?)

The code now allows the remote shell value to contain a single prefixed IO-blocking option. If the string starts with "--" and it has a space in it, the string must start with "--blocking-io ", "--no-blocking-io ", or "-- " (the last item allows someone to use a program name that matches one of our options -- just for completeness).

I also updated the main man page to mention the new RSYNC_RSH syntax, and also to not talk like rsh is always the default remote shell. In the --blocking-io section, it used to say that ssh prefers blocking IO. I've never used anything but non-blocking IO with ssh, so is this statement backwards? I tweaked the statement to say that only some versions of ssh prefer blocking IO.

Don't forget to run autoconf and autoheader after applying this patch.

..wayne..
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: rsync/configure.in
--- rsync/configure.in	6 Feb 2002 04:37:09 -	1.131
+++ rsync/configure.in	6 Feb 2002 22:45:04 -
@@ -102,6 +102,23 @@
 fi
 
 AC_DEFINE_UNQUOTED(RSYNC_RSH, "$RSYNC_RSH", [default -e command])
+
+AC_ARG_WITH(blocking-io,
+	AC_HELP_STRING([--with-blocking-io], [set blocking IO for your remote shell]))
+
+case "$with_blocking_io" in
+'')
+	if test x"$with_rsh" != x; then
+		IO=0
+	else
+		IO=1
+	fi
+	;;
+no)	IO=0 ;;
+*)	IO=1 ;;
+esac
+
+AC_DEFINE_UNQUOTED(DEFAULT_BLOCKING_IO, $IO, [default to blocking IO])
 
 # arrgh. libc in the current debian stable screws up the largefile
 # stuff, getting byte range locking wrong
Index: rsync/main.c
--- rsync/main.c	5 Feb 2002 23:05:32 -	1.139
+++ rsync/main.c	6 Feb 2002 22:45:08 -
@@ -178,10 +178,25 @@
 	extern int read_batch;
 
 	if (!read_batch && !local_server) { /* dw -- added read_batch */
+		int def_io = DEFAULT_BLOCKING_IO;
 		if (!cmd)
 			cmd = getenv(RSYNC_RSH_ENV);
 		if (!cmd)
 			cmd = RSYNC_RSH;
+		else
+			def_io = 0;
+		if (*cmd == '-' && cmd[1] == '-' && (tok = strchr(cmd, ' '))) {
+			if (strncmp(cmd+2, "blocking-io ", 12) == 0)
+				def_io = 1;
+			else if (strncmp(cmd+2, "no-blocking-io ", 15) == 0)
+				def_io = 0;
+			else if (cmd[2] != ' ') {
+				rprintf(FERROR,"Invalid remote-shell-IO option: %s\n",
+					cmd);
+				exit_cleanup(RERR_SYNTAX);
+			}
+			cmd = tok + 1;
+		}
 		cmd = strdup(cmd);
 		if (!cmd)
 			goto oom;
@@ -207,8 +222,8 @@
 
 		args[argc++] = rsync_path;
 
-		if ((blocking_io == -1) && (strcmp(cmd, RSYNC_RSH) == 0))
-			blocking_io = 1;
+		if (blocking_io < 0)
+			blocking_io = def_io;
 
 		server_options(args,&argc);
 
Index: rsync/options.c
--- rsync/options.c	5 Feb 2002 23:05:32 -	1.78
+++ rsync/options.c	6 Feb 2002 22:45:09 -
@@ -206,7 +206,7 @@
 	rprintf(F," --no-whole-file turn off --whole-file\n");
 	rprintf(F," -x, --one-file-system don't cross filesystem boundaries\n");
 	rprintf(F," -B, --block-size=SIZE checksum blocking size (default %d)\n",BLOCK_SIZE);
-	rprintf(F," -e, --rsh=COMMAND specify rsh replacement\n");
+	rprintf(F," -e, --rsh=COMMAND specify the remote shell\n");
 	rprintf(F," --rsync-path=PATH specify path to rsync on the remote machine\n");
 	rprintf(F," -C, --cvs-exclude auto ignore files in the same way CVS does\n");
 	rprintf(F," --existing only update files that already exist\n");
Index: rsync/rsync.yo
--- rsync/rsync.yo	5 Feb 2002 23:05:33 -
Re: configure --with-rsh=CMD and default blocking-IO support
On Thu, 7 Feb 2002, Martin Pool wrote:
> A general-purpose RSYNC_OPTS variable would be more tasteful. I think
> popt makes supporting this fairly straightforward.

That's a nice idea. One area we'll want to be careful of is how the two options interact. For instance, we want to support old scripts that might set RSYNC_RSH and then run a bunch of rsync commands. It would be nice to make this work without conflicting with a user's also-existing RSYNC_OPTS var. A potential solution is to ignore RSYNC_OPTS if RSYNC_RSH is set (which also serves to wean people away from RSYNC_RSH if they want to be able to set other default options).

Another potential problem area is how to override already-set options. If someone wants to put -a into their RSYNC_OPTS variable, how can they then turn it off? I suppose we could just say that the user gets what she deserves in such a case.

So, perhaps I'm trying to solve a problem that isn't really all that important. Just having the ability to set the default remote shell and its IO mode might be good enough for most people, and we let the rest use shell scripts or aliases, like you said. I could trim down my last patch to avoid the extra RSYNC_RSH parsing if you'd like to just apply the other part of it. Or, feel free to tweak it yourself -- it should be pretty easy.

..wayne..
Re: Deleting files from source after a successful rsync !
On Thu, 7 Feb 2002, Kapoor, Nishikant X wrote:
> I have a few clients who prepare some reports and put it in their
> outgoing/ directory for me to pick up every morning. Is there a way to
> delete those files from their outgoing/ after I fetch them ?

You can use my --move-files patch for this, which also requires my no-hang patch. (You'd have to get them to install this updated rsync on their end as well as on your end.) If that is not possible, you'd have to kludge something together that would keep track of what files got grabbed and run a separate ssh process with a manual rm command.

If you're running 2.5.2, you can apply the most recent versions of my patches:

    http://www.clari.net/~wayne/rsync-nohang.patch
    http://www.clari.net/~wayne/rsync-move-files.patch

(Use "patch -p1 http://www.clari.net/~wayne/rsync-2.5.1-nohang.patch
    http://www.clari.net/~wayne/rsync-2.5.1-move-files.patch

If you're running an even older rsync, you should be able to hand-patch the rejected chunks from the 2.5.1 versions.

I use an older rsync with these changes on my production systems (to move ever-arriving information from box to box), and it works great for me. Future versions of rsync will hopefully have some version of the --move-files option included, though we haven't finished the discussion of exactly what we want to do for the official release.

..wayne..
Re: problem getting just a single dir !
On Sun, 10 Feb 2002, Nishikant Kapoor wrote:
> I am trying to fetch a single dir using the following command but all I
> get is a empty dir:
>
> rsync -av www.myServer.com::myStuff --include=myDir --exclude=* .

Includes are tricky that way -- you told it to just include the directory, but you didn't tell it to include anything within the directory. This is because --exclude=* excludes everything at every level that didn't get explicitly mentioned (I'm assuming you protected the '*' so that it didn't get expanded by the shell).

The easiest way to accomplish what you want is to just name the directory without using the include/exclude options:

    rsync -av www.myServer.com::myStuff/myDir .

If you want to use include & exclude, you could do this:

    rsync -av www.myServer.com::myStuff --include=/myDir --exclude=/* .

This tells rsync to only exclude the items at the base of the path that are not myDir, not all items at all levels. Alternately, you could do this:

    rsync -av www.myServer.com::myStuff --include=/myDir** --exclude=* .

where you explicitly include everything within myDir (the "**" matches slashes, so it includes subdir content as well, and the initial '/' is required for it to match the whole path).

..wayne..
Re: Exclude directories
On Wed, 13 Feb 2002, Ian Kettleborough wrote:
> ie:
> /usr/src
> or
> /usr/src/

One thing that totally tripped me up at first is that you don't include the whole path if you're not starting the transfer from the root of the filesystem. For instance:

    rsync -av /usr/ foobar:/usr

All your excludes would be relative to /usr/, so you'd use /src/ to exclude /usr/src/. If you use verbose mode to see the names that rsync is sending, the names you must put in your include/exclude items need to match those (with an added starting slash to anchor the match).

..wayne..
Re: Debian bug #128632 && fork
On Mon, 18 Feb 2002, Martin Pool wrote:
> Why the sleep() call? Also, why close(fd) twice?
> > +	} else if (pid < 0) {
> > +		rprintf(FERROR, "could not create child process: %s\n",
> > +			strerror(errno));
> > +		close(fd);
> > +		sleep(2);
> >  	}
> >
> >  	close(fd);

..wayne..
Re: include exclude help please.
Seems to me that the simplest solution is to name the directory explicitly:

    rsync -a --include "*/" --include "*.tif" --exclude "*" /film/jonah /tmp/film

To accomplish the same thing using includes, you could do this:

    rsync -a --include /jonah --include "/jonah/**/" --include "*.tif" \
        --exclude "*" /film/ /tmp/film

If you want to exclude any empty directories that either of these commands creates, you'll have to be more specific in the directory path that is allowed to succeed. I.e., if there's a "foo/bar" path in between jonah and sourceimages, you'd need to do something like this:

    rsync -a --include /jonah --include /jonah/foo --include /jonah/foo/bar \
        --include /jonah/foo/bar/sourceimages --include "*.tif" \
        --exclude "*" /film/ /tmp/film

I haven't tested any of these, but they look right to me.

..wayne..

--
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
Re: transferring individual files question, pull vs. push
On Tue, 19 Mar 2002, Jeff Field wrote:
> rsync -e ssh source-box.x.com:/var/qmail/control/file1 \
> source-box.x.com:/var/qmail/control/file2 \
> source-box.x.com:/var/qmail/control/file3 \
> source-box.x.com:/var/qmail/control/file4 \
> /var/qmail/control

You can't have multiple remote-machine specifications, even if they refer to the same machine. The only thing you can do is to use wildcards that get remote-expanded (by the remote shell) or copy entire directories. For instance:

    rsync -e ssh source-box.x.com:/var/qmail/control/file\? /var/qmail/control

..wayne..
Re: rsync 2.5.5 --delete-after option bug
On Thu, 25 Apr 2002, Dave Dykstra wrote:
> I think --delete-after should imply --delete. Would someone like to
> work up the simple patch to the code and the man page?

Sure. Here's one (note that the OPT_DELETE_AFTER enum was already defined for some reason).

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: options.c
--- options.c	16 Apr 2002 01:38:21 -	1.92
+++ options.c	25 Apr 2002 21:57:48 -
@@ -306,7 +306,7 @@
 	{"delete", 0, POPT_ARG_NONE, &delete_mode , 0, 0, 0 },
 	{"existing", 0, POPT_ARG_NONE, &only_existing , 0, 0, 0 },
 	{"ignore-existing", 0, POPT_ARG_NONE, &opt_ignore_existing , 0, 0, 0 },
-	{"delete-after", 0, POPT_ARG_NONE, &delete_after , 0, 0, 0 },
+	{"delete-after", 0, POPT_ARG_NONE, 0, OPT_DELETE_AFTER, 0, 0 },
 	{"delete-excluded", 0, POPT_ARG_NONE, 0, OPT_DELETE_EXCLUDED, 0, 0 },
 	{"force", 0, POPT_ARG_NONE, &force_delete , 0, 0, 0 },
 	{"numeric-ids", 0, POPT_ARG_NONE, &numeric_ids , 0, 0, 0 },
@@ -476,7 +479,12 @@
 			 * non-default setting. */
 			modify_window_set = 1;
 			break;
-
+		case OPT_DELETE_AFTER:
+			delete_after = 1;
+			delete_mode = 1;
+			break;
+
 		case OPT_DELETE_EXCLUDED:
 			delete_excluded = 1;
 			delete_mode = 1;
Index: rsync.yo
--- rsync.yo	8 Apr 2002 05:30:28 -	1.96
+++ rsync.yo	25 Apr 2002 22:01:47 -
@@ -485,11 +485,12 @@
 dit(bf(--delete-excluded)) In addition to deleting the files on the
 receiving side that are not on the sending side, this tells rsync to also
 delete any files on the receiving side that are excluded (see --exclude).
+Implies --delete.
 
 dit(bf(--delete-after)) By default rsync does file deletions before
 transferring files to try to ensure that there is sufficient space on the
 receiving filesystem. If you want to delete after transferring
-then use the --delete-after switch.
+then use the --delete-after switch. Implies --delete.
 
 dit(bf(--ignore-errors)) Tells --delete to go ahead and delete files even
 when there are IO errors.
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Updating the docs/help on the default remote shell
Since rsync can now be configured with a different default remote shell than "rsh", I think the docs should be updated a bit. Anyone object to these changes? (Note that I also fixed the misstatement that ssh prefers blocking IO.)

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: options.c
--- options.c	2002/05/03 22:59:17	1.93
+++ options.c	2002/05/03 23:28:47
@@ -230,7 +230,7 @@
 	rprintf(F," --no-whole-file turn off --whole-file\n");
 	rprintf(F," -x, --one-file-system don't cross filesystem boundaries\n");
 	rprintf(F," -B, --block-size=SIZE checksum blocking size (default %d)\n",BLOCK_SIZE);
-	rprintf(F," -e, --rsh=COMMAND specify rsh replacement\n");
+	rprintf(F," -e, --rsh=COMMAND specify the remote shell\n");
 	rprintf(F," --rsync-path=PATH specify path to rsync on the remote machine\n");
 	rprintf(F," -C, --cvs-exclude auto ignore files in the same way CVS does\n");
 	rprintf(F," --existing only update files that already exist\n");
Index: rsync.yo
--- rsync.yo	2002/05/03 22:58:01	1.97
+++ rsync.yo	2002/05/03 23:28:48
@@ -77,11 +77,13 @@
 
 See the file README for installation instructions.
 
-Once installed you can use rsync to any machine that you can use rsh
-to. rsync uses rsh for its communications, unless both the source and
-destination are local.
+Once installed, you can use rsync to any machine that you can access via
+a remote shell (as well as some that you can access using the rsync
+daemon-mode protocol). For remote transfers, rsync typically uses rsh
+for its communications, but it may have been configured to use a
+different remote shell by default, such as ssh.
 
-You can also specify an alternative to rsh, either by using the -e
+You can also specify any remote shell you like, either by using the -e
 command line option, or by setting the RSYNC_RSH environment variable.
 
One common substitute is to use ssh, which offers a high degree of @@ -135,7 +137,7 @@ manpagesection(CONNECTING TO AN RSYNC SERVER) -It is also possible to use rsync without using rsh or ssh as the +It is also possible to use rsync without a remote shell as the transport. In this case you will connect to a remote rsync server running on TCP port 873. @@ -144,7 +146,7 @@ your web proxy. Note that your web proxy's configuration must allow proxying to port 873. -Using rsync in this way is the same as using it with rsh or ssh except +Using rsync in this way is the same as using it with a remote shell except that: itemize( @@ -242,7 +244,7 @@ --no-whole-file turn off --whole-file -x, --one-file-system don't cross filesystem boundaries -B, --block-size=SIZE checksum blocking size (default 700) - -e, --rsh=COMMAND specify rsh replacement + -e, --rsh=COMMAND specify the remote shell to use --rsync-path=PATH specify path to rsync on the remote machine -C, --cvs-exclude auto ignore files in the same way CVS does --existing only update files that already exist @@ -505,8 +507,8 @@ dit(bf(-e, --rsh=COMMAND)) This option allows you to choose an alternative remote shell program to use for communication between the local and -remote copies of rsync. By default, rsync will use rsh, but you may -like to instead use ssh because of its high security. +remote copies of rsync. By default, rsync is typically configured to use +rsh, but you may like to instead use ssh because of its high security. You can also choose the remote shell program using the RSYNC_RSH environment variable. @@ -661,7 +663,8 @@ a remote shell transport. If -e or --rsh are not specified or are set to the default "rsh", this defaults to blocking IO, otherwise it defaults to non-blocking IO. You may find the --blocking-io option is needed for some -remote shells that can't handle non-blocking IO. Ssh prefers blocking IO. +remote shells that can't handle non-blocking IO. (Note that ssh prefers +non-blocking IO.) 
dit(bf(--no-blocking-io)) Turn off --blocking-io, for use when it is the default. ---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
A simpler move-files patch
In an effort to get my long-desired move-files functionality into rsync, I have created a version of my patch that runs as an extra pass at the end of the processing. This results in a simpler set of changes to rsync. I still think it would be nice to have incremental deletions during large transfers (as my first patch provides), but acceptance of this patch would relegate such quibbling to a discussion of future optimizations. One thing that this patch does differently than my last one is this: it removes all synchronized files from the server, even ones that were already up-to-date. (I had been meaning to make my previous patch also include up-to-date files, but hadn't gotten around to it before this.) As before, directories are not affected. This patch is for CVS, but the offsets assume that my last patch to rsync.yo has already been applied. Let me know what you think. ..wayne.. ---8<--8<--8<--8<---cut here--->8-->8-->8-->8--- Index: options.c --- save/options.c Sat May 4 11:22:22 2002 +++ options.c Sat May 4 11:27:17 2002 @@ -86,6 +86,7 @@ int modify_window=0; #endif int blocking_io=-1; +int move_files=0; /** Network address family. 
**/ @@ -240,6 +241,7 @@ rprintf(F," --delete-after delete after transferring, not before\n"); rprintf(F," --ignore-errors delete even if there are IO errors\n"); rprintf(F," --max-delete=NUM don't delete more than NUM files\n"); + rprintf(F," --move-files remove the synchronized files from the sending side\n"); rprintf(F," --partial keep partially transferred files\n"); rprintf(F," --force force deletion of directories even if not empty\n"); rprintf(F," --numeric-ids don't map uid/gid values by user/group name\n"); @@ -290,7 +292,7 @@ OPT_LOG_FORMAT, OPT_PASSWORD_FILE, OPT_SIZE_ONLY, OPT_ADDRESS, OPT_DELETE_AFTER, OPT_EXISTING, OPT_MAX_DELETE, OPT_BACKUP_DIR, OPT_IGNORE_ERRORS, OPT_BWLIMIT, OPT_BLOCKING_IO, - OPT_NO_BLOCKING_IO, OPT_WHOLE_FILE, OPT_NO_WHOLE_FILE, + OPT_NO_BLOCKING_IO, OPT_WHOLE_FILE, OPT_NO_WHOLE_FILE, OPT_MOVE_FILES, OPT_MODIFY_WINDOW, OPT_READ_BATCH, OPT_WRITE_BATCH, OPT_IGNORE_EXISTING}; static struct poptOption long_options[] = { @@ -365,6 +367,7 @@ {"hard-links", 'H', POPT_ARG_NONE, &preserve_hard_links , 0, 0, 0 }, {"read-batch", 0, POPT_ARG_STRING, &batch_prefix, OPT_READ_BATCH, 0, 0 }, {"write-batch", 0, POPT_ARG_STRING, &batch_prefix, OPT_WRITE_BATCH, 0, 0 }, + {"move-files", 0, POPT_ARG_NONE, &move_files, 0, 0, 0 }, #ifdef INET6 {0,'4', POPT_ARG_VAL,&default_af_hint, AF_INET , 0, 0 }, {0,'6', POPT_ARG_VAL,&default_af_hint, AF_INET6 , 0, 0 }, @@ -813,6 +816,9 @@ args[ac++] = "--compare-dest"; args[ac++] = compare_dest; } + + if (move_files) + args[ac++] = "--move-files"; *argc = ac; } Index: rsync.h --- rsync.h 2002/04/11 02:18:51 1.131 +++ rsync.h 2002/05/04 19:20:29 @@ -47,6 +47,7 @@ #define SAME_NAME SAME_DIR #define LONG_NAME (1<<6) #define SAME_TIME (1<<7) +#define FLAG_NO_DELETE (1<<8) /* update this if you make incompatible changes */ #define PROTOCOL_VERSION 26 Index: rsync.yo --- save/rsync.yo Fri May 3 16:35:18 2002 +++ rsync.yo Sat May 4 11:53:41 2002 @@ -254,6 +254,7 @@ --delete-after delete after transferring, not before
 --ignore-errors delete even if there are IO errors --max-delete=NUM don't delete more than NUM files + --move-files remove the synchronized files from the sending side --partial keep partially transferred files --force force deletion of directories even if not empty --numeric-ids don't map uid/gid values by user/group name @@ -496,6 +497,10 @@ dit(bf(--ignore-errors)) Tells --delete to go ahead and delete files even when there are IO errors. + +dit(bf(--move-files)) This tells rsync to remove the source files on the +sending side that are either successfully transferred to the receiving +side or are already up-to-date (directories are not removed). dit(bf(--force)) This options tells rsync to delete directories even if they are not empty when they are to be replaced by non-directories. This Index: sender.c --- sender.c 2002/04/09 06:03:50 1.17 +++ sender.c 2002/05/04 19:20:29 @@ -26,6 +26,7 @@ extern int io_error; extern int dry_run; extern int am_server; +extern int move_files; /** @@ -184,6 +185,7 @@ rprintf(FERROR,"send_files failed to open %s: %s\n", fname,strerror(errno)); free_sums(s); + file->flags |= FLAG_NO_DELETE;
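The patch is cut off before the extra pass itself, so what follows is only a minimal sketch of what such an end-of-run pass could look like, not the code from the patch. It assumes a hypothetical `struct move_entry` in place of rsync's real file-list entries and reuses the patch's FLAG_NO_DELETE bit value. Every entry is treated as removable unless the sender flagged it (for example when the open failed), and directories are skipped, as the proposed --move-files text says.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define FLAG_NO_DELETE (1<<8)   /* same bit value the patch adds to rsync.h */

/* Hypothetical stand-in for an rsync file-list entry. */
struct move_entry {
    const char *fname;
    int flags;
};

/* Extra pass run after all transfers: unlink every non-directory entry
 * that was not flagged with FLAG_NO_DELETE.  Returns the number removed. */
static int remove_sent_files(const struct move_entry *list, int count)
{
    int removed = 0;

    assert(list != NULL || count == 0);
    for (int i = 0; i < count; i++) {
        struct stat st;

        if (list[i].flags & FLAG_NO_DELETE)
            continue;               /* send failed: keep the source file */
        if (lstat(list[i].fname, &st) != 0 || S_ISDIR(st.st_mode))
            continue;               /* missing, or a directory: leave alone */
        if (unlink(list[i].fname) == 0)
            removed++;
        else
            perror(list[i].fname);
    }
    return removed;
}
```

Defaulting to "delete unless flagged" is what lets already up-to-date files be removed too, since those never pass through the sender's transfer loop.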
Re: Send Password with RSYNC_PASSWORD or --password-file
On Sat, 4 May 2002, Manfred Gnaedig wrote: > If i use this > rsync -varpog -e ssh --stats /home/www/web6 > 217.172.xxx.xxx:/home/www/web6 --password-file=host1.pwd > the Server is asking me too fore Passwort. It's ssh, not rsync, that is asking you for the password. The --password-file option (as well as the RSYNC_PASSWORD environment variable) only affects transfers to an rsync daemon, which you are not using (the rsync daemon syntax requires 2 colons after the hostname). So, you either need to switch over to using an rsync daemon (and leave the "-e ssh" option off), or you need to set up ssh so that it doesn't prompt you for a password (testing it w/o rsync first is easiest). One way to set up ssh is to enable an RSA authorized key on the server you're connecting to. Look for the discussion of the files identity, identity.pub, and authorized_keys. ..wayne..
Re: Failure to update differing file
On Sat, 4 May 2002, Michael Fischer wrote: > 1. If I touched only the corrupted file, so the file times differed, > then rsync did update the destination file. > > 2. If I used the --checksum flag, then it updated correctly. > > But just a plain rsync failed to notice that the files were different. Then it sounds like rsync was behaving exactly as it should. By default it just compares the file times and sizes and skips any file that appears to be up-to-date by that standard. The --checksum option tells it to go a step further and check whether the checksums match before deciding if the files are really the same. That is much slower, since it means reading the files in full on both sides, and it's not usually needed, so it's not on by default. ..wayne..
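To illustrate the difference between the two decisions, here is a minimal sketch built around a hypothetical `looks_up_to_date()` helper. It is not rsync's code: the checksum mode is approximated by a byte-by-byte comparison, whereas rsync itself compares whole-file checksums.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Byte-by-byte comparison: a stand-in for rsync's whole-file checksum. */
static int same_contents(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int same = (fa != NULL && fb != NULL);

    while (same) {
        int ca = fgetc(fa), cb = fgetc(fb);
        if (ca != cb)
            same = 0;               /* differing byte, or one file is shorter */
        else if (ca == EOF)
            break;                  /* both ended together: identical */
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}

/* Return 1 if dst would be skipped as already up-to-date with src. */
static int looks_up_to_date(const char *src, const char *dst, int always_checksum)
{
    struct stat s1, s2;

    assert(src != NULL && dst != NULL);
    if (stat(src, &s1) != 0 || stat(dst, &s2) != 0)
        return 0;                   /* a missing file always needs a transfer */
    if (s1.st_size != s2.st_size)
        return 0;
    if (always_checksum)
        return same_contents(src, dst);   /* --checksum: read both files */
    return s1.st_mtime == s2.st_mtime;    /* default quick check */
}
```

With `always_checksum` off, two files of equal size and equal modification time are skipped even when their contents differ, which is the situation described above.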
Re: Send Password with RSYNC_PASSWORD or --password-file
On Sat, 4 May 2002, Manfred Gnaedig wrote: > mkdir 217.172.xxx.xxx/home/www/web10 : No such file or directory (1) You left out the "::". Also, the syntax for server mode is slightly different -- you need to refer to a module name on the server. So, if you have an rsync daemon configured and running on your 217.172.* host, you could use this to see the module names: rsync 217.172.xxx.xxx:: Check the rsyncd.conf man page for how to configure a module, give it a password, etc. If you don't want to run an rsync daemon, you need to get ssh to let you connect without a password instead. See the ssh-keygen man page for the easiest way to do that. ..wayne..
Prevent infinite recursion in rwrite()
Here's a resend of an old patch that is intended to avoid an infinite recursion (ending in a stack overflow) in which rwrite() gets an error that calls rwrite(), ad nauseam. I've only seen this happen when one of the sides dies due to a program error -- in that case, the connection is closed, and when we try to send an error to the other side and it generates an error, the error generates an error, etc. My solution is to use a simple static variable as a semaphore. If we get back to rwrite() with a non-zero value, we never again try to send a message over the socket. This results in the error going out to stderr. In the problem case I saw, this resulted in an error message being displayed on my terminal (2 actually) instead of a weird crash. ..wayne.. ---8<--8<--8<--8<---cut here--->8-->8-->8-->8--- Index: log.c --- log.c 2002/04/08 09:10:50 1.61 +++ log.c 2002/05/07 00:32:30 @@ -215,6 +215,7 @@ void rwrite(enum logcode code, char *buf, int len) { FILE *f=NULL; + static char semaphore = 0; extern int am_daemon; extern int am_server; extern int quiet; @@ -243,8 +244,11 @@ * io_multiplex_write can fail if we do not have a multiplexed * connection at the moment, in which case we fall through and * log locally instead. */ - if (am_server && io_multiplex_write(code, buf, len)) { - return; + if (am_server && (!semaphore++)) { + int ret = io_multiplex_write(code, buf, len); + semaphore--; + if (ret) + return; } if (am_daemon) { ---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
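The guard can be exercised outside rsync with a small simulation. This is a sketch, not rsync's log.c: `dead_multiplex_write()` is a hypothetical stand-in for io_multiplex_write() on a closed connection, which reports its own failure by calling back into the logging function. The `if (!semaphore++)` test is taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

static int socket_attempts;   /* how many times the (dead) connection was tried */
static int local_writes;      /* how many messages fell back to stderr */

static void rwrite_sketch(const char *msg);

/* Hypothetical stand-in for io_multiplex_write() on a closed connection:
 * the write fails, and reporting that failure re-enters rwrite_sketch(). */
static int dead_multiplex_write(const char *msg)
{
    (void)msg;
    socket_attempts++;
    rwrite_sketch("write failed: connection closed");
    return 0;
}

static void rwrite_sketch(const char *msg)
{
    static char semaphore = 0;

    assert(msg != NULL);
    /* Same guard as the patch: only a non-nested entry tries the socket. */
    if (!semaphore++) {
        int ret = dead_multiplex_write(msg);
        semaphore--;
        if (ret)
            return;
    }
    /* Fall through and log locally instead. */
    fprintf(stderr, "%s\n", msg);
    local_writes++;
}
```

Because `semaphore++` is evaluated even when the test fails, a nested call leaves the counter non-zero for good, so later messages never try the socket again. A single failing message prints two lines on stderr: the nested error and then the original message.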
Re: A simpler move-files patch
On Thu, 9 May 2002, Dave Dykstra wrote: > Maybe I'm dense, but I don't see how that's any different from turning > on a flag (with the opposite meaning) at the end. The reason this makes a difference is that not all the files get into that code. Any files that are identical just get skipped over on the generator side, so the sender never sees them (in that loop). So, the sender needs to assume that we can delete all the files in the list until we're told what files are not identical. An alternative way to implement this is to modify the generator process to send a special "this file is identical" sequence when we're in file-moving mode. That would allow the sending process to remove identical files immediately on the sending side, and then we could just mark the differing files with a "delete me" flag after we finish sending out all the updates. Another thought just occurred to me on how to implement this without resorting to a post-processing pass. It might be possible to have the receiver send the "delete me" events over the error message pipe (rather than the redo pipe), and since the generator already keeps this pipe unblocked, that would allow the code to work without first fixing the redo pipe's blockability. I can look into this. ..wayne..
Re: wildcards (was Re: a problem I'm having with rsync-4.5.4)
On Thu, 9 May 2002, Dave Dykstra wrote: > I would say it's definitely too risky for 2.5.6. What would you say to adding a (simple) loop to the fnmatch() code that would cause unanchored things like "foo/*/bar" to not be bound to the start of the filename? This would make it work in an equivalent way to the unanchored non-wildcard strings. ..wayne..
Re: wildcards (was Re: a problem I'm having with rsync-4.5.4)
On Thu, 9 May 2002, Wayne Davison wrote: > What would you say to adding a (simple) loop to the fnmatch() code that Just to clarify (since the above is poorly worded) -- I meant adding the loop to the rsync code that calls fnmatch(), not trying to modify the fnmatch() code directly. ..wayne..
Re: wildcards (was Re: a problem I'm having with rsync-4.5.4)
On Thu, 9 May 2002, Dave Dykstra wrote: > How many times would you have to call fnmatch for every file? We'd call fnmatch() an extra time for every slash in the path. However, the performance hit of this new loop on the pattern "foo/*" would be the same as using the two patterns "/**/foo/*" & "/foo/*" (_except_ that the loop version would handle the trailing '*' correctly, whereas in "/**/foo/*" it would also match across slashes) -- this is because "**" already has to do a recursive match iteration, and that's roughly what our new loop would be doing outside of fnmatch(). We'd actually be doing fewer recursive calls, since fnmatch() would call itself an extra time for every character in the path, but our loop would only call fnmatch() at the position just after each slash. So yes, this is slightly less efficient for unanchored patterns. It would make the code work as advertised, though, and any pattern that was anchored with a leading slash would be entirely unaffected. On the downside, the change in behavior could surprise people who use unanchored patterns as if they were actually anchored. ..wayne..
Re: wildcards
On Fri, 10 May 2002, Dave Dykstra wrote: > If you dynamically created a */*/*/foo/* pattern with the number of */ > to match the current path it would only have to call fnmatch once. That's assuming the pattern doesn't contain an interior/trailing "**" (which could only use the try-after-each-slash loop). Also, there's no need to tweak the pattern -- it would be the same amount of work to just figure out which position in the filename your "*/*/*/" prefix corresponds to and match at that position (since we'd have to count slashes anyway). We'd also have to be careful to ensure that there aren't any exceptional patterns that could lead to problematic positioning. A useful question at this point would be: Does the extra complexity make a big enough difference to be worth it? With all of the file I/O going on, I'm wondering if it would even be noticed. ..wayne..
Re: bug report
On Fri, 10 May 2002, terrell Larson wrote: > If rsync is directed to copy a directory tree into another machine and > the target directory does not exist then rsync will not create the > required path Dave Dykstra just recently responded to another user that this is the intended behavior of rsync. It will create one level of new directory in the destination, but no more. You could make the command you cited above work by specifying "etc" rather than "etc/": rsync -av --progress -e "ssh -1" /etc $1:/altsync/$HOSTNAME This will create the $HOSTNAME dir, if needed, but you can't use anything deeper than one directory in the source path. The other way to go is to use the --relative option: rsync -avP --relative -e "ssh -1" /any/path/at/all $1:/altsync/$HOSTNAME This will create the $HOSTNAME dir and all the /any/path/at/all dirs, as needed. > The [option-specifying form] of the -e option is not documented. > IMHO it should be. I agree. I've whipped up the following patch for rsync.yo, which I will commit to CVS in a moment: Index: rsync.yo --- rsync.yo 2002/05/09 21:44:46 1.99 +++ rsync.yo 2002/05/10 19:47:05 @@ -515,6 +515,13 @@ remote copies of rsync. Typically, rsync is configured to use rsh by default, but you may prefer to use ssh because of its high security. +Feel free to include options in the COMMAND. For instance: + +quote(-e "ssh -1 -l joe") + +(Note that ssh users can alternately store off site-specific connect +options in their .ssh/config file.) + You can also choose the remote shell program using the RSYNC_RSH environment variable. ..wayne..
Re: bug report
On Fri, 10 May 2002, jw schultz wrote: > Also the example is an odd one. It doesn't seem odd to me since the -l option is the one that I've used most in ssh (when I don't use the config file to avoid all options). The important part of the example is showing how it's quoted, so what's in it could certainly be tweaked. I like the addition of your "presented as a single argument" caveat to the text. I had added the extra "chattiness" because, even though it is possible to pass ssh options via -e, doing so is really less desirable than using the .ssh/config file. I thought it might be helpful to point people at the better solution so they can avoid having to use the -e option at all. If others don't like this text, it could be removed. As for the -1 option, it just forces the ssh1 protocol. I left it there since it was the option that started the discussion. ..wayne..
Re: bug report
OK, I just checked in a change that uses some of your suggested text to remove a bit of the chattiness. I also improved the RSYNC_RSH section to mention the legality of command-line options. See if you like it better. --- rsync.yo 2002/05/09 21:44:46 1.99 +++ rsync.yo 2002/05/11 08:31:55 1.101 @@ -515,8 +515,16 @@ remote copies of rsync. Typically, rsync is configured to use rsh by default, but you may prefer to use ssh because of its high security. +Command-line arguments are permitted in COMMAND provided that COMMAND is +presented to rsync as a single argument. For example: + +quote(-e "ssh -p 2234") + +(Note that ssh users can alternately customize site-specific connect +options in their .ssh/config file.) + You can also choose the remote shell program using the RSYNC_RSH -environment variable. +environment variable, which accepts the same range of values as -e. See also the --blocking-io option which is affected by this option. @@ -982,8 +990,8 @@ more details. dit(bf(RSYNC_RSH)) The RSYNC_RSH environment variable allows you to -override the default shell used as the transport for rsync. This can -be used instead of the -e option. +override the default shell used as the transport for rsync. Command line +options are permitted after the command name, just as in the -e option. dit(bf(RSYNC_PROXY)) The RSYNC_PROXY environment variable allows you to redirect your rsync client to use a web proxy when connecting to a ..wayne..
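Allowing options inside COMMAND implies that rsync splits the single -e string into separate words before running the remote shell. The following is a simplified sketch of that idea, assuming plain whitespace splitting with no quote handling. `split_rsh_command()` is a hypothetical name, and rsync's own splitting code may behave differently.

```c
#include <assert.h>
#include <string.h>

#define MAX_RSH_ARGS 32   /* arbitrary limit chosen for this sketch */

/* Split a remote-shell COMMAND such as "ssh -p 2234" into argv words,
 * breaking on spaces and tabs.  Modifies cmd in place and NULL-terminates
 * argv.  Returns the number of words stored. */
static int split_rsh_command(char *cmd, char *argv[], int max_args)
{
    int argc = 0;
    char *tok;

    assert(max_args > 0);
    tok = strtok(cmd, " \t");
    while (tok != NULL && argc < max_args - 1) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t");
    }
    argv[argc] = NULL;
    return argc;
}
```

This is also why the quoting matters: writing `-e ssh -p 2234` without quotes hands rsync three separate arguments, so only "ssh" would be taken as COMMAND and the rest would be parsed as other rsync arguments.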
Re: Problems with the rsync command line syntax for multiple files
On Mon, 13 May 2002, Peter Møller Neergaard wrote: > types:/<3>tmp/wwwreports-dont-edit > echo *.html > [...lots of files with colons in them...] Rsync treats a colon on the command line as a separator between a machine name and the filename, so you can't use *.html if it expands to one or more names that include a colon, UNLESS the colon follows something that is illegal in a hostname, such as a slash. So, try using "./*.html" instead. ..wayne..
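The colon rule can be sketched as a small classifier. This is a simplified illustration, not rsync's own parser: `classify_arg()` and `enum arg_kind` are hypothetical names, and only the first colon and the first slash in the argument are examined.

```c
#include <assert.h>
#include <string.h>

enum arg_kind { ARG_LOCAL, ARG_REMOTE_SHELL, ARG_DAEMON };

/* Decide how a path argument would be treated under the colon rule. */
static enum arg_kind classify_arg(const char *arg)
{
    const char *colon, *slash;

    assert(arg != NULL);
    colon = strchr(arg, ':');
    slash = strchr(arg, '/');
    if (colon == NULL)
        return ARG_LOCAL;            /* no colon at all */
    if (slash != NULL && slash < colon)
        return ARG_LOCAL;            /* slash first: colon is part of a filename */
    if (colon[1] == ':')
        return ARG_DAEMON;           /* "host::module/path" */
    return ARG_REMOTE_SHELL;         /* "host:path" via the remote shell */
}
```

So a file named "a:b.html" is taken to mean host "a", while "./a:b.html" stays a local name.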
Re: wildcards
On Mon, 13 May 2002, Dave Dykstra wrote: > I suggest you go ahead and code it in the way you > think would be simplest and then we can evaluate it more concretely. OK. Here's the simple patch. It optimizes the loop away if the pattern starts with "**" (since the loop would be superfluous), but otherwise it just loops over all the slashes in the name when the pattern is an unanchored path (i.e. contains at least one interior slash). I'll post another version where I implemented your suggested optimization in a moment. ..wayne.. ---8<--8<--8<--8<---cut here--->8-->8-->8-->8--- Index: exclude.c --- exclude.c 2002/04/11 02:25:53 1.44 +++ exclude.c 2002/05/13 19:43:43 @@ -66,6 +66,8 @@ } } ret->fnmatch_flags = 0; + if (strncmp(pattern, "**", 2) == 0) + ret->regular_exp = -1; } } @@ -110,6 +112,13 @@ if (ex->regular_exp) { if (fnmatch(pattern, name, ex->fnmatch_flags) == 0) { return 1; + } + if (!match_start && !ex->local && ex->regular_exp > 0) { + while ((name = strchr(name, '/')) != NULL) { + name++; + if (fnmatch(pattern, name, ex->fnmatch_flags) == 0) + return 1; + } } } else { int l1 = strlen(name); ---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
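The effect of the added loop can be tried in isolation with POSIX fnmatch(). This sketch uses a hypothetical `unanchored_match()` helper and assumes a pattern without "**", so FNM_PATHNAME is set and "*" does not match a slash. The pattern is tried against the whole name and then against the text following each slash, as the patch does.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Try an unanchored wildcard pattern at the start of name and then just
 * after every slash.  Returns 1 on the first match, 0 if none. */
static int unanchored_match(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);
    if (fnmatch(pattern, name, FNM_PATHNAME) == 0)
        return 1;
    while ((name = strchr(name, '/')) != NULL) {
        name++;                          /* step past the slash */
        if (fnmatch(pattern, name, FNM_PATHNAME) == 0)
            return 1;
    }
    return 0;
}
```

With this, "foo/*" matches "a/b/foo/bar" (at the second slash) but not "a/foo/bar/baz", because the trailing "*" is not allowed to swallow "bar/baz".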
Re: wildcards
Here's a more complex version of the wildcard change that attempts to count slashes in the pattern (if it does not contain "**" anywhere) and to match at the appropriate level. In trying to think up patterns where this might mess up, the only thing I thought of was something like this: foo/b[^/]r/baz My code would mess this up by counting 3 slashes. This patch is not based on the previous one, but on CVS. Note that neither this patch nor my previous one makes "**/foo" match the file matched by "/foo" (which one might expect it to do). We could add some extra code to make this happen, if desired. Optimization note: I noticed that both this patch and my previous one were only checking for "**" at the start of the pattern to trigger the loop-skipping optimization. I should really change that to check for any leading "*" because of the code's limitation of treating "*" like "**" when "**" is on the line somewhere. ..wayne.. ---8<--8<--8<--8<---cut here--->8-->8-->8-->8--- Index: exclude.c --- exclude.c 2002/04/11 02:25:53 1.44 +++ exclude.c 2002/05/13 20:30:33 @@ -35,6 +35,7 @@ static struct exclude_struct *make_exclude(const char *pattern, int include) { struct exclude_struct *ret; + char *cp; ret = (struct exclude_struct *)malloc(sizeof(*ret)); if (!ret) out_of_memory("make_exclude"); @@ -55,7 +56,7 @@ if (!ret->pattern) out_of_memory("make_exclude"); if (strpbrk(pattern, "*[?")) { - ret->regular_exp = 1; + ret->wild_exp = 1; ret->fnmatch_flags = FNM_PATHNAME; if (strstr(pattern, "**")) { static int tested; @@ -66,6 +67,8 @@ } } ret->fnmatch_flags = 0; + if (strncmp(pattern, "**", 2) == 0) + ret->wild_exp = -1; } } @@ -74,9 +77,8 @@ ret->directory = 1; } - if (!strchr(ret->pattern,'/')) { - ret->local = 1; - } + for (cp = ret->pattern; (cp = strchr(cp, '/')) != NULL; cp++) + ret->slash_cnt++; return ret; } @@ -95,7 +97,7 @@ int match_start=0; char *pattern = ex->pattern; - if (ex->local && (p=strrchr(name,'/'))) + if (!ex->slash_cnt && (p=strrchr(name,'/'))) name = 
p+1; if (!name[0]) return 0; @@ -107,9 +109,24 @@ pattern++; } - if (ex->regular_exp) { + if (ex->wild_exp) { + if (!match_start && ex->slash_cnt && ex->fnmatch_flags != 0) { + int cnt = ex->slash_cnt + 1; + for (p = name + strlen(name) - 1; p >= name; p--) { + if (*p == '/' && !--cnt) + break; + } + name = p+1; + } if (fnmatch(pattern, name, ex->fnmatch_flags) == 0) { return 1; + } + if (!ex->fnmatch_flags && !match_start && ex->wild_exp > 0) { + while ((name = strchr(name, '/')) != NULL) { + name++; + if (fnmatch(pattern, name, ex->fnmatch_flags) == 0) + return 1; + } } } else { int l1 = strlen(name); Index: rsync.h --- rsync.h 2002/04/11 02:18:51 1.131 +++ rsync.h 2002/05/13 20:30:34 @@ -392,11 +392,11 @@ struct exclude_struct { char *pattern; - int regular_exp; + int wild_exp; int fnmatch_flags; int include; int directory; - int local; + int slash_cnt; }; struct stats { ---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
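The slash-counting variant can likewise be sketched on its own, assuming a pattern with no "**" and no leading slash. `count_slashes()` and `match_tail()` are hypothetical names. The backward scan mirrors the patch's loop (keep the last slash_cnt+1 path components), but uses an index instead of moving a pointer before the start of the string, and needs only one fnmatch() call per name.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Count the '/' characters in a pattern (the patch's slash_cnt). */
static int count_slashes(const char *s)
{
    int n = 0;

    for (; (s = strchr(s, '/')) != NULL; s++)
        n++;
    return n;
}

/* Match pattern against only the last count_slashes(pattern)+1
 * components of name. */
static int match_tail(const char *pattern, const char *name)
{
    int cnt = count_slashes(pattern) + 1;
    size_t i = strlen(name);

    assert(pattern != NULL);
    while (i > 0) {
        if (name[i - 1] == '/' && --cnt == 0)
            break;                       /* name + i starts the tail */
        i--;
    }
    return fnmatch(pattern, name + i, FNM_PATHNAME) == 0;
}
```

Like the patch, this counts every slash in the pattern, so "foo/b[^/]r/baz" would be miscounted in the same way.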
Re: Status Query - Please respond - Re: Patch to avoid 'Connection reset by peer' error for rsync on cygwin
Here's an idea which I haven't had a chance to investigate: Would it be possible to use atexit() to register a call to shutdown() for cygwin (or a call to a custom function that would call shutdown() for the appropriate socket fds)? This should allow cygwin's broken socket code to get properly cleaned up without having to sprinkle a bunch of cygwin-specific code all over the source (as long as the socket fds don't get closed before we start the exit handling). ..wayne..
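A minimal sketch of the idea, assuming each socket is recorded when it is opened. `track_socket()` and `shutdown_tracked_sockets()` are hypothetical names, and none of this has been tried on cygwin. The handler only calls shutdown() and leaves close() to the normal exit path. Note that atexit() handlers run on exit() or a return from main(), but not on _exit() or death by a signal.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>

#define MAX_TRACKED 16   /* arbitrary limit chosen for this sketch */

static int tracked_fds[MAX_TRACKED];
static int tracked_count;

/* Exit handler: shut down every socket that was registered. */
static void shutdown_tracked_sockets(void)
{
    for (int i = 0; i < tracked_count; i++)
        shutdown(tracked_fds[i], SHUT_RDWR);
}

/* Remember a socket fd; the first call registers the exit handler. */
static void track_socket(int fd)
{
    static int registered;

    assert(fd >= 0);
    if (!registered) {
        atexit(shutdown_tracked_sockets);
        registered = 1;
    }
    if (tracked_count < MAX_TRACKED)
        tracked_fds[tracked_count++] = fd;
}
```

The single handler covers every registered socket, which keeps the platform-specific code to one registration call plus one cleanup function.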
Re: Problems getting rsync working...
On Thu, 16 May 2002, Brad wrote: > The command which is run on the client: > rsync -avt /var/spool/mail StorageServer::email Did you either start up an "rsync --daemon" manually on the server or set up [x]inetd to spawn "rsync --daemon" when someone connects to the rsync port? When you use the "::" syntax, there needs to be an rsync daemon to handle the connection. The alternative is to use just one ":" instead of "::" and let ssh handle the connection. ..wayne..
Re: Status Query - Please respond - Re: Patch to avoid 'Connection reset by peer' error for rsync on cygwin
On Thu, 16 May 2002, Max Bowsher wrote: > That just moves the shutdown call from where you finish with the fd to > where you start using the fd - that's got to be less intuitive. Being more or less intuitive is not the point. The idea was to have as little cygwin kludge code as possible. Thus, we'd just have one call to atexit() during startup, with the single cleanup function being able to handle any and all opened sockets, and we're done (if this is even feasible -- I haven't looked into it). This was prompted by Martin's statement that he considers this a cygwin bug -- I was assuming that he didn't want to make sweeping changes to all the cleanup code in rsync. Whether he wants to handle this in a more invasive manner is up to him. ..wayne..
Improving the rsync protocol (RE: Rsync dies)
On Fri, 17 May 2002, Allen, John L. wrote: > In my humble opinion, this problem with rsync growing a huge memory > footprint when large numbers of files are involved should be #1 on > the list of things to fix. I have certainly been interested in working on this issue. I think it might be time to implement a new algorithm, one that would let us correct a number of flaws that have shown up in the current approach. Toward this end, I've been thinking about adding a 2nd process on the sending side and hooking things up in a different manner: The current protocol has one sender process on the sending side, while the receiving side has both a generator process and a receiver process. There is only one bi-directional pipe/socket that lets data flow from the generator to the sender in one direction, and from the sender to the receiver in the other direction. The receiver also has a couple of pipes connecting itself to the generator in order to get data to the sender. I'd suggest changing things so that a (new) scanning process on the sending side would have a bi-directional link with the generator process on the receiving side. This would let both processes descend through the tree incrementally and simultaneously (working on a single directory at a time) and figure out which files differ. The list of files that need to be transferred PLUS a list of files to delete (if any) would be piped from the scanner process to the sender process, which would have a bi-directional link to the receiver process (perhaps using ssh's multi-channel support?). There would be no link between the receiver and the generator. The advantage of this is that the sender and the receiver would be very simple: the sending process would receive a list of file actions on stdin indicating which files to update and which files to delete. (It might even be possible to let other programs control the sender.) 
These programs would not need to know about exclusion lists, delete options, or any of the more esoteric options, but would get told things like the timeout settings via the stdin pipe. In this scenario, all error messages would get sent to the sender process, which would output them on stdout (flushed). The scanner/generator process would be the thing that parses the command line, communicates the exclude list to its opposite process, and figures out exactly what to do. The scanner would spawn the sender, and field all the error messages that it generates. It would then either output the errors locally or send them over to the generator for output (depending on whether we're pushing or pulling files). As for who spawns the receiver, it would be nice if this was done by the sender (so they could work alone), but an alternative would be to have the generator spawn the receiver and then let the receiver hook up with the sender via the existing ssh connection. This idea is still in its early stages, so feel free to tell me exactly where I've missed the boat. ..wayne..
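To make the "list of file actions on stdin" concrete, here is a sketch of how a sender might parse one such line. The proposal does not define a format, so the "update <path>" / "delete <path>" syntax, `parse_action()`, and `enum file_action` are all hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical action verbs; the proposal does not define a wire format. */
enum file_action { ACT_UNKNOWN, ACT_UPDATE, ACT_DELETE };

struct action_line {
    enum file_action action;
    char *path;                 /* points into the parsed line */
};

/* Parse one line of the form "<verb> <path>", e.g. "update etc/motd".
 * Modifies line in place.  A line with no path yields ACT_UNKNOWN/NULL. */
static struct action_line parse_action(char *line)
{
    struct action_line out = { ACT_UNKNOWN, NULL };
    char *space;

    assert(line != NULL);
    line[strcspn(line, "\n")] = '\0';   /* drop the newline fgets() keeps */
    space = strchr(line, ' ');
    if (space == NULL || space[1] == '\0')
        return out;                      /* malformed: no path */
    *space = '\0';
    out.path = space + 1;
    if (strcmp(line, "update") == 0)
        out.action = ACT_UPDATE;
    else if (strcmp(line, "delete") == 0)
        out.action = ACT_DELETE;
    return out;
}
```

Splitting only at the first space keeps paths that contain spaces intact; filenames containing newlines would still need some escaping scheme.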
Re: Improving the rsync protocol (RE: Rsync dies)
On Fri, 17 May 2002, Wayne Davison wrote: > so feel free to tell me exactly where I've missed the boat. [Replying to myself... hmmm...] In my description of the _new_ protocol, my references to a generator process are not really accurate. The current generator process is forked off after the initial file-list session figures out what files need to be checked for differences, and it then churns out rolling checksums for the sender process. The "generator" in my previous description is really just a receiver-side scanner process (that looks for files that need to be checksummed). So, either the new receiver process would handle the checksum generation itself, or we'd need a 3rd process on the receiver side to generate the checksum data (and it would need a pipeline into the sender). As a first step in investigating this further, I'm looking into librsync to see if it might be easy to create a simple sender/receiver duo using this library. If anyone knows where some decent documentation on librsync is, please let me know (I'm looking for it now, but the tar file doesn't appear to come with any decent docs). I was wondering if librsync manages to implement the protocol without forking off a separate generator process... ..wayne..
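For reference, the block checksums the generator produces include a weak "rolling" checksum that the sender can slide along its own file one byte at a time. The sketch below follows the weak checksum described in the rsync technical report (two 16-bit sums, a and b). It is not a copy of rsync's checksum.c, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct weak_sum { uint32_t a, b; };

/* a = sum of bytes, b = sum of (len - i) * byte, both mod 2^16. */
static struct weak_sum weak_sum_init(const unsigned char *buf, size_t len)
{
    struct weak_sum s = { 0, 0 };

    for (size_t i = 0; i < len; i++) {
        s.a += buf[i];
        s.b += (uint32_t)(len - i) * buf[i];
    }
    s.a &= 0xffff;
    s.b &= 0xffff;
    return s;
}

/* Slide a window of length len by one byte: drop `out`, append `in`.
 * Unsigned wraparound is harmless because only the low 16 bits are kept. */
static struct weak_sum weak_sum_roll(struct weak_sum s, unsigned char out,
                                     unsigned char in, size_t len)
{
    assert(len > 0);
    s.a = (s.a - out + in) & 0xffff;
    s.b = (s.b - (uint32_t)len * out + s.a) & 0xffff;
    return s;
}

/* Combine the two halves into the 32-bit weak checksum. */
static uint32_t weak_sum_value(struct weak_sum s)
{
    return s.a | (s.b << 16);
}
```

Rolling costs a constant amount of work per byte, which is what makes it practical for the sender to test a block match at every offset; the strong checksum is only consulted when the weak one matches.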
Re: Rsync hanging
On Fri, 24 May 2002, Mike Rogers wrote: > RSYNC maybe once a day or so will just hang and sit there... You don't mention what version of rsync you're using. Version 2.4.6 would often hang when the -v option was used, so if you're using that, you'd do well to upgrade. > strace attached to the process produces the following... The system calls you cite are normal for that process: they come from the loop in the wait_process() function, which is a normal end-of-run occurrence. One of the other 2 processes is probably hung up on a read or write, and you'd have to look at those processes with strace to see what is going wrong. ..wayne..
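To see why such a process looks busy but healthy under strace: a parent that is only waiting for a child at the end of a run typically polls it in a loop, so strace shows the same short sequence of calls repeating. The following is a hedged sketch with a hypothetical wait_for_child() name; it is similar in spirit to, but not copied from, rsync's wait_process().

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Poll child PID every 20 ms until it exits.  Returns its exit code,
 * or -1 if waitpid() fails or the child was killed by a signal. */
int wait_for_child(pid_t pid)
{
    const struct timespec pause = { 0, 20L * 1000 * 1000 };
    int status;
    pid_t r;

    while ((r = waitpid(pid, &status, WNOHANG)) == 0)
        nanosleep(&pause, NULL);   /* this pair repeats in strace output */
    if (r < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```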
Testing a transfer-only rsync tool
I found some time in the past week to work on a simple test app that would hopefully help to answer a few questions that came up recently:

1. Can a single-process generator+receiver work well? (Looks good so far, but I haven't run any multi-processor timing tests yet.)

2. How easy is it to use librsync? (Pretty easy.)

3. How small would a transfer-only tool be? (It's currently around 1400 lines of C code, not counting the librsync code. It was around 900 lines when I first considered releasing a simple working version, but it keeps growing as I flesh out the more advanced features.)

4. Should rsync be separated into a scanning tool and a transfer tool? Or should it contain both bits but also allow the user to override the scanner to fully control what gets transferred? Or should we just try to optimize the current protocol? (No answers yet, but I'm leaning toward the 2nd option above.)

My test tool takes in commands on stdin and outputs messages on stdout. It forks a second process as specified via the command line (which can be any command that runs another rsync_xfer, either locally or remotely). It then allows you to send AND/OR receive any files you specify (as well as delete files, create directories, etc.). Keep in mind that this tool does not attempt to do any of the "scan both systems, looking for files that differ" task.

The code is still fairly young and while some of it is pretty good, other bits show signs of being written in haste. I've tested it on a small number of scenarios so far, but nothing exhaustive.
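The startup described above boils down to forking a command (for example "ssh remote.com rsync_xfer -s") and talking to it over its stdin and stdout. A minimal POSIX sketch of that step follows; spawn_cmd() is a hypothetical name, not a function from rsync_xfer.c, and a real tool would also need to deal with the child's stderr and with SIGPIPE.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: run CMD through /bin/sh with its stdin and stdout
 * connected to pipes.  On success returns the child's pid and stores the
 * parent's ends of the pipes in *TO_CHILD and *FROM_CHILD. */
pid_t spawn_cmd(const char *cmd, int *to_child, int *from_child)
{
    int in[2], out[2];   /* in: parent -> child, out: child -> parent */
    pid_t pid;

    if (pipe(in) < 0)
        return -1;
    if (pipe(out) < 0) {
        close(in[0]);
        close(in[1]);
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        return -1;
    }
    if (pid == 0) {                        /* child */
        dup2(in[0], STDIN_FILENO);
        dup2(out[1], STDOUT_FILENO);
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                        /* exec failed */
    }
    close(in[0]);                          /* parent keeps the other ends */
    close(out[1]);
    *to_child = in[1];
    *from_child = out[0];
    return pid;
}
```

Closing the write end (to_child) is what lets the child see EOF and finish.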
Commands accepted by the tool on stdin (* means not yet tested):

  cd REMOTE_DIR [LOCAL_DIR]                     chdir both sides at once
  tmpdir REMOTE_PATH [LOCAL_PATH]               where temp-files go
  get REMOTE_FILE [LOCAL_FILE [BASIS_FILE]]     rsync to the local system
  put LOCAL_FILE [REMOTE_FILE [BASIS_FILE]]     rsync to the remote system
  mvget REMOTE_FILE [LOCAL_FILE [BASIS_FILE]]   get, then delete REMOTE_FILE
  mvput LOCAL_FILE [REMOTE_FILE [BASIS_FILE]]   put, then delete LOCAL_FILE
  del FILE                                      delete a remote file
  ldel FILE                                     delete a local file
  md DIR                                        create a remote directory
  lmd DIR                                       create a local directory
  ln OLDNAME NEWNAME                            create a remote hard link
  lln OLDNAME NEWNAME                           create a local hard link
  sln OLDNAME NEWNAME                           create a remote symlink
  lsln OLDNAME NEWNAME                          create a local symlink
  mkdev NAME NUMBER                             create a remote device*
  lmkdev NAME NUMBER                            create a local device*
  quit                                          quit

Spaces in filenames need to be backslash-quoted, as do backslash characters. E.g.:

  get This\ File.txt That\ File.txt

You run the program like this:

  rsync_xfer -vv ssh remote.com rsync_xfer -s

This starts up a local rsync_xfer process in double-verbose mode, and tells it to run the "ssh remote.com rsync_xfer -s" command. You can make this latter command anything you like, as long as it starts up an rsync_xfer with the slave (-s) option.

If you're feeling brave and you'd like to try it out, feel free, but treat it like the pre-alpha code that it is. Also keep in mind that every time you tell the program to switch from get to put, all the current outstanding get/put jobs must run to completion before any new jobs start (which can slow down the transfer by reducing the pipe-lining of data).

Some of the things yet to do:

- Need a way to override the per-file attributes on files we send (it currently preserves the attributes on each source file).
- Need a way to specify/set attributes for non-transferred files, new directories, and devices.
- It needs a way to output transfer statistics.
- There needs to be a timeout option.
- It needs some of the error-checking to be polished up.
- Some fatal errors might be better as warnings (if done right).
- There's no retry check if the file changes during the send.
- We need to catch SIGPIPE.
- There needs to be better handling of partially-transferred files.
- The code needs to be broken up into multiple files.
- There's no configuration support (it currently compiles on a modern Linux system).
- We might want an option that tells us to connect via socket to a particular hostname (instead of running a command).
- The code could use some more verbose-output messages.
- It needs more comments.

The code is here:

  http://www.clari.net/~wayne/rsync_xfer.c

You need to have librsync installed or available. You compile the code as you would expect:

  gcc -g -Wall -o rsync_xfer rsync_xfer.c -lrsync

..wayne..
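The quoting rule for the stdin commands (a backslash before a space or before another backslash) is simple enough that a parser fits in a few lines. This C sketch splits a command line in place; split_words() is hypothetical and is not the parser in rsync_xfer.c.

```c
#include <stddef.h>

/* Split LINE in place into at most MAXW words separated by unescaped
 * spaces.  "\ " becomes a space and "\\" becomes a backslash.  Each
 * words[i] points into LINE.  Returns the number of words found. */
int split_words(char *line, char **words, int maxw)
{
    char *src = line, *dst = line;   /* writes never overtake unread input */
    int n = 0;

    while (*src && n < maxw) {
        while (*src == ' ')
            src++;                   /* skip separators */
        if (!*src)
            break;
        words[n++] = dst;
        while (*src && *src != ' ') {
            if (*src == '\\' && src[1])
                src++;               /* keep the escaped character */
            *dst++ = *src++;
        }
        if (*src)
            src++;                   /* step over the separating space */
        *dst++ = '\0';
    }
    return n;
}
```

For example, "get This\ File.txt That\ File.txt" yields the three words "get", "This File.txt" and "That File.txt".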
Re: Rsync'ing lists of files
On Fri, 7 Jun 2002, Stephane Paltani wrote: > I have 5 million files on one side of the ocean, 10 of which must > be copied to the other side. This is the sort of problem that would benefit from the rsync_xfer.c program I'm working on (I mentioned an early version on the list a week or so ago). It allows total control of what gets sent by an external program, so there's no directory scan and no include/exclude processing. I could imagine writing a simple perl script that would take a list of files and turn it into a series of "cput" commands followed by any needed "del" commands to remove the names that vanished from the list after the last run. Unfortunately, the code is still at a very early stage, so it's not yet ready for use in a production environment. I've been working on a new version of the program that is able to transfer trees of files and will also have an improved socket protocol. It works through the tree incrementally, and thus it shouldn't use as much memory as the current rsync implementation. After I get the code in a little better shape, I'm planning to compare its performance with the current implementation and try to figure out if rsync might best benefit from adding support for a new (internal) protocol, or if it just needs some tweaks to the current one. ..wayne..
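The list-driven approach imagined here (a "cput" for every listed file, then "del" for names that dropped off the list since the last run) is sketched below in C rather than Perl, to keep to one language. emit_commands() is hypothetical; it assumes both lists are sorted in strcmp() order and that the backslash-quoting rule from the rsync_xfer announcement still applies to the command stream.

```c
#include <stdio.h>
#include <string.h>

/* Write NAME with spaces and backslashes backslash-quoted. */
static void put_quoted(FILE *out, const char *name)
{
    for (; *name; name++) {
        if (*name == ' ' || *name == '\\')
            fputc('\\', out);
        fputc(*name, out);
    }
}

static void put_cmd(FILE *out, const char *cmd, const char *name)
{
    fputs(cmd, out);
    fputc(' ', out);
    put_quoted(out, name);
    fputc('\n', out);
}

/* Emit "cput NAME" for every current name, then "del NAME" for every
 * previous name missing from CUR.  Both arrays must be sorted. */
void emit_commands(FILE *out, const char *const *prev, size_t nprev,
                   const char *const *cur, size_t ncur)
{
    size_t i, j = 0;

    for (i = 0; i < ncur; i++)
        put_cmd(out, "cput", cur[i]);
    for (i = 0; i < nprev; ) {
        int c = j < ncur ? strcmp(prev[i], cur[j]) : -1;
        if (c < 0)
            put_cmd(out, "del", prev[i++]);   /* vanished from the list */
        else if (c == 0)
            i++, j++;                          /* still present */
        else
            j++;                               /* new name; already sent */
    }
}
```

The output would be piped into the tool's stdin; the merge step is linear, so it stays cheap even for very long lists.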
Re: problem and a question
On Tue, 11 Jun 2002, Simison, Matthew wrote: > > c:Connection refused > rsh: can't establish connection Did you use to have an RSYNC_RSH variable in your environment? Perhaps one that was set to use ssh? You could run "echo $RSYNC_RSH" on one of your Unix boxes to see what they're set to use. ..wayne..
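The reason an old RSYNC_RSH setting matters is lookup order: the environment is consulted before any built-in default (and rsync's -e option overrides both). The sketch below shows only the environment step; remote_shell() is hypothetical, and passing "rsh" as the fallback is an assumption based on the "rsh: can't establish connection" message in the report.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Return the remote-shell command to use: $RSYNC_RSH if it is set and
 * non-empty, otherwise FALLBACK.  (Hypothetical; a real client would
 * also let a command-line option such as -e override both.) */
const char *remote_shell(const char *fallback)
{
    const char *s = getenv("RSYNC_RSH");
    return (s != NULL && *s != '\0') ? s : fallback;
}
```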
2nd release of my new-protocol testing app
I've been having a lot of fun improving my new-protocol testing app. It seems to be in pretty good shape (for test code), so I figured I'd announce another release for those brave souls that may want to help me in my thinking about a (potential) new rsync protocol. It's a tar.gz file this time because I broke up the code into multiple files. I named it "rzync" just for fun (a very confusing name, no?):

  http://www.clari.net/~wayne/rzync.tar.gz

The new stuff in this release is that it can get/put an entire directory tree of files via getd/putd, and it has conditional get/put commands that handle both files and directories (cget/cput). (For those that missed the first announcement, the program can be totally controlled by an external application via a simple set of commands on stdin.)

I've included a perl script named "rs" that will take an rsync-like command line (as long as the destination is a directory and not a file) and drive rzync with it. Keep in mind that rzync still has the -a option hard-wired to on, so "rs -v /path/foo remote:/path" works like "rsync -av /path/foo remote:/path".

Things I've noticed so far:

- My single-proc generator/receiver seems to perform well when I send data over my DSL connection, but it goes much slower than rsync when sending data over a local pipe. I'm guessing that this is because a multi-process setup can keep the generator pipeline filled to a greater degree. If this is true, one solution would be to add a thread that would be responsible for handling all the generator tasks (and perhaps using the GNU portable thread library if we want to be compatible with systems that don't support process threads).

- The deltas produced by librsync are sometimes considerably larger than those produced by rsync, so the speedup of rzync sometimes suffers compared to rsync. I believe that this is because (even without -z) rsync does some compression of the delta data that librsync does not do.
- The incremental directory scanning seems to work quite well. I have not fleshed out all the areas that would need to grow dynamically for _really_ large jobs, so if someone wants to try to send some huge directory trees, we'll have to flesh out some more of the code first.

- My directory-scanning code does not attempt to handle symlinks, devices, or named sockets yet (it just skips them).

- Since the directory-scan data is shared between the two sides using the rsync algorithm, it has the potential to save a lot of transfer bytes when the directories on each side are similar.

Feel free to let me know what you think.

..wayne..
Re: rsync: error writing NNNN unbuffered bytes - exiting: Connection reset by peer
On 13 Jun 2002, Bill Geddes wrote: > Suggestions on how to proceed would be greatly appreciated. It is possible that one side of the connection is seg-faulting and dying. If you ensure that core files are not disabled (check your ulimit setting), you may find that there is a core file that you could use to figure out where the program is dying. Alternatively, you could attach to one or both processes with a debugger after they start running (e.g. "gdb /usr/bin/rsync 12345" where "12345" is the already-running process ID), tell it to 'c'ontinue to run, and you'll see any abnormal signals that may pop up. ..wayne..
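If you would rather have a program enable core files for itself than rely on the shell's ulimit setting, the same adjustment can be made with setrlimit(). enable_core_files() is a hypothetical helper: it only raises the soft limit as far as the hard limit allows, and other system settings can still control where (or whether) cores are written.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>

/* Raise this process's soft core-file size limit to its hard limit,
 * the programmatic equivalent of raising "ulimit -c" to the maximum.
 * Returns 0 on success, -1 on failure. */
int enable_core_files(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```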
Re: 2nd release of my new-protocol testing app
On Thu, 13 Jun 2002, Wayne Davison wrote: > http://www.clari.net/~wayne/rzync.tar.gz I forgot to mention that I changed the order of the local/remote args to the 2-arg version of the "cd" command to be "cd LOCAL REMOTE" (the command "cd DIR" still changes both the local and remote sides). This only affects someone who had written a script or an input file to drive my earlier rsync_xfer release. I hope that didn't trip anyone up. ..wayne..
Re: Rsync 2.4.6 and Hammerd CPU's
On Mon, 17 Jun 2002, Sandy Ganz wrote: > Any ideas on how to keep rsync from using all the cpu on the webservers? I Have you tried running rsync under "nice"? Start it up on the webserver side with the same command as before, just put "nice " at the start and see if that relieves the pressure on your CPU. ..wayne..
Release 3 of "rzync" new-protocol test
For anyone who'd like to check out my "rzync" [sic] test app, I've just released a new version. For those that might not have time to look at the code but could provide some feedback based on a rough description, I've created the following simple web page:

  http://www.clari.net/~wayne/new-protocol.html

Here's the tar file of the new release:

  http://www.clari.net/~wayne/rzync-0.03.tar.gz

Changes in this version: I've optimized the protocol to make the transferred-byte overhead smaller; I've used an rsync-like file-list compression to make the directory data smaller; I've gotten rid of some previous limitations (such as the 4-byte file-size limit and the lack of reallocating various buffers for really large file-count transfers); I've re-enabled the "move" versions of the various get/put commands (which were disabled in the last release); and I've fixed several bugs.

The resulting program seems to be working quite well in my limited testing. The count of transferred bytes in the latest protocol is now below what rsync sends for many commands -- both a start-from-scratch update and a fully-up-to-date update are usually smaller, for instance. This is mainly because my file-list data is smaller, but it's also because I reduced the protocol overhead quite a bit. Transferred bytes for partially-changed files are still higher than rsync's because librsync creates unusually large deltas (though there's a patch that makes it work much better, it's still not as good as rsync).

In my speed testing, one test was sending around 8.5 meg of data on a local system, and while rsync took only .5 seconds, my rzync app took around 2 seconds. A quick gprof run reveals that 98% of the runtime is being spent in 2 librsync routines, so it looks like librsync needs to be optimized a bit.

Potential next steps might include optimizing rsync to make the transferred file-list size a little smaller (e.g. making the transfer of the "size" attribute only as long as needed to store the number would save ~4-5 bytes per file entry on typical files). It looks like work needs to be done on making librsync more efficient. Until I can get some better speed tests, I'm unsure if I should attempt to make rsync talk my new protocol. Opinions welcomed.

..wayne..
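The suggested file-list saving (sending the "size" attribute in only as many bytes as the number needs) can be illustrated with a count byte followed by the significant bytes in little-endian order. This is a hypothetical format, not rsync's or rZync's actual encoding; a real protocol would more likely squeeze the count into existing flag bits than spend a whole byte on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Write V as a count byte followed by that many little-endian bytes,
 * dropping high-order zero bytes.  OUT needs room for 9 bytes.
 * Returns the number of bytes written. */
size_t size_encode(uint64_t v, unsigned char *out)
{
    size_t n = 0;

    while (v != 0) {
        out[1 + n++] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
    out[0] = (unsigned char)n;
    return n + 1;
}

/* Inverse of size_encode(); stores the bytes consumed in *USED. */
uint64_t size_decode(const unsigned char *in, size_t *used)
{
    size_t n = in[0];
    uint64_t v = 0;

    for (size_t i = 0; i < n && i < 8; i++)
        v |= (uint64_t)in[1 + i] << (8 * i);
    *used = n + 1;
    return v;
}
```

With this layout any file under 16 MB costs 4 bytes instead of a fixed 8, which is in line with the estimated per-entry saving.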
rZync 0.04 -- a faster next-generation protocol test app
FYI, I decided to release a new version of my next-generation protocol test app because I created an optimized transfer mode for files that are being sent whole (it bypasses all calls to librsync). This makes my "rZync" test app faster than rsync for sending whole files (rather than 4x slower, like it was). This is significant because it helps to assure me that my single-process generator/receiver will be able to keep up with rsync's dual-process implementation. A full-file transfer appears to be faster than rsync, even on a dual-processor system. For instance, this test was 775 files in 126 directories:

-- rsync --
wrote 32920749 bytes  read 12420 bytes  9409476.86 bytes/sec
total size is 32869747  speedup is 1.00
rsync -av foo /tmp  2.23s user 1.54s system 162% cpu 2.314 total

wrote 32920749 bytes  read 12420 bytes  7318482.00 bytes/sec
total size is 32869747  speedup is 1.00
rsync -av foo /tmp  2.23s user 1.55s system 105% cpu 3.588 total

-- rZync --
wrote 32900189 bytes (16813)  read 5534 bytes (5534)  13162289.20 bytes/sec
total size is 32869700  speedup is 1.00
rs -av foo /tmp  0.34s user 0.56s system 39% cpu 2.274 total

wrote 32900064 bytes (16688)  read 5534 bytes (5534)  13162239.20 bytes/sec
total size is 32869700  speedup is 1.00
rs -av foo /tmp  0.42s user 0.69s system 58% cpu 1.910 total
---

I've also updated my new-protocol web page to explain what I'm trying to accomplish (which some folks probably missed the first time around):

  http://www.clari.net/~wayne/new-protocol.html

Here's the tar file of the new release:

  http://www.clari.net/~wayne/rzync-0.04.tar.gz

For those that want to try this out, use the "rs" perl script to control rZync in an rsync-like manner (a temporary, test-mode situation), or control it yourself by sending it commands on stdin.

..wayne..
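Both tools report a "speedup" of 1.00 in this whole-file test. As I understand it, rsync computes that figure as the total size of the files divided by the bytes written plus the bytes read, so a copy with no basis files to work from can only come out near 1.0. A tiny C check against the first rsync run above (the function names are hypothetical):

```c
#include <stdio.h>

/* Speedup as shown in the end-of-run summary: total size of the files
 * divided by the bytes actually written and read. */
double speedup(double total_size, double written, double read)
{
    return total_size / (written + read);
}

/* Format the ratio the way the summary line shows it (two decimals). */
void format_speedup(char *buf, size_t len, double s)
{
    snprintf(buf, len, "%.2f", s);
}
```

For that run the ratio is about 0.998, which prints as 1.00.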
Re: RegExpr in ---exclude
On Mon, 24 Jun 2002, J.Strohschnitter wrote: > is it possible to use regular expressions in the exclude-paramter of rsync ? > For example: > > rsync --exclude "/path/to/*/[Ff][Oo][Ll][Dd][Ee][Rr]" That's still a valid match pattern (and a poor regular expression -- "/*" would match zero or more slashes as a regex, so that would have to turn into "/.*"). What you're trying to specify is probably failing due to one or more of these problem areas:

- You have to use "**" to match any depth of subdirs in between the path and the name parts. I.e. "/path/to/**/[Ff][Oo][Ll][Dd][Ee][Rr]".

- An exclude that starts with a "/" is anchored at the root of the transfer, not the root of the file system. In other words, if you're sending "/path/*", you'd have to leave off the "/path" in the exclusion.

- You might want to leave off the path altogether. Using just the name "[Ff][Oo][Ll][Dd][Ee][Rr]" would exclude that name at any point in the tree. This is like specifying "**/[Ff][Oo][Ll][Dd][Ee][Rr]", but it also matches in the root dir of the transfer, and is more efficient.

..wayne..
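The difference between "*" and "**" can be approximated with POSIX fnmatch(): with FNM_PATHNAME set, a single "*" will not cross a "/", while a pattern containing "**" is matched without that flag so its wildcards can span directories. This is a simplified sketch, not rsync's exclude code; it ignores anchoring, slash-less patterns (which rsync matches against the final name component), and the trailing-slash directory rule.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <string.h>

/* Simplified exclude test: "*" stops at "/" unless the pattern
 * contains "**", in which case wildcards may match across "/".
 * Returns nonzero if PATH matches PATTERN. */
int exclude_matches(const char *pattern, const char *path)
{
    int flags = strstr(pattern, "**") != NULL ? 0 : FNM_PATHNAME;
    return fnmatch(pattern, path, flags) == 0;
}
```

With the pattern from the question, "/path/to/a/Folder" matches but "/path/to/a/b/Folder" does not; the "**" form matches both.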
RE: RegExpr in ---exclude
On Mon, 24 Jun 2002, Bernard A Badger wrote: > Just a comment on shell glob usage [...] > Shell globbing is done before the program is invoked, so > the shell globs on "--exclude=/path/to/*/[Ff][Oo][Ll][Dd][Ee][Rr]", but > unless you have a directory "--exclude=", it won't find anything. Quite so. Plus, what happens next is shell (and shell-option) dependent. Some shells always expand their args, so expanding a non-matching arg causes the entire string to vanish (a very useful thing in a script's "for" loop, but not on a command-line). Other shells complain about there being "no match" and refuse to run the command (I have my interactive shell set to do that because it helps guard against mistyped args). Just FYI. ..wayne..
Latest rZync release: 0.06
For the small number of people who are checking this out, I released version 0.05 a couple days ago (and only mentioned it on my new-protocol web page) followed today by 0.06. Some highlights of the two releases:

- We handle symlinks now in our recursive synchronization mode.
- Directory scanning is no longer limited to one active directory at a time (which was sorely needed when all the directories were up-to-date).
- Improved the "rs" control script, including the addition of the ability to specify a different destination name (previously only existing destination directories could be specified).
- Added a README with the latest command syntax for controlling rzync.
- Some much-needed cleanup of internal structures.
- Fixed several bugs.

Web resources:

  http://www.clari.net/~wayne/rZync-0.06.tar.gz
  http://www.clari.net/~wayne/new-protocol.html

There are still unsquashed bugs lurking, so be careful. For instance, I tried to copy my .mozilla dir, and the huge Cache hierarchy is currently giving it grief. I'll debug this problem next.

..wayne..
Re: Latest rZync release: 0.06
On Wed, 26 Jun 2002, Wayne Davison wrote: > There are still unsquashed bugs lurking, so be careful. For instance, I > tried to copy my .mozilla dir, and the huge Cache hierarchy is currently > giving it grief. I'll debug this problem next.

Turned out to be a silly oversight on a realloc of some directory data. Applying the following patch fixes things right up.

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: flist.c
--- flist.c 26 Jun 2002 08:45:21 - 1.15
+++ flist.c 26 Jun 2002 17:41:45 -
@@ -52,10 +52,14 @@
 	}
 	len = strlen(fn);
 	if (bp-flist_data + len + 1 > flist_data_size) {
-		int blen = bp - flist_data;
+		uchar *old_data = flist_data;
 		flist_data_size *= 2;
 		flist_data = do_realloc(flist_data, flist_data_size);
-		bp = flist_data + blen;
+		if (flist_data != old_data) {
+			for (j = 0; j < cnt; j++)
+				flist_ptrs[j] += flist_data - old_data;
+			bp += flist_data - old_data;
+		}
 	}
 	memcpy(bp, fn, len + 1);
 	flist_ptrs[cnt++] = bp;
@@ -95,8 +99,10 @@
 			continue; // XXX ignore devices for now!
 		}
 		if (bp - compressed_data + PATH_MAX*2 > compressed_data_size) {
+			int blen = bp - compressed_data;
 			compressed_data_size += 4*1024;
 			compressed_data = do_realloc(compressed_data, compressed_data_size);
+			bp = compressed_data + blen;
 		}
 		len = strlen(fn);
 		populate_attrs(&attrs, &sb);
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
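The bug fixed by this patch is a classic one: pointers saved into a buffer go stale once realloc() moves the buffer. The patch rebases each saved pointer by the distance the buffer moved. Another way to avoid the problem, sketched below with hypothetical names, is to remember offsets instead of pointers, so a move invalidates nothing. It also sidesteps arithmetic on the old, freed pointer, which ISO C leaves undefined even though it works on common platforms.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name pool: all names live in one growable buffer and are
 * remembered by offset, so realloc() moving the buffer breaks nothing. */
struct name_pool {
    char *data;          /* NUL-terminated names packed end to end */
    size_t used, size;   /* bytes in use / bytes allocated */
    size_t *offs;        /* offset of each stored name within data */
    size_t count, max;   /* names stored / slots allocated in offs */
};

/* Append NAME; returns its index, or -1 if memory runs out. */
long pool_add(struct name_pool *p, const char *name)
{
    size_t len = strlen(name) + 1;

    if (p->used + len > p->size) {
        size_t nsize = p->size ? p->size : 64;
        char *nd;
        while (p->used + len > nsize)
            nsize *= 2;
        if ((nd = realloc(p->data, nsize)) == NULL)
            return -1;
        p->data = nd;            /* stored offsets stay valid after a move */
        p->size = nsize;
    }
    if (p->count == p->max) {
        size_t nmax = p->max ? p->max * 2 : 16;
        size_t *no = realloc(p->offs, nmax * sizeof *no);
        if (no == NULL)
            return -1;
        p->offs = no;
        p->max = nmax;
    }
    memcpy(p->data + p->used, name, len);
    p->offs[p->count] = p->used;
    p->used += len;
    return (long)p->count++;
}

/* Turn an index back into a pointer; valid until the next pool_add(). */
const char *pool_get(const struct name_pool *p, size_t i)
{
    return p->data + p->offs[i];
}
```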
Re: strip setuid/setgid bits on backup (was Re: small security-related rsync extension)
On Mon, 8 Jul 2002, Eric Horst wrote: > Not to mention, is it a real long-term goal is to redesign rsync to deal > with large numbers of files by not building the entire file list up front? That is something that I'm working on with my rZync application. It implements a new protocol that can begin transferring files as soon as the first directory has been transferred and compared. The program is not yet ready for someone with millions of files to test, though -- I need to change the implementation of the name-cache to handle really large numbers of files. I have a new design that I'll be coding up in the next few days. Once that's done, I hope to get more people to try the code out and let me know how it performs. > If rsync is ever rewritten work directory by directory (or whatever) > building small file lists instead of building the mega filelist then when > do you run the post-process script? After each small batch of files? Or > store up the disposition list till the end effectively building a huge > filelist again? My initial reaction is that it would be best to start a pipe to the application at the start of the transfer and incrementally put data into it as you go along. ..wayne..
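A minimal C sketch of that suggestion: start the post-process command once through popen(), then write one line per file as its fate becomes known, instead of collecting a giant disposition list. The one-line "disposition name" format and the function names are hypothetical, and names containing newlines would need some escaping that this sketch does not do.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Start the post-processing command once, at the start of the transfer. */
FILE *postproc_start(const char *cmd)
{
    return popen(cmd, "w");
}

/* Report one finished file as soon as its fate is known. */
int postproc_report(FILE *pp, const char *what, const char *name)
{
    return fprintf(pp, "%s %s\n", what, name) < 0 ? -1 : 0;
}

/* At the end of the run: flush, close, and collect the command's status. */
int postproc_finish(FILE *pp)
{
    return pclose(pp);
}
```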
Re: Patch to update the included popt to 1.6.4
I'm wondering if we shouldn't just remove popt from the rsync source and just rely on the user to install the popt package on their system prior to compiling rsync. Configure already uses the installed popt in preference to the included popt, so it wouldn't be hard to change this to not have a popt fallback. ..wayne..
rZync 0.08 released
I've released the next version of my rZync test app. You can find a link to it here:

  http://www.clari.net/~wayne/new-protocol.html

You should also snag the referenced librsync source, as some important bugfixes in librsync are needed to compile rzync.

For those that don't know, rZync is my new-protocol test app that I'm using to try out some ideas on how to improve the rsync protocol. It transfers directory information incrementally, so it should have a much lower memory overhead than rsync.

The most important change in this release is that I've replaced the name-cache code with something that will be more robust and should work great with really large file transfers. I also changed the command-line syntax and have it now parse several new options, such as -r, -p, -t and such (i.e. the previous behavior of -a being hard-wired to "on" is no longer present), plus a few other things. Another important bug fix closes a neglected file handle so we don't overflow the open file limit.

** Be sure to use the new "rs" controlling script and not the old one. **

I've tried the code out on a fairly large data set (~4000 files in ~500 directories), but nothing close to what some of you million-file folks have. I would not yet recommend trying rZync in a production environment, but if you can run some large file-count tests, please let me know how things go.

..wayne..
Re: superlifter design notes and rZync feedback
Martin Pool <[EMAIL PROTECTED]> wrote: > I've put a cleaned-up version of my design notes up here > http://samba.org/~mbp/superlifter/design-notes.html

I'll start with some feedback on your rzync comments:

Re: rzync's name: I currently consider rZync to be a test app to allow me (and anyone else who wants to fiddle with it) to try out some ideas in protocol design. Integrating the ideas from this back into rsync or into superlifter would be ideal. If I ever decide to release my own file transfer utility, I'll name it something useful at that time (definitely NOT rzync).

Re: rzync's variable-length fields: Note that my code allows more variation than just 2 or 4 bytes -- e.g., I size the 8-byte file-size value to only as many bytes as needed to actually store the length. I agree that we should question whether this complexity is needed, but I don't agree that it is wrong in principle. There are two areas where field-sizing is used: in the directory-info compression (which is very similar to what rsync does, but with some extra field-sizing thrown in for good measure), and in the transmission protocol itself.

I still have questions about how best to handle the transfer of directory info. I'm thinking that it might be better to remove the rsync-like downsizing of the data and to use a library like zlib to remove the huge redundancies in the dir data during its transmission.

In the protocol itself, there are only two variable-size elements that go into each message header. While this increases complexity quite a bit over a fixed-length message header, it shouldn't be too hard to automate a test that ensures that the various header combinations (particularly boundary conditions) encode and decode properly.
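As an illustration of that kind of automated check, here is a C sketch of a header with two variable-size fields (each stored in 1, 2, or 4 bytes, with both width codes packed into a leading flags byte) plus a loop that round-trips every pair of boundary values. The layout and all names are hypothetical; this is not rZync's actual header format.

```c
#include <stddef.h>
#include <stdint.h>

static const size_t width_bytes[3] = { 1, 2, 4 };

/* Smallest width code (0, 1 or 2) that can hold V. */
static unsigned width_code(uint32_t v)
{
    return v <= 0xffu ? 0 : v <= 0xffffu ? 1 : 2;
}

static void put_le(unsigned char *p, uint32_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

static uint32_t get_le(const unsigned char *p, size_t n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Flags byte (bits 0-1: width of A, bits 2-3: width of B), then A, then B.
 * OUT needs room for 9 bytes.  Returns the header length. */
size_t hdr_encode(uint32_t a, uint32_t b, unsigned char *out)
{
    unsigned ca = width_code(a), cb = width_code(b);

    out[0] = (unsigned char)(ca | (cb << 2));
    put_le(out + 1, a, width_bytes[ca]);
    put_le(out + 1 + width_bytes[ca], b, width_bytes[cb]);
    return 1 + width_bytes[ca] + width_bytes[cb];
}

/* Returns the header length, or 0 if the flags byte is invalid. */
size_t hdr_decode(const unsigned char *in, uint32_t *a, uint32_t *b)
{
    unsigned ca = in[0] & 3u, cb = (in[0] >> 2) & 3u;

    if (ca > 2 || cb > 2 || (in[0] >> 4) != 0)
        return 0;
    *a = get_le(in + 1, width_bytes[ca]);
    *b = get_le(in + 1 + width_bytes[ca], width_bytes[cb]);
    return 1 + width_bytes[ca] + width_bytes[cb];
}

/* Round-trip every pair of boundary values; returns the failure count. */
int hdr_boundary_failures(void)
{
    static const uint32_t edge[] = { 0, 1, 0xff, 0x100, 0xffff,
                                     0x10000, 0xffffffffu };
    const size_t ne = sizeof edge / sizeof edge[0];
    int bad = 0;

    for (size_t i = 0; i < ne; i++)
        for (size_t j = 0; j < ne; j++) {
            unsigned char buf[9];
            uint32_t a = 0, b = 0;
            size_t len = hdr_encode(edge[i], edge[j], buf);
            if (hdr_decode(buf, &a, &b) != len || a != edge[i] || b != edge[j])
                bad++;
        }
    return bad;
}
```

The edge list deliberately straddles each width change (0xff/0x100, 0xffff/0x10000), since those transitions are where encoders of this kind usually go wrong.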
I don't know if this level of message header complexity is actually needed (this is one of the things that we can use the test app to check out), but if we decide we want it, I believe we can adequately test it to ensure that it will not be a sinkhole of latent bugs.

Re: rzync's name cache. I've revamped it to be a very dependable design that no longer depends on lock-step synchronization in the expiration of old items (just in the creation of new items, which is easy to achieve).

Some comments on your registers:

You mention having something like 16 registers to hold names. I think you'll find this to be inadequate, but it does depend on exactly how much you plan to cache names outside of the registers, how much retransmission of names you consider to be acceptable, and whether you plan to have a "move mode" where the source file is deleted.

My first test app had no name-cache whatsoever. It relied on external commands to drive it, and it sent the source/destination/basis trio of names from side to side before every step of the file's progress. While this was simple, the increased bandwidth necessary to retransmit the names was not acceptable to me.

If we just register the active items that are currently being sent over the wire, the name will need to live through the entire sig, delta, patch, and (optionally) source-side-delete steps. When the files are nearly up-to-date, having only 16 of them will, I believe, be overly restrictive. Part of the problem is that the buffered data on the sig-generating side delays the source-side-delete messages quite a bit. If we had a high-priority delete channel, that would help to alleviate things, but I think you'll find that having several hundred active names will be a better lower limit in your design thinking.

Another question is whether names are sent fully-qualified or relative to some directory. My protocol caches directory names in the name cache and allows you to send filenames relative to a cached directory.
Just having a way to "chdir" each side (even if the chdir is just virtual) and send names relative to the current directory should help a lot.

An additional source of cached names is in the directory scanning when doing a recursive transfer. My protocol has specific commands that refer to a name index within a specified directory so that the receiving side can request changed files using a small binary value instead of a full pathname.

One more area of complexity that you don't mention (and I don't either in my new-protocol doc): there are some operations where 2 names need to be associated with one operation. This happens when we have both a destination file and a basis file. My current cache implementation allows both of these names to be associated with a single cache element (though I need to improve this a bit in rzync) and lets the sig/patch stage snag them both.

..wayne..
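Two of the ideas above, naming a file relative to a cached directory and hanging a basis name off the same cache element, can be sketched in C as follows. The slot count, struct layout, and function names are all hypothetical (512 slots simply reflects the "several hundred active names" suggestion); this is not rZync's cache, and it assumes the directory links never form a cycle.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 512   /* "several hundred" live names */

struct cache_elem {
    int dir;       /* slot holding the parent directory, or -1 if none */
    char *name;    /* name relative to that directory */
    char *basis;   /* optional basis-file name, or NULL */
};

struct name_cache {
    struct cache_elem slot[CACHE_SLOTS];
};

/* Store NAME (relative to directory slot DIR) and an optional BASIS
 * name in SLOT, replacing whatever was there.  Returns 0 on success. */
int cache_put(struct name_cache *c, int slot, int dir,
              const char *name, const char *basis)
{
    struct cache_elem *e;

    if (slot < 0 || slot >= CACHE_SLOTS || dir >= CACHE_SLOTS)
        return -1;
    e = &c->slot[slot];
    free(e->name);
    free(e->basis);
    e->dir = dir;
    e->name = strdup(name);
    e->basis = basis ? strdup(basis) : NULL;
    return (e->name && (!basis || e->basis)) ? 0 : -1;
}

/* Build the full path of SLOT into BUF by walking up its directories.
 * Returns 0 on success, -1 on an empty slot or if BUF is too small. */
int cache_path(const struct name_cache *c, int slot, char *buf, size_t len)
{
    const struct cache_elem *e;
    char parent[4096];
    int n;

    if (slot < 0 || slot >= CACHE_SLOTS || c->slot[slot].name == NULL)
        return -1;
    e = &c->slot[slot];
    if (e->dir < 0)
        n = snprintf(buf, len, "%s", e->name);
    else if (cache_path(c, e->dir, parent, sizeof parent) != 0)
        return -1;
    else
        n = snprintf(buf, len, "%s/%s", parent, e->name);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

On the wire, a request would then carry just the small slot number, and the destination and basis names would travel together.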
Re: Rsync --delete does not work
On Tue, 23 Jul 2002, g dm wrote: > rsync -a --delete * /data/exp_dir > So, what did I do wrong?

You're sending a list of files, not a directory (since '*' is expanded by the shell into a list of files). The --delete option only works on a directory-to-directory transfer, so try using this instead:

  rsync -a --delete ./ /data/exp_dir

..wayne..
Re: rsync: --delete fails with multiple source directories
On Mon, 22 Jul 2002, Edward Farrar wrote: > Rsync 2.5.5 is producing this error message and a core file when executing the > command "/usr/local/bin/rsync -av --delete --force /net/OSCM/OS_ATLAS2/CONFIG/. > /net/OSCM/OS_TITAN1/2.6/CONFIG/. /OS/2.6/CONFIG" > > building file list ... done > rsync: connection unexpectedly closed (8 bytes read so far) > rsync error: error in rsync protocol data stream (code 12) at io.c(150)

I looked at the code in flist_find(), and I had the theory that the code would fail if it found a duplicate name as the last item in the flist. Sure enough, creating two directories with one duplicate between them would crash in the same way if that duplicated item is the last one alphabetically but would succeed otherwise.

The problem stems from the flist_up() function marching right off the top of the list if the last item has its basename zeroed out (indicating it is a duplicate). The easiest fix appears to be to simply trim the high value to ignore removed items. Like so:

Index: flist.c
--- flist.c 11 Apr 2002 02:21:41 - 1.124
+++ flist.c 27 Jul 2002 17:40:10 -
@@ -1151,7 +1151,9 @@
 {
 	int low = 0, high = flist->count - 1;
 
-	if (flist->count <= 0)
+	while (high >= 0 && !flist->files[high]->basename) high--;
+
+	if (high < 0)
 		return -1;
 
 	while (low != high) {

I've tested it and it fixes my crashing testcase, so I'll commit this to CVS.

..wayne..
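The shape of this fix is easier to see in a standalone version. Below is a simplified C binary search over a sorted array in which removed duplicates are marked with NULL (standing in for a zeroed basename). It trims trailing removed entries first, as the patch does, so the skip-forward step always has a live entry to land on. The names are hypothetical and the code is not rsync's flist_find().

```c
#include <string.h>

/* Advance I to the next live (non-NULL) entry.  The caller guarantees
 * that a live entry exists at or after I. */
static int next_live(const char *const *names, int i)
{
    while (names[i] == NULL)
        i++;
    return i;
}

/* Find KEY in NAMES[0..COUNT-1], which is sorted by strcmp() except that
 * removed entries are NULL.  Returns the index of KEY or -1. */
int find_name(const char *const *names, int count, const char *key)
{
    int low = 0, high = count - 1, last, i;

    while (high >= 0 && names[high] == NULL)
        high--;                    /* trim removed entries at the top */
    if (high < 0)
        return -1;
    last = high;                   /* next_live() can never run past this */

    while (low < high) {
        int mid = low + (high - low) / 2;
        int cmp;

        i = next_live(names, mid);
        cmp = strcmp(names[i], key);
        if (cmp == 0)
            return i;
        if (cmp > 0)
            high = mid;
        else
            low = i + 1;
    }
    if (low > last)
        return -1;
    i = next_live(names, low);
    return strcmp(names[i], key) == 0 ? i : -1;
}
```

Without the trimming loop, searching for a name past the end of { "a", "b", NULL } would walk next_live() off the end of the array, which is the same failure mode as the reported crash.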
Pipelined rsync proposal (was Re: superlifter design notes)
On Sun, 21 Jul 2002, jw schultz wrote:
> What i am seeing is a Multi-stage pipeline.

This is quite an interesting design idea. Let me comment on a few things that I've been mulling over since first reading it:

One thing you don't discuss in your data flow is auxiliary data flow. For instance, error messages need to go somewhere (perhaps mixed into the main data flow), and they need to get back to the side where the user resides. This can add an extra network transfer after the update stage (6) to send errors back to the user (if the user is not on the same side as stage 6).

Another open issue is what we do when a file changes while we're transferring it. Rsync sends a "redo" request to the generator process and it reruns all changed files at the end of the run. If such a thing is desirable in this utility (instead of just warning the user that the file was unable to be updated), then this "redo" data flow also needs to be mapped out. If this protocol remains more batch oriented, then it probably won't need to redo files -- just warn the user.

One of the really nice features of your design is that it is easy to interrupt the flow of data at any point and continue it later. This is useful if the cached information remains valid, since it then saves us time/resources on either the next run or on multiple updates to different destination systems.

One downside to your protocol is that it requires several socket connections between systems. This either mandates using multiple rsh/ssh connections (possibly with multiple password prompts for a single transfer) OR using some kind of socket-forwarding protocol (such as the one provided by ssh). When I proposed adding extra sockets to the rsync protocol a while back, at least one fellow mentioned that requiring ssh would not be an acceptable solution for him, so this area could be a little controversial (depending on what kind of a solution we can come up with).

Another question is whether we need to support bi-directional transfer of files in a single connection. My rZync test app supports sending files in both directions just because it was so simple to add -- having a message-based protocol makes this a breeze.

Your first protocol (the one without any backchannels) looks like it would be a snap to set up using separate processes. It does, as you note, add quite a bit of extra data transmission (such as an extra 2x hit in filename transfer alone).

The backchannels add some complicating factors to the file I/O that will need to be carefully designed to avoid deadlocks. Since the data is strictly ordered with one chunk for pipe-A and one chunk for pipe-B (for each file), the code should be fairly straightforward, though, so hopefully this won't be a big problem. Caching off data from the backchannel utility might be pretty complex, though. Think about interrupting the stream after step 3: you'd need to buffer off the backchannel data from step 1 plus the main output and backchannel data from step 3, and then restart things at steps 4 and 5 with the appropriate main-stream input and backchannel flows. That would be much harder than saving off the one single output flow from step 3 and starting up step 4 later on using it, so either the backchannel algorithm may not be very useful in a batch scenario, or we'd need to have a helper script that can figure out how to interrupt and restart the chain of processes at any point.

I find your idea to allow the first 4 steps of the scan/compare/checksum sequence to be reversed intriguing. At first I thought that it would be too fragile since the server's data tends to be updating constantly (and this protocol needs to have the server data remain constant from the moment the checksum blocks are created until the client(s) all fetch the updated data).

However, I can see that this may well be a really nice way to update an archive and let multiple (non-identical) clients request updates. This will require an extension to librsync that would allow a reversed rolling-checksum diff option, and an option to separate the diff and transmit stages (which are currently done at the same time), so this idea has a bigger overhead than the rest of the tool as far as the rsync protocol is concerned.

The most efficient multi-server duplication process would be to save off the output of the transmit phase and send it to multiple systems for just the final update phase. This does require that the destination machines all have identical file trees for the updating to work, though, so this only works on tightly-controlled mirrors. The advantage is that the server expends no further resources beyond getting the update stream transmitted to the clients (who can duplicate the stream without the server's help).

Since your proposed protocol seems to fit so well with batch-oriented scenarios while potentially having problems in the more interactive scenarios, I'm wondering if this should be a separate uti
Re: Patch to update the included popt to 1.6.4
On Thu, 11 Jul 2002, Jos Backus wrote:
> http://www.catnook.com/patches/rsync-popt-1.6.4.patch

I went ahead and tested this and then checked it in (since we might as well include the newest popt if we're going to include popt with rsync).

> The configure script had to be regenerated (with autoconf 2.53)
> because popt.c wants HAVE_FLOAT_H. As an aside, I have heard people
> complain about this version of autoconf generating scripts that break
> when run under bash (as /bin/sh).

If this is a concern, I could easily check in a configure/config.h.in that was generated with autoconf 2.52d. Let me know if there are problems (I didn't have any on my Mandrake Linux 8.2 system).

..wayne..
Useless option combos (was Re: --password-file switch)
On Tue, 30 Jul 2002, Martin Pool wrote:
> The --password-file option only applies to rsync daemon connections,
> not ssh.

Perhaps we should make rsync complain about options that don't make sense in a given context (another example being an attempt to use -e with a "::" hostspec)?

..wayne..
Re: new rsync release needed soon?
> On Wed, Jul 31, 2002 at 10:21:49AM +1000, Martin Pool wrote:
> > There's just one more change I would like to put in, which is partially
> > rolling back the IPv6 patch so that it uses the old code, unmodified,
> > if --disable-ipv6 is specified.

There was another patch that I thought was needed given all the timeout problems people have been seeing with large files -- the patch that Stefan Nehlsen sent a few months back. I modified it to work with the latest CVS version, tested it, and checked it in. From my reading of the patch, I think it has a very low chance of screwing anything up. If others disagree, I can back it out and we can put it in later.

On Wed, 31 Jul 2002, Dave Dykstra wrote:
> The patch that I'd most like to see get in JD Paul's patch for using SSH
> and daemon mode together. We still don't have an agreement on what the
> syntax should be. I think the combination of -e ssh and :: which he
> implemented is the most understandable syntax and we should just go with
> it.

I'd be glad to check that in, and if there is still disagreement over the syntax, we can change it in CVS. I'll look at this next.

Talking syntax reminds me of another patch that I think should go in: the one that makes rsync accept rsync:// syntax in the destination, not just the source. Anyone disagree with that?

..wayne..
Re: new rsync release needed soon?
On Wed, 31 Jul 2002, Robert Weber wrote:
> On the subject of needed patches, I just recently completed a patch for
> librsync that fixed the mdfour code to have uint_64 or 2 uint_32's for
> size. Without this, the checksums on files >512Megs are incorrect.

In order to interoperate with older versions of rsync, wouldn't we need to continue to generate the incorrect checksums for all but the newest (freshly bumped up) protocol number?

..wayne..
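For reference, here is a minimal C sketch of where a 512MB boundary can come from, assuming (as the description of the patch suggests) that the bug is a 32-bit count of message bits. MD4, per RFC 1320, ends its padding with the message length in bits as a 64-bit little-endian value; the helper names below are hypothetical. Because the two variants produce different digests for large files, old and new peers would disagree, which is why the protocol-version question above matters.

```c
#include <stdint.h>

/* Assumed bug: the bit count lives in one 32-bit variable.
 * 512 MiB = 2^29 bytes = 2^32 bits, exactly one past UINT32_MAX. */
static uint32_t bit_count_32(uint64_t len_bytes)
{
    return (uint32_t)(len_bytes * 8);   /* silently wraps modulo 2^32 */
}

/* Fixed version: keep the full 64-bit count. */
static uint64_t bit_count_64(uint64_t len_bytes)
{
    return len_bytes * 8;
}

/* Encode a bit count the way the MD4 padding step appends it:
 * 8 bytes, least significant byte first. */
static void encode_length(uint64_t bits, unsigned char out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (8 * i));
}
```

Just below the boundary both versions agree; at 512 MiB the 32-bit count wraps to 0, so the final padded block, and therefore the digest, differs.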
daemon-server via SSH (was Re: new rsync release needed soon?)
On Wed, 31 Jul 2002, Dave Dykstra wrote:
> The patch that I'd most like to see get in JD Paul's patch for using
> SSH and daemon mode together.

I've completed my mods to get this updated to the latest CVS version and then checked it all in. Since things had changed quite a bit, I applied the patch by hand and then compared my changes to the original patch to ensure that I did a good job.

I did leave out one thing that I had a question about in main.c: the code that was looking for a -l option in the remote-shell command. If the user specifies a username in both the host-spec and in their ssh command, do we really want to silently eliminate one of them? Or should we maybe complain and fail? I think I might prefer to let the remote-shell command run and let it complain about the two -l options (if that's what it wants to do), but I could be convinced otherwise.

I've tested normal rsync operations to ensure that they are still working right, but not daemon mode (which I don't normally use). If someone could help out with the testing, I'd appreciate it.

..wayne..
Re: daemon-server via SSH (was Re: new rsync release needed soon?)
On Thu, 1 Aug 2002, Dave Dykstra wrote:
> I think the way JD did it was the compromise we agreed on: if a userid
> is specified only with userid@hostname, it should be used for both
> purposes, but if the -e command includes -l it should override the
> login userid only.

OK, that makes sense. I'm sorry I missed that. I've committed the code I had omitted that implements this.

As for your SSH_CLIENT change, it doesn't compile on my Linux system with INET6 defined (due to the IPv6 structures having different names). I needed to make this patch to get it to compile:

Index: clientname.c
--- clientname.c	2002/08/01 19:17:00	1.9
+++ clientname.c	2002/08/01 21:05:53
@@ -112,8 +111,13 @@
 	socklen_t sin_len = sizeof sin;
 
 	memset(&sin, 0, sin_len);
+#ifdef INET6
+	sin.sin6_family = af;
+	inet_pton(af, client_addr(fd), &sin.sin6_addr.s6_addr);
+#else
 	sin.sin_family = af;
 	inet_pton(af, client_addr(fd), &sin.sin_addr.s_addr);
+#endif
 
 	if (!lookup_name(fd, (struct sockaddr_storage *)&sin, sin_len,
 			name_buf, sizeof name_buf, port_buf, sizeof port_buf))

As for your question of how to know when to look at the SSH_CLIENT environment variable, I wonder if the is_a_socket() call that was in the original patch would be enough of a distinguishing factor. Like this:

Index: clientname.c
--- clientname.c	2002/08/01 19:17:00	1.9
+++ clientname.c	2002/08/01 21:05:53
@@ -51,8 +51,7 @@
 
 	initialised = 1;
 
-	ssh_client = getenv("SSH_CLIENT");
-	if (ssh_client != NULL) {
+	if (!is_a_socket(fd) && (ssh_client = getenv("SSH_CLIENT")) != NULL) {
 		strlcpy(addr_buf, ssh_client, sizeof(addr_buf));
 		/* truncate SSH_CLIENT to just IP address */
 		p = strchr(addr_buf, ' ');
@@ -100,7 +99,7 @@
 	strcpy(name_buf, default_name);
 	initialised = 1;
 
-	if (getenv("SSH_CLIENT") != NULL) {
+	if (!is_a_socket(fd) && getenv("SSH_CLIENT") != NULL) {
 		/* Look up name of IP address given in $SSH_CLIENT */
 #ifdef INET6
 		int af = AF_INET6;

I'll have to look at the code in more detail to know if this works or not.

..wayne..
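Here is a rough C sketch of the proposed test, with hypothetical helper names. fd_is_socket() follows the idea behind is_a_socket() by asking the descriptor for its SO_TYPE socket option, which only succeeds on sockets. The address is taken from the first field of $SSH_CLIENT, which ssh sets to "client-ip client-port server-port". It only illustrates the distinguishing logic; it is not rsync's clientname.c code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 1 if fd refers to a socket: getsockopt(SO_TYPE) fails on pipes,
 * ttys, regular files and invalid descriptors. */
static int fd_is_socket(int fd)
{
    int type;
    socklen_t len = sizeof type;
    return getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == 0;
}

/* Copy the first space-separated field of an SSH_CLIENT-style value into
 * buf. Returns 0 on success, -1 if the field is empty or does not fit. */
static int ssh_client_ip(const char *value, char *buf, size_t bufsize)
{
    const char *space = strchr(value, ' ');
    size_t len = space ? (size_t)(space - value) : strlen(value);

    if (len == 0 || len >= bufsize)
        return -1;
    memcpy(buf, value, len);
    buf[len] = '\0';
    return 0;
}

/* Use $SSH_CLIENT only when our I/O is not a real socket, i.e. when we were
 * presumably started over a remote shell rather than accepted by a daemon.
 * Returns 0 and fills buf on success, -1 otherwise. */
static int client_ip_from_env(int fd, char *buf, size_t bufsize)
{
    const char *value;

    if (fd_is_socket(fd) || (value = getenv("SSH_CLIENT")) == NULL)
        return -1;
    return ssh_client_ip(value, buf, bufsize);
}
```

One caveat the sketch makes visible: a daemon started from an ssh session could inherit $SSH_CLIENT, and the socket check is what keeps that stale value from being used for accepted connections.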
Re: daemon-server via SSH (was Re: new rsync release needed soon?)
I just looked over your latest changes and checked in a few minor fixes that I saw:

- In client_addr() we now avoid calling getnameinfo() if we've already set up the addr_buf (on the am_server side).
- I moved some structures in client_name() so that they remain in scope the entire time that we have pointers that reference them. With most (all?) C compilers this may not have been necessary in this particular case, but I figure it's safer this way.
- The dot-counting loop exited before it could count a 4th dot.

..wayne..
Re: superlifter design notes and a new proposal
On Sun, 4 Aug 2002, Martin Pool wrote:
> My first draft was proposing what you might call a "fine-grained" rpc
> system, with operations like "list this directory", "delete this
> file", "calculate the checksum of this file." I think Wayne's rzync
> system was kind of like that too.

Your previous proposal sounded quite a bit more fine-grained than what rZync is doing. For instance, it sounded like you would have much more primitive building-block messages and move much of the controlling smarts into something like a python-language scripting layer. While rZync allows ftp-level control (such as "send this file", "send this directory tree", "delete this file", "create this directory"), it does this with a small number of higher-level command messages.

Rsync, as you know, is a much more modal protocol. It has a strict set of steps that must be specified in order and nothing else. This saves bytes because so much of the protocol is determined by context, but it is very limiting. My rZync protocol opens this up by using message numbers for everything that gets sent, but it still keeps some context-oriented "smarts" when transferring files. There is no micro-management of a file transfer from start to finish: the messages cascade from side to side as the sig, delta, patch sequence of events unfolds. The most CISC-like message in rZync is the recursive-directory-send message. Using it is very much like starting an entire "rsync -r src/ dest" transfer sequence via a single message.

> So the client will send something more or less equivalent to its whole
> command line.

I think that's a good idea. My rZync app currently operates on each arg independently, but I recently discovered that this makes it incompatible with rsync when merging directories and such. For instance, the command "rsync -r dir1/ dir2/ dir3" merges the file list and removes duplicates before starting the transfer to dir3.
rZync currently just transfers the contents of dir1 to dir3 and then transfers the contents of dir2 to dir3. Fortunately, this is not going to be hard to fix.

> While staying with that overall approach, we may still be able to make
> some improvements in
>
> - documenting the protocol
>
> - doing one directory at a time
>
> - possibly, doing librsync deltas of directories
>
> - just one process on either end
>
> - getting rid of interleaved streams on top of TCP
>
> - sending errors as distinct packets, including a reference to the
>   file that caused them (if any)
>
> - handling ACLs, EAs, and other "incidental" things
>
> - holding the connection open and doing more operations afterwards

This is very much in keeping with what I've been fiddling with in rZync (which nearly implements this whole list). I like the simplicity of one process per side, which makes it easy to cache data that will be used later and discard it when it is no longer needed. I got rid of the "multi-IO" idiom of rsync in favor of sending all data via messages and limiting each chunk to 32K to allow other messages (such as verbose output) to be mixed into the middle of a large file's data-stream.

I think the basic idea of how rZync envisions a new protocol working is a good one -- not so much the specifics of the bytes sent in the message-header format, but how the messages flow, how each side handles the messages in a single process, how all I/O is handled by a single function, etc. There's certainly lots of room for improvement, though.

This also reminds me that I hadn't responded to jw's question about why I thought his pipelined approach was more conducive to a batch protocol than an interactive protocol. To make the pipelined protocol as efficient as rsync will require the complexity of his backchannel implementation, which I think will be harder to get right than a single-process message-oriented protocol.

If every stage is a separate process, it seems less clear how to implement something like an interactive "mkdir" or "delete" command. (What process handles this? How do we signal that process? Do we need yet another socket path for a control stream in some circumstances?) It also seems to me that the extra processes/threads and socket-channels will make a less portable interactive app than a single select-using interactive app.

..wayne..
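As an illustration of the 32K-chunk idea, here is a C sketch that splits file data into size-capped messages so that other messages can be sent between them. The 3-byte header (a type byte plus a big-endian length) and the function names are invented for this example; rZync's real message-header format is not described here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHUNK 32768u   /* the 32K per-message cap described above */
#define HEADER_LEN 3u      /* hypothetical: 1 type byte + 2 length bytes */

/* Number of messages needed to carry len bytes of file data. */
static size_t chunk_count(size_t len)
{
    return (len + MAX_CHUNK - 1) / MAX_CHUNK;
}

/* Split data into messages of at most MAX_CHUNK payload bytes, each
 * prefixed by the hypothetical header: type, then big-endian length.
 * A sender is free to emit other messages (verbose output, errors)
 * between any two of these, which is the point of the cap.
 * out must hold len + HEADER_LEN * chunk_count(len) bytes.
 * Returns the number of bytes written. */
static size_t frame_data(uint8_t type, const uint8_t *data, size_t len,
                         uint8_t *out)
{
    size_t written = 0;

    while (len > 0) {
        size_t n = len < MAX_CHUNK ? len : MAX_CHUNK;
        out[written++] = type;
        out[written++] = (uint8_t)(n >> 8);
        out[written++] = (uint8_t)(n & 0xff);
        memcpy(out + written, data, n);
        written += n;
        data += n;
        len -= n;
    }
    return written;
}
```

Capping the payload bounds how long any other message has to wait behind a large file, at the cost of a few header bytes per 32K.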
Re: --include option
On Mon, 12 Aug 2002, Leaw, Chern Jian wrote:
> # rsync -avz --include-from=files_included /stor/circuit_design/
> mickey.willowglen.com:/stor/circuit_design/

The problem with your command is that it contains include directives but no exclusions, so nothing limits the default operation of sending the entire subdirectory contents.

An easier way to go for this specific problem is to skip the includes and specify two source dirs, like this:

    rsync -avz /stor/circuit_design/{clock_speed,fub_layout} mickey.willowglen.com:/stor/circuit_design/

The above assumes your shell has {} expansion, like bash and zsh. If it does not, just mention both directories separately (without any trailing slash). The trailing slash on the destination isn't required, but it doesn't hurt either, so I left it in.

To make things work with your include-using command, you'd need to use something like this in your include file:

+ /clock_speed
+ /fub_layout
- /*

This allows the two directories you want and excludes everything else in the base directory of the transfer. Since none of the rules apply to files deeper than the base dir, none of those deeper files will be excluded.

..wayne..
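The first-match logic behind that include file can be modeled in a few lines of C using POSIX fnmatch(). This is a simplified sketch, not rsync's matcher: the struct rule type is hypothetical, and only anchored patterns are handled (no trailing-slash, unanchored, or "**" rules).

```c
#include <fnmatch.h>
#include <stddef.h>

/* Simplified rule: '+' (include) or '-' (exclude) plus an anchored pattern. */
struct rule {
    char action;
    const char *pattern;
};

/* Decide whether one transfer-relative name (leading '/', for example
 * "/clock_speed/top.v") is sent. The first matching rule wins; a name that
 * matches no rule is included. FNM_PATHNAME keeps '*' from matching '/',
 * so "- /*" only touches entries directly in the base directory. */
static int is_included(const struct rule *rules, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (fnmatch(rules[i].pattern, name, FNM_PATHNAME) == 0)
            return rules[i].action == '+';
    return 1;
}
```

With only the two "+" rules, every name falls through to the default and is included, which is exactly the original problem. Note also that "/other_dir/file" would pass is_included() even with all three rules, but rsync never looks inside /other_dir because the directory itself is excluded.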
Re: How to rsync selective subdirectories
On Mon, 12 Aug 2002, Nitin Agarwal wrote:
> I want to rsync all the dates directories but only the "toid"
> subdirectory.

The easiest thing to do might be to use the -R (--relative) option, like this:

    rsync -avR /abc/dir/*/toid host:/dest/

This will create the /abc/dir/DATE/toid dirs on the destination side. If the "/dest/" dir begins with "/abc/dir", that part will be skipped.

If you don't like the extra subdirs, use an include file like this:

+ /*/toid
- /*/*

and a command like this (without -R):

    rsync -av --include-from=above-file /abc/dir/ host:/dest/

This includes everything in the base dir (by default), and only the toid dirs in the one-level-deep subdirs. Files at other levels are not matched by either rule. So, if the date dirs aren't the only thing in the base dir, you'll need a more complicated include file, like this:

+ /[1-2][0-9][0-9][0-9][0-1][0-9][0-3][0-9]
- /*
+ /*/toid
- /*/*

..wayne..
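To sanity-check the bracket pattern before relying on it, a short C sketch can feed candidate names through POSIX fnmatch(). The helper name is hypothetical; FNM_PATHNAME is set so that the pattern cannot reach past the first directory level.

```c
#include <fnmatch.h>

/* The first-level rule from the include file above. */
static const char DATE_DIR_RULE[] =
    "/[1-2][0-9][0-9][0-9][0-1][0-9][0-3][0-9]";

/* Returns 1 if a first-level name (for example "/20020812") matches the
 * rule. Each bracket checks one digit independently, so the test is loose:
 * "/20021339" passes too, which is harmless for picking out date dirs. */
static int matches_date_rule(const char *name)
{
    return fnmatch(DATE_DIR_RULE, name, FNM_PATHNAME) == 0;
}
```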
RE: --include option
On Tue, 13 Aug 2002, Leaw, Chern Jian wrote:
> I tried your suggestion, but did not work. It still copied the entire
> filesystem across to the destination machine.

Since you failed to provide the command-line you're using, I can't tell you exactly why your command failed. For instance, if you use a trailing slash on the sending-side directory, you'd specify the exclusions differently than if the slash was not there.

It's fairly easy to figure out for yourself what your inclusion & exclusion lists should look like by first running the command with the -n option (which tells rsync not to actually copy any files). The names that rsync outputs are the names you need to match (just add a slash to the start of the name). Once you get familiar with rsync you'll be able to predict what these names will be, but until then, using -n lets you ask rsync for the answer. As a rule, everything in the sending-side path up through its last slash is removed from a name before it is matched against the include/exclude patterns. It is also sometimes useful to add an extra -v option to the command to see what is getting included or excluded.

Another thing I recommend is that you use a "root slash" with names that don't need to float to any level. For instance, if you just specify "foo" as an exclusion, it will exclude that directory OR file at any point in the tree. Specifying "/foo" (or "/sub/foo") is thus safer since it protects against unintended matching.

I also prefer a single combined include/exclude file since it is easier to edit and lets you order the inclusions and exclusions (remember that the first matching pattern is the one that is acted upon, so sometimes order does matter). In a combined file, items that begin with "+ " are always taken to be exclusions, and items that begin with "- " are always taken to be exclusions. You can leave off the "+ " in an include file (and the "- " in an exclude file), but I included both for completeness.

So, if a file named "myinc" that has these 3 lines in it:

+ /clock_speed
+ /fub_layout
- /*

used with this command:

    rsync -avz --include-from=myinc /stor/circuit_design/ mickey.willowglen.com:/stor/circuit_design

does not work for you, then I am misunderstanding something about your setup.

..wayne..
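The "strip through the last slash" rule of thumb can be written down as a small C function, which makes the trailing-slash difference easy to see. This is a simplified model under stated assumptions (no -R, no symlinks), and pattern_name() is a hypothetical name; running rsync with -n remains the authoritative check.

```c
#include <stdio.h>
#include <string.h>

/* Model of the rule of thumb: drop everything up through the last '/' of
 * the source argument, then prefix the remainder with '/'.
 *   src  - the source argument as typed on the command line
 *   file - a path that rsync found under src
 * Returns 0 and fills out, or -1 if file does not start with the stripped
 * prefix or out is too small. */
static int pattern_name(const char *src, const char *file,
                        char *out, size_t outsize)
{
    const char *last_slash = strrchr(src, '/');
    size_t prefix = last_slash ? (size_t)(last_slash - src) + 1 : 0;

    if (strncmp(src, file, prefix) != 0)
        return -1;
    int n = snprintf(out, outsize, "/%s", file + prefix);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

With "/stor/circuit_design/" the name to match is "/clock_speed", while with "/stor/circuit_design" (no trailing slash) it becomes "/circuit_design/clock_speed", so the same include file would need different patterns.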
Re: Selective sync
On Wed, 14 Aug 2002, Ivan Kovalev wrote:
> rootdir/*/2002-08-01
> rootdir/*/*/1-Aug-02
> rootdir/*/2002-08/01

As the documentation states, if you use --exclude=*, you need to include every parent directory on the way down to the directories in question. So, it's easy to see that the rules you gave will never allow the descent into the subdirs needed to find the 2002-08-01 dir, because these subdirs get excluded by the "*" before they are ever read.

Since the directories you require are not at the same level from the root, you're probably going to need to be pretty specific about what directories to allow leading up to this deeper dir. If we assume that this directory is either in the subdir "foo" or "bar", the following include file would work (with no trailing exclude of "*"):

+ /*/
- /*
+ /*/2002-08-01/
+ /*/2002-08/
+ /*/foo/
+ /*/bar/
- /*/*
+ /*/*/1-Aug-02/
+ /*/2002-08/01/
- /*/2002-08/*

This may transfer a few extra (empty) subdirs on the way down to the 2002* dirs, but that can only be avoided by getting more specific with the first-level include/exclude directives (like we did with the second-level directives). On the flip side, you could replace the two lines that specify second-level dirs (the "foo" and "bar" lines) with a single line that specifies "+ /*/*/" if you don't mind having empty 2nd-level dirs that didn't have a 1-Aug-02 dir in them.

Note that I prefer using limited exclusions like those above instead of a catch-all --exclude='*' because it makes it easier to include the contents of directories (since the default is to include everything that does not match one of the include/exclude rules). It also avoids improper parsing of a rule like this:

+ /*/2002-08-01/**

This is trying to allow an entire tree of files in a directory one level deep, but it actually gets parsed like this:

+ /**/2002-08-01/**

which can sometimes cause problems.

..wayne..
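Here is a C sketch of why the "**" rule misbehaves, assuming (as described above) that a "**" anywhere in a pattern switches the whole pattern to full-path matching. It models that by calling POSIX fnmatch() without FNM_PATHNAME, which lets every '*' match across slashes; rule_matches() is a hypothetical name.

```c
#include <fnmatch.h>
#include <string.h>

/* Simplified model of the parsing quirk: a pattern containing "**" is
 * matched with no flags, so EVERY '*' in it (not just the "**") may match
 * across '/'. Other patterns use FNM_PATHNAME, where '*' stops at '/'. */
static int rule_matches(const char *pattern, const char *name)
{
    int flags = strstr(pattern, "**") ? 0 : FNM_PATHNAME;
    return fnmatch(pattern, name, flags) == 0;
}
```

In this model the intended one-level "/*/2002-08-01/**" also matches "/a/b/2002-08-01/x", whereas "/*/2002-08-01/*" does not.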
RE: --include option
On Wed, 14 Aug 2002, Wayne Davison wrote:
> In a combined file, items that begin with "+ " are always taken to be
> exclusions

Of course, that should have been "inclusions", not "exclusions".

..wayne..
Re: rsync --partial produces corrupt data on ctrl-c
On Wed, 28 Aug 2002, Ralf Schreiber wrote:
> The partially transfered data (with a dot on first position of the filename)
> will be renamed after a ctrl-c occurs (on both
> OS) or a window-close (cygwin) to the filename of a fully transfered file
> (without the dot), which aren't complete !

Yes, that is the definition of what partial mode does with a partially transferred file. The manual recommends the use of the --compare-dest option to work around this, but that doesn't appear to do what the manual says at all, so it looks like there is either a bug in the handling of --compare-dest when --partial is enabled, or a bug in the manual.

One thing that the manual doesn't say is which file should be preferred if there is a matching file in both the compare-dest dir and the real-dest dir. This becomes important if we want to use --compare-dest as a holding zone for partial files, since we would need to have the code prefer the compare-dest file over the real-dest file if we make it put partially transferred files into the compare-dest dir. The current code prefers the real-dest file over the compare-dest file and puts partially transferred files into the real-dest dir. Opinions?

..wayne..
Re: The file name end with . (dot) will be renamed at destinationfolder
On Thu, 5 Sep 2002, Quang Tran Hong wrote:
> NormalFile.
> 23 (100%)
> rename .NormalFile..idNZdb -> NormalFile. : File exists

You'll note in these messages that the dot has not been lost, so it's not rsync that is causing this problem. It looks to be a deficiency in your OS. Are you trying to send this file to a Microsoft OS?

..wayne..
Re: exclude option?
On Tue, 17 Sep 2002, Bjorn Graabek wrote:
> and here are (currently) the contents of my exclude.txt file:
>
> ---
> + /my documents*
> + /favorites*
> + /cookies*
> + /local settings/application data/microsoft/outlook/outlook.pst
> - /*
> ---

I think the first problem is that you aren't using the right capitalization. Rsync does not ignore case, so it will not match the named directories unless you specify them using the same mixed case that is returned by a directory listing.

Another problem is that you don't specify a way to get into the "Local Settings" dir to get down to the outlook.pst file.

Finally, the trailing '*' is not needed if you are matching the directory name exactly, only if you are matching more than one directory with each line. I'm assuming you aren't, so here's my suggested solution (you'll have to check if the mixed case is OK):

---
+ /My Documents
+ /Favorites
+ /Cookies
+ /Local Settings
- /*
+ /Local Settings/Application Data
- /Local Settings/*
+ /Local Settings/Application Data/Microsoft
- /Local Settings/Application Data/*
+ /Local Settings/Application Data/Microsoft/Outlook
- /Local Settings/Application Data/Microsoft/*
+ /Local Settings/Application Data/Microsoft/Outlook/outlook.pst
- /Local Settings/Application Data/Microsoft/Outlook/*
---

All the extra rules for the "/Local Settings" dir are there because I assume there are other files in this hierarchy that you don't want to copy, or you would just have said "+ /Local Settings" and left it at that. These lines specify a path that the hierarchical descent through the directories can follow to reach the lone file that you want to send, and they exclude all other files and directories.

..wayne..
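Writing those ladders by hand is error-prone, so here is a hypothetical C helper that generates the "+ dir" / "- parent/*" pairs for one deep path in the same order as the list above. It is a sketch for building an include file, not an rsync feature.

```c
#include <stdio.h>
#include <string.h>

/* Build the "+ prefix" / "- parent/*" rule pairs that let the directory
 * descent reach one deep path while excluding siblings at every level.
 * path must start with '/'. Returns 0 and fills out (NUL-terminated), or
 * -1 on bad input or if out is too small. */
static int descent_rules(const char *path, char *out, size_t outsize)
{
    size_t used = 0;
    const char *p = path;

    if (outsize == 0 || path[0] != '/')
        return -1;
    out[0] = '\0';
    while (*p) {
        /* p points at the '/' that starts the next path component */
        const char *end = strchr(p + 1, '/');
        if (!end)
            end = p + strlen(p);
        /* include everything up to this component, then exclude its
         * siblings (the parent prefix is empty at the top level) */
        int n = snprintf(out + used, outsize - used, "+ %.*s\n- %.*s/*\n",
                         (int)(end - path), path, (int)(p - path), path);
        if (n < 0 || (size_t)n >= outsize - used)
            return -1;
        used += (size_t)n;
        p = end;
    }
    return 0;
}
```

Rules for other top-level items (such as "+ /My Documents") would still need to go before the generated "- /*" line.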
Re: --delete-after subtleties
On Tue, 1 Oct 2002, Nick Papadonis wrote:
> [In 2.5.5] --delete-after [...] must be used with --delete to work.

Unfortunately. In the current CVS version, however, --delete-after now implies --delete and the man page mentions this fact. So, this will work more logically whenever 2.5.6 gets released.

..wayne..
Re: Exclude symbolic link to a directory?
On Thu, Oct 10, 2002 at 10:49:33AM -0400, Bryan K. Wright wrote:
> The master copy of /local contains the directory "stuff", not
> a symbolic link. The problem is, when I rsync /local on the few
> machines that have a symbolic link, the link gets nuked and replaced
> with a real directory (just like in the master copy).

Correct. That's the only thing that rsync currently knows how to do with symlinks to directories -- make them identical with what's on the server. The easiest way around this at the moment is to break up the rsync command into multiple runs: one that excludes all the potential symlink differences, and one for each symlink dir that you want to transfer.

> What I've tried is excluding "/local/stuff" and including
> "/local/stuff/*", but the stuff symlink still gets nuked.

I don't think you were successful at getting the dir excluded, then, or else it would have been untouched, not nuked. Rsync would not have done what you wanted, though, since it has to send the /local/stuff dir to try to send what's inside it (when working recursively). You probably did an exclude of "/local/stuff" rather than "/stuff" (since I assume that the base dir is probably "/local"). So, a solution like this should work:

    rsync -av --exclude="/stuff" /local/ remote:/local
    rsync -av /local/stuff/ remote:/local/stuff

A better, long-term fix would be to add an option that would allow certain symlinks to be treated as a directory. To do this, we need to work out a good heuristic for differentiating which is which. I imagine using the following rules (when the new option is enabled):

- If a symlink points inside the hierarchy being transferred, treat it as a normal symlink to duplicate (rsync already has code to determine this for its "safe symlink" handling).
- If a symlink points outside the transfer AND it points to a directory, treat it as if it were the actual directory for the transfer (I think that only the delete code would need to know that it wasn't a real directory).

How does that sound? It should be fairly easy to implement.

..wayne..
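The first rule depends on telling whether a link's target stays inside the transfer. A lexical version of that test can be sketched in C as below. It is a hypothetical re-implementation in the spirit of the safe-symlink check, not rsync's code; it does not follow intermediate symlinks, and the second rule's "points to a directory" test would additionally need stat().

```c
#include <string.h>

/* Decide lexically whether a relative symlink target stays inside the
 * transferred tree.
 *   target     - the symlink's contents, as returned by readlink()
 *   link_depth - directories between the transfer root and the link
 *                (0 for a link sitting directly in the root)
 * Absolute targets count as outside. Returns 1 inside, 0 outside. */
static int target_stays_inside(const char *target, int link_depth)
{
    int depth = link_depth;
    const char *p = target;

    if (target[0] == '/')
        return 0;
    while (*p) {
        const char *slash = strchr(p, '/');
        size_t len = slash ? (size_t)(slash - p) : strlen(p);

        if (len == 2 && p[0] == '.' && p[1] == '.') {
            if (--depth < 0)
                return 0;           /* climbed above the transfer root */
        } else if (len > 0 && !(len == 1 && p[0] == '.')) {
            depth++;                /* ordinary name: one level down */
        }                           /* "" (from "//") and "." change nothing */
        p += len;
        if (*p == '/')
            p++;
    }
    return 1;
}
```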
Re: multiple sessions to same destination
On Thu, Oct 10, 2002 at 04:50:33PM -0400, Bennett Todd wrote:
> The rsync opens the target file to read; if some other rsync moves a
> new file into place before that, there's no concurrency, this is
> pure sequential rsyncs; if it moves the target file into place after
> it's been opened, the older copy of the target will still be used by
> the process we're looking at, through the open file handle it holds;
> the intruding copy won't have any effect.

Unfortunately it's a little more complicated than that. There are two processes opening the file: first the generator (which sends the checksums over to the sender) and then the receiver (which opens the file to read matching checksum blocks from the local file). It is possible for the file to change between these two separate file opens, resulting in the creation of a corrupt *temporary* file.

Fortunately for us, the whole-file checksum won't match, so rsync won't move the resulting corrupted file into place. It will instead reset its checksum size and try sending the file again. If it fails again, it prints an error and does not update the file.

Derek: I'd recommend checking out "unison":

    http://www.cis.upenn.edu/~bcpierce/unison/

I use this software to keep my rc files in sync between several machines, and it does a wonderful job of merging file changes in both directions.

..wayne..
Re: HELP !!! Problem with file timestamps updating "weird" during rsync data pull
On Wed, Oct 16, 2002 at 01:36:10PM -0500, Sean O'Neill wrote: > The timestamp should match that of the system the data is pulled from right > ? Well, it doesn't from time to time. The time stamp sometimes gets > updated as just "Oct 16 2002" This is what most unix systems display for a future date. I'm guessing that the clocks on your systems are not in sync -- that the clock on the receiving end is behind the sending end, which causes files that have been recently modified on the sender to show up as having future dates on the receiving system. ..wayne..
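The display rule behind this symptom is roughly the one ls -l follows: a timestamp within about the last six months is shown with the time of day, and anything older, or in the future, is shown with the year instead. Below is a Python sketch of that rule. The helper is hypothetical, and real ls pads the day and year fields slightly differently.

```python
import time

# Roughly half a Gregorian year, in seconds: the cutoff ls uses is about six months.
SIX_MONTHS = 365.2425 * 24 * 60 * 60 / 2

def ls_style_date(mtime, now=None):
    """Format mtime roughly the way `ls -l` does."""
    if now is None:
        now = time.time()
    stamp = time.localtime(mtime)
    if now - SIX_MONTHS <= mtime <= now:
        return time.strftime("%b %d %H:%M", stamp)  # recent: show the time of day
    return time.strftime("%b %d %Y", stamp)         # old or future: show the year
```

Suppose the receiver's clock is an hour behind the sender's. A file the sender touched a minute ago then carries an mtime in the receiver's future, so it is listed with a year, like "Oct 16 2002", rather than a time.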
Re: Size Discrepancy between source and destination
On Thu, Oct 24, 2002 at 12:37:34PM -0400, Shelley Waltz wrote: > Why is there a difference in the size of the directories for marshall(and > many others) which makes the distination larger than the source? The directory listings you provided show that there are hard-linked files on the source filesystem that are not hard-linked on the destination. Try running rsync with the -H (--hard-links) option. ..wayne..
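One way to confirm this diagnosis is to add up file sizes two ways: once per inode and once per path. The gap is what the destination gains when hard links are not preserved. This Python sketch uses apparent sizes (st_size) rather than disk blocks, and the helper name is hypothetical.

```python
import os

def apparent_sizes(top):
    """Return (size counting each inode once, size counting every path).

    The first is roughly what hard-linked data occupies on the source; the
    second is what the copy occupies if every link becomes an independent file.
    """
    seen = set()
    unique = total = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            total += st.st_size
            key = (st.st_dev, st.st_ino)  # identifies the underlying inode
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return unique, total
```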
Re: Fwd: rsync and unlink permission
On Mon, Oct 28, 2002 at 03:12:53PM +0800, Patrick Hsieh wrote: > Since "foo" has no write permission under /var/www, he cannot rsync > from remote server to the local filesystem because rsync will try to > make temp file and unlink the original file before writing over it. Is > there any solution to this problem? See the -T (--temp-dir) option for how to tell rsync to put its temp file in some other directory. If the temp dir is on the same file system as /var/www, rsync will still rename the new file over the top of the old one (which ensures that no one can request a partially-written file). If it is on a different file system, rsync will use its copy_file() routine to copy the temp file over the destination file. ..wayne..
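The rename-or-copy decision described above can be sketched in Python. The helper is hypothetical. It models only the choice between the two strategies, and does not reproduce rsync's copy_file() or its permission handling.

```python
import os
import shutil
import tempfile

def install_from_temp_dir(data, dest_path, temp_dir):
    """Write `data` to a temp file in temp_dir, then put it at dest_path."""
    fd, tmp_path = tempfile.mkstemp(dir=temp_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    if os.stat(temp_dir).st_dev == os.stat(dest_dir).st_dev:
        # Same file system: an atomic rename, so readers never see a partial file.
        os.replace(tmp_path, dest_path)
        return "renamed"
    # Different file system: a rename would fail (EXDEV), so copy over the destination.
    shutil.copyfile(tmp_path, dest_path)
    os.unlink(tmp_path)
    return "copied"
```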
Re: many files filelist problem
On Tue, Nov 12, 2002 at 03:27:03PM +0200, Mozzi wrote: > [root@ais-mail01 root]# time rsync -pogrve ssh /var/spool/mail > [EMAIL PROTECTED]:/var/spool/mail/ FYI, this command puts the "mail" dir inside /var/spool/mail on the destination. You should add a trailing slash to the source path to avoid this (or remove the "mail/" from the destination path). ..wayne..
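The trailing-slash rule fits in a few lines of Python. With a trailing slash, the contents of the source directory land directly in the destination. Without one, a directory named after the source's last component is created there first. landing_path is a hypothetical helper that models only this naming rule.

```python
import posixpath

def landing_path(src, dest, relpath):
    """Where the file `relpath` (relative to src) ends up on the receiver."""
    dest = dest.rstrip("/") or "/"
    if src.endswith("/"):
        # "copy the contents of src"
        return posixpath.join(dest, relpath)
    # "copy src itself": its last component becomes a subdirectory of dest
    return posixpath.join(dest, posixpath.basename(src), relpath)
```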
Re: Speed problem
On Tue, Nov 12, 2002 at 04:32:31PM +0100, [EMAIL PROTECTED] wrote: > I'd call it a bug. No, it's not a bug. It's the heart of the rsync algorithm at work. Rsync trades CPU and local file I/O for network I/O in order to reduce the amount of data that is transferred over the network. Your diagnosis has just shown that when the network I/O dips, rsync has traded it for local I/O (grabbing matching blocks from the current file instead of asking for them to be sent over the network). For really large files that have mostly matching data, most of the file I/O in building a new file will not be network I/O, so it is to be expected that the data rate over the net will drop when that occurs. Note also that the --partial flag is only incidentally related to what you were seeing, since it ensured that the destination file had lots of matching data whenever you interrupted the transfer. The only alternative is to use the --whole-file option -- this option turns off the rsync algorithm and just sends all the changed files over the net completely (like an scp copy, but for changed files). This should only be used if you have a really fast network connection OR if you don't want to trade the CPU and local I/O for network I/O. ..wayne..
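Part of the CPU cost in the trade described above comes from the weak checksum. This checksum can be "rolled" forward one byte at a time, which lets one side look for matching blocks at every byte offset. Below is a Python sketch of that rolling checksum as described in Tridgell and Mackerras's rsync technical report (both halves taken modulo 2^16). The function names are hypothetical, and real rsync confirms each weak match with a strong checksum.

```python
# Modulus used for both halves of the weak checksum.
M = 1 << 16

def weak_checksum(block):
    """Compute the (a, b) halves of the weak checksum of one block of bytes."""
    a = sum(block) % M
    # The first byte has weight len(block), the last byte has weight 1.
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

def combined(a, b):
    """Pack both halves into a single 32-bit value for table lookups."""
    return a + (b << 16)
```

Because roll() costs O(1) per byte, checking every offset of a large file is feasible. On big files it still adds up to a lot of CPU, spent in order to save network traffic.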
Re: Speed problem
On Tue, Nov 12, 2002 at 11:30:28PM +0100, [EMAIL PROTECTED] wrote: > And why it tries to get 100% CPU even though there's nothing to do ? What do you mean "nothing to do"? Rsync is creating the new version of a changed file, which is done both by transferring data over the network and by copying matching data from the existing version of the file. Just because nothing is being transferred over the link doesn't mean nothing is going on. Or is there some other problem that I missed in this discussion? > Ok, that I never tried because I thought the --partial option should > have been the fastest method because lots of data is still on the other > side if an error has occured before. The --partial option ensures that if we transferred a lot of data to build a file but didn't finish it, this data is not just thrown away. However, if we started with an already-existing version of a file that was mostly the same as the new version, it is possible that when rsync is interrupted the current partial file actually contains less matching data than the already-existing version, and thus retaining this partial file actually makes the next transfer less efficient. Because of this I only use the --partial option if I'm sending really big *new* files, not updating really big existing files. ..wayne..
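The --partial trade-off can be illustrated with a simplified Python model that counts how many bytes of the new file could be copied from a given basis instead of being sent. The matcher simplifies rsync's scan in two ways. It compares aligned basis blocks directly instead of using rolling and strong checksums, and it ignores the short final block. The name matched_bytes is hypothetical.

```python
def matched_bytes(basis, new, block_len):
    """Count bytes of `new` that could be copied from `basis` instead of sent."""
    # The basis is cut into fixed, aligned blocks.
    blocks = {basis[i:i + block_len]
              for i in range(0, len(basis) - block_len + 1, block_len)}
    matched = pos = 0
    # The new file is scanned at every offset; a hit skips a whole block.
    while pos + block_len <= len(new):
        if new[pos:pos + block_len] in blocks:
            matched += block_len
            pos += block_len
        else:
            pos += 1  # no match here: this byte would be sent literally
    return matched
```

In the test below, a nearly identical old version lets all but one 512-byte block be matched. A partial file holding only the first quarter of the new data can supply only that quarter.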
Re: The exclude option of Rsync not work right.
On Tue, Nov 19, 2002 at 11:32:06AM -0600, Lori Anderson wrote: > rsync -av /software/testdir/ --exclude='/software/testdir/test.sql' > landser@serv602:/software/testdir/ Inclusions and exclusions are relative to the base of the transfer. Use a leading '/' if you want to indicate that the inclusion/exclusion is anchored to this base. Like this: rsync -av /software/testdir/ --exclude=/test.sql landser@serv602:/software/testdir/ That will exclude /software/testdir/test.sql. ..wayne..
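A simplified Python model of the anchoring rule follows. Here relname is the path relative to the transfer base, so for the command above /software/testdir/test.sql is seen as "test.sql". This is an approximation: fnmatchcase's '*' also matches '/', unlike rsync's matcher, and patterns with an inner '/' are deliberately not modeled. The helper is hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase

def excluded(pattern, relname):
    """Approximate whether an exclude pattern matches a transfer-relative name."""
    if pattern.startswith("/"):
        # Anchored: matched against the whole name, starting at the transfer base.
        return fnmatchcase(relname, pattern[1:])
    if "/" not in pattern:
        # Unanchored, no slash: matched against the last path component anywhere.
        return fnmatchcase(posixpath.basename(relname), pattern)
    raise NotImplementedError("patterns with an inner '/' are not modeled here")
```

This shows why the original --exclude never fired: no name inside the transfer starts with "software/testdir/".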
Re: rsync and the file's mtime
On Mon, Nov 25, 2002 at 09:30:03AM -0500, Jeff Bearer wrote: > But if the file isn't modified, the modified time shouldn't be updated, By default, rsync uses the time & size on the file to determine if it was updated. Since the source and destination files' times don't match, rsync transfers the file, and that updates the mtime. There is no special check to see if the newly created temp-file is identical to the existing file -- the file is just updated. If you use the -c (--checksum) option, rsync will switch to testing the checksums of the files to determine if the file needs to be transferred. This will cause the file not even to be sent unless it's changed, and thus preserve the destination file's current mtime when it is up-to-date. ..wayne..
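The two decision rules described above can be sketched in Python. The default quick check compares size and modification time, while -c compares contents. hashlib.md5 is a stand-in for rsync's own whole-file checksum, and needs_transfer/file_digest are hypothetical helpers.

```python
import hashlib
import os

def file_digest(path):
    """Whole-file checksum (md5 here, as a stand-in)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.digest()

def needs_transfer(src, dst, checksum=False):
    """Decide whether src must be sent to update dst (a simplified model)."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True
    if checksum:
        # -c: compare contents; an mtime difference alone does not trigger a transfer.
        return file_digest(src) != file_digest(dst)
    # Default quick check: same size but a different mtime counts as "changed".
    return int(s.st_mtime) != int(d.st_mtime)
```

The test also shows the flip side of the quick check: a same-size edit that keeps the old mtime is invisible without -c.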
Re: cp(1) -n option for rsync?
On Fri, Dec 06, 2002 at 11:53:08AM -0800, Sander van Zoest wrote: > I would like to be able to use rsync to mirror some directories, but > to explicitly *not* override any files that already exist on the other > side. I believe you're looking for the --ignore-existing option. I'm not sure when it got added, but it's in 2.5.5 at least (and not in 2.4.6). ..wayne..
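The requested cp -n style behavior can be sketched in Python: copy a file only if nothing exists yet at the destination path. This models what --ignore-existing means for regular files. It is not rsync's implementation, and the helper is hypothetical.

```python
import os
import shutil

def copy_ignore_existing(src_dir, dst_dir):
    """Copy files from src_dir into dst_dir, never overwriting existing files."""
    copied = []
    for dirpath, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(dirpath, src_dir)
        target_dir = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            target = os.path.join(target_dir, name)
            if os.path.lexists(target):
                continue  # leave the receiver's existing copy alone
            shutil.copy2(os.path.join(dirpath, name), target)
            copied.append(os.path.relpath(target, dst_dir))
    return sorted(copied)
```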
Re: rsync 2.4.6 hanging on HPUX11 only over firewall.
On Mon, Dec 09, 2002 at 01:49:40PM +, rsyncuser wrote: > We are interested in finding out whether the wayne-nohang patches can > be applied to 2.4.6. My older patches for 2.4.6 had been moved aside after they were incorporated into the main distribution. However, I just put them back in their original spot so they can be accessed again. The most important patch was the simplest: http://www.clari.net/~wayne/rsync-nohang1.patch This patch ensures that data coming from the generator to the sender does not overflow and block during the final phase of the transfer on the sending side (but not necessarily at the final file, due to the buffering on the outgoing connection). The unpatched code waited around for the remote process to end without reading the incoming data stream, which was a very bad idea if the -v option was turned on. The second patch fixed a much rarer bug -- one that should only get tickled if a good number of the files fail to transfer correctly on the first try and need to be resent: http://www.clari.net/~wayne/rsync-nohang2.patch An older version of this patch was included in the Red Hat sources for a while, so it was pretty widely tested: http://www.clari.net/~wayne/old/rsync-nohang.patch (Note that this patch contains the "nohang1" patch as well.) The reasoning behind this patch is that there is a data channel from the receiver to the generator that tells it which files to retry. This data channel is left totally unread until all files are handled in pass 1. This means that it can block if enough files need to be resent. My patch keeps this data channel clear by reading it whenever data appears and setting flags on which files to resend during the retry phase. I'm thinking about writing a new patch for the latest rsync that causes these need-to-retry files to be immediately resent by the generator to the sender instead of buffering them (with proper signaling to ensure that retry files get their alternate block-sizes set). 
Perhaps this solution would finally allow this bug to be put to rest (since it's not yet fixed in the main code). ..wayne..
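Both hangs come from a channel that one side stops reading. Once the kernel's pipe buffer fills, the writer blocks. The Python sketch below shows the finite buffer, and also the general fix used by the second patch: read the channel whenever data appears and just record which files to redo. The 4-byte little-endian index framing is an assumption for illustration, not rsync's wire format, and both helpers are hypothetical.

```python
import os
import select
import struct

def pipe_capacity():
    """Write into a pipe nobody reads until the kernel refuses more data."""
    r, w = os.pipe()
    os.set_blocking(w, False)
    total = 0
    try:
        while True:
            total += os.write(w, b"x" * 512)
    except BlockingIOError:
        pass  # a blocking writer would now hang until someone reads
    finally:
        os.close(r)
        os.close(w)
    return total

def drain_redo_requests(fd, redo, pending):
    """Read whatever retry requests are waiting, without ever blocking."""
    while select.select([fd], [], [], 0)[0]:
        chunk = os.read(fd, 4096)
        if not chunk:
            break  # the other end closed the channel
        pending.extend(chunk)
        while len(pending) >= 4:
            (index,) = struct.unpack("<i", pending[:4])
            del pending[:4]
            redo.add(index)  # flag the file for the retry phase
```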
Re: rsync stoped syncing
On Mon, Dec 09, 2002 at 04:36:41PM +0100, Markus Lamers wrote: > rsync -auvxz --delete --exclude-from /root/.rsync/home-daily.exc /home > slave:/ I suspect the home-daily.exc file is at fault. What does it contain? ..wayne..
Re: include-exclude patterns
On Tue, Dec 10, 2002 at 09:18:06AM -0500, marco wrote: > I even tried this but it include the whole /var/ folder ! > I just want /var/lib/zope. The solution is that after you include something that is too general, you need to exclude what you don't want. Like this:

    /etc
    /var
    - /*
    /var/lib
    - /var/*
    /var/lib/zope
    - /var/lib/*

Explanation: The inclusion of /var is needed just to get rsync to descend into that directory. At that point, you need to add rules for what to do inside of /var, which is to just descend into lib and exclude everything else. The final two rules tell rsync what to do once inside of /var/lib (include zope, exclude everything else). At that point everything in the zope hierarchy will be included. ..wayne..
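The rule list above can be checked with a small Python model of two behaviors. First, the first matching rule wins. Second, an excluded directory is never descended into, so nothing beneath it can be included. The sketch makes some simplifying assumptions: only '*' and '?' are supported, '*' does not cross '/', and unmatched paths are included. The helper names are hypothetical.

```python
import re

# The include/exclude list from the post: '+' = include, '-' = exclude.
RULES = [
    ("+", "/etc"),
    ("+", "/var"),
    ("-", "/*"),
    ("+", "/var/lib"),
    ("-", "/var/*"),
    ("+", "/var/lib/zope"),
    ("-", "/var/lib/*"),
]

def glob_to_regex(pattern):
    """Translate a simple glob in which '*' and '?' never match '/'."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z")

def first_match(path, rules):
    """Return the action of the first rule matching path (include if none)."""
    for action, pattern in rules:
        if glob_to_regex(pattern).match(path):
            return action
    return "+"

def transferred(path, rules=RULES):
    """A path is sent only if it and every parent directory are included."""
    parts = path.strip("/").split("/")
    for depth in range(1, len(parts) + 1):
        if first_match("/" + "/".join(parts[:depth]), rules) == "-":
            return False
    return True
```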
Re: filelist calculation algoritm
On Sat, Jan 04, 2003 at 12:40:05PM -0800, jw schultz wrote: > One specifying subpaths and the other for those having a shared > prefix. I don't see why this is needed. For instance, your example of a shared prefix: > find srcdir | myfilter | rsync --file-list - srcdir destloc would be easily written without any sharing: find srcdir | myfilter | rsync --file-list - . destloc or: find /foo/bar | myfilter | rsync --file-list - / destloc Am I missing something? > doing > rsync --file-list-relative - src dest < file1 > file2 > dir1/file3 > EOL > would actually sync > src/file1 > src/file2 > src/dir1 > src/dir1/file3 > to > dest/file1 > dest/file2 > dest/dir1 > dest/dir1/file3 I think that should only happen if the --relative option is set. Otherwise all 3 files should go directly into "dest". ..wayne..
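For the example above, the distinction works like this. Without --relative, only the last component of each listed name is kept. With it, the whole listed path is recreated under the destination. Below is a Python sketch with a hypothetical helper. It models only the name mapping, not the --file-list option being proposed in this thread.

```python
import posixpath

def destination_for(name, dest, relative):
    """Where a listed source name lands under dest."""
    if relative:
        # --relative: keep the listed path below the destination
        return posixpath.join(dest, name.lstrip("/"))
    # default: only the final component is used
    return posixpath.join(dest, posixpath.basename(name))
```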
Re: filelist calculation algoritm
On Sat, Jan 04, 2003 at 05:03:02PM -0800, jw schultz wrote: > that would produce destloc/srcdir/ > when you might want a copy of srcdir at destloc instead of > in destloc. Ah yes, I _was_ missing something. However, I still don't think we need to clutter rsync with two types of --file-list options. This is already something that people have to deal with when using the --relative option: how to generate a file list that contains just the path information that we need to be significant. I think that the removal of the undesired prefixes should happen before the list gets to rsync rather than having rsync do it (in your example the user would just chdir into "srcdir" and do the "find" relative to '.'). Here's an alternative to the syntax you suggested. I was thinking that it would be nice to just read filenames from stdin and have them be treated the same way as command-line args. One way to indicate this would be to specify '-' as a name to transfer, which would tell rsync to read filenames from stdin. Like this: rsync -av --relative - destloc
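Stripping the unwanted prefix before the list reaches rsync, as suggested above, amounts to emitting names relative to srcdir. This is what `cd srcdir; find .` provides, apart from the leading "./". Below is a Python sketch with a hypothetical helper name.

```python
import os

def relative_file_list(srcdir):
    """List paths under srcdir relative to it, like `cd srcdir && find .`."""
    names = []
    for dirpath, dirnames, filenames in os.walk(srcdir):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(dirnames + filenames):
            names.append(os.path.relpath(os.path.join(dirpath, name), srcdir))
    return names
```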
Re: filelist calculation algoritm
On Sun, Jan 05, 2003 at 11:55:22AM -0800, jw schultz wrote: > The first problem is this would flatten things unless you used > relative and forced the user's CWD. That would cause considerable > confusion. Really? This is exactly how rsync works now with multiple file names on the command-line, so I don't see this as being any more confusing than what we already have. The rule would be you can specify the files on the command-line or on stdin (if you use '-' as the only source file). Since all names are treated in the same way regardless of where they were specified, everything works the same as it did before, only more names are now supported per invocation. I'm thinking that this way is more flexible since it allows someone to flatten things if that's what they really want to do. > Secondly, how would you do it when the source location is remote? > Many of the users asking for this are doing pulls. I mentioned a protocol change that would send the extra file names to the other side after rsync starts up. Currently the send_files() routine always sends names from the sending side to the receiving side. The new protocol would change that to always send names from the user side to the server side when this option was specified. The user's command would look like this: rsync -avR remote:- /foo/bar The file list would be read from the local (user) side, of course. The remote command being run by rsync would look like this: ssh remote rsync --server --sender -vlogDtprR . - The presence of the '-' as the source would tell us to slurp names instead of sending them. Since the file list is exchanged in total before we do any real work, I think this change would actually be really easy to implement. ..wayne..
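The proposed rule is that '-' as the only source means "read names from stdin and treat them like command-line args". It can be sketched in a few lines of Python. The post does not specify a separator, so reading one name per line is an assumption, and the helper is hypothetical.

```python
def expand_sources(args, stdin):
    """If '-' is the only source, read one name per line from stdin instead."""
    if args == ["-"]:
        # Assumption: newline-separated names; blank lines are skipped.
        return [line.rstrip("\n") for line in stdin if line.strip()]
    return list(args)  # ordinary command-line sources, unchanged
```

Since the expanded list is then handled exactly like command-line names, flattening versus --relative behaves the same whichever way the names arrived.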
Re: [PATCH] Add .svn to the exclude list for --cvs-exclude
On Wed, Jan 08, 2003 at 04:42:58PM -0800, jw schultz wrote: > - "RCS","SCCS","CVS","CVS.adm","RCSLOG","cvslog.*", > + "RCS/", "SCCS/", "CVS/", "CVS.adm", "RCSLOG", "cvslog.*", > Might be worth doing to tighten the patterns. Yes, I'd agree with that. I looked at the code to confirm that the trailing slashes would be interpreted correctly, and then tested a modified version to ensure proper functioning. This is a simple enough change that I went ahead and checked it into CVS. In my version I added the ".svn/" pattern near the other dirs instead of at the end of the list. Thanks, Jon, for the patch. ..wayne..
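A trailing slash makes a pattern match only directories, which is why the tightened patterns no longer exclude a regular file that happens to be named CVS. The Python sketch below shows that rule. The list holds only the entries quoted above plus .svn/, not rsync's full --cvs-exclude list, and fnmatchcase only approximates rsync's matcher. The helper is hypothetical.

```python
from fnmatch import fnmatchcase

# Only the entries quoted in the patch above, plus ".svn/"; not the full list.
CVS_EXCLUDES = ["RCS/", "SCCS/", "CVS/", ".svn/", "CVS.adm", "RCSLOG", "cvslog.*"]

def is_excluded(name, is_dir, patterns=CVS_EXCLUDES):
    """Apply the patterns to one file name; a trailing '/' means directories only."""
    for pattern in patterns:
        if pattern.endswith("/"):
            if is_dir and fnmatchcase(name, pattern[:-1]):
                return True
        elif fnmatchcase(name, pattern):
            return True
    return False
```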
Re: Stats
On Thu, Jan 09, 2003 at 07:48:50AM -0600, Max Kipness II wrote: > Total file size: 383219712 bytes > Total transferred file size: 383219712 bytes > Literal data: 3143680 bytes > Matched data: 380076032 bytes > > The total file size is definitely correct, but what I don't understand > is the transfered size. Is rsync reporting that roughly 380mb matches? > It would seem like it to me. But is so, why did it transfer the entire > file? You're thinking of the word transfer in the wrong sense here. Rsync's transferred file size is the total of all file sizes that needed to be updated. However, it doesn't mean that those bytes were sent literally over the wire. That stat is taken care of in the next two lines, which tell you how much data was actually communicated literally (3143680 bytes) and how much data was communicated via matched blocks (380076032 bytes). Also, in a set of files where some matched and some didn't, the total file size would have included the entire set of files (including those that were up-to-date). For a set consisting of a single file that needs to be updated, the total size will always match the transferred size. ..wayne..
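The accounting can be checked against the numbers quoted above: 3143680 literal + 380076032 matched = 383219712 bytes, which is exactly the transferred size. Below is a Python sketch of how the four figures relate. The per-file record layout is hypothetical, not rsync's internal structure.

```python
def transfer_stats(files):
    """Summarize per-file records of (size, up_to_date, literal, matched), in bytes.

    Returns (total file size, total transferred file size, literal data, matched data).
    """
    files = list(files)
    total = sum(size for size, _, _, _ in files)           # every file in the set
    updated = [f for f in files if not f[1]]               # only files needing an update
    transferred = sum(size for size, _, _, _ in updated)
    literal = sum(lit for _, _, lit, _ in updated)         # bytes actually sent
    matched = sum(m for _, _, _, m in updated)             # bytes copied from the basis
    return total, transferred, literal, matched
```

Adding an up-to-date file to the set raises only the total file size. This is why the two sizes differ in a larger transfer but coincide here.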