Re: rsync exclude/include

2001-11-13 Thread Wayne Davison

On Tue, 13 Nov 2001, Thomas Schweikle wrote:
> I am calling rsync using
> 
> rsync -avz --include-from="include" --exclude-from="exclude"
> ftp3.sourceforge.net::/netbsd/iso iso/

Looks like you didn't copy that command exactly, because rsync would
fail with a syntax error due to the '/' before the netbsd module name.
Also, you're creating an iso dir inside your local iso dir, which is
probably not what you want.  With the include/exclude file Dave gave
you, you'd need to run this command (changing "iso/" into "."):

rsync -avz --include-from=foo ftp3.sourceforge.net::netbsd/iso .

However, I'd suggest a slightly simpler command:  add a trailing slash to the
root directory you're requesting and you can leave off the references to
it (and put the data wherever you like, even if the directory isn't
named "iso").  You would run this command:

rsync -avz --include-from=foo ftp3.sourceforge.net::netbsd/iso/ myiso

And put this into "foo":

+ /1.5.*/
+ /1.5.*/i386*
- *

You'll note I also used a trailing slash for the directory include since
I don't want any files that match to be included (there are none here,
but it's a good general principle).
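To make the first-match-wins behavior concrete, here's a simplified C model of how a rule list like the one in "foo" gets applied.  It is a hypothetical sketch, not rsync's exclude.c.  It assumes that a leading '/' anchors a pattern to the path relative to the transfer root, that an unanchored pattern with no '/' is matched against the last path component only, that '*' never crosses a '/' (FNM_PATHNAME), and that a trailing slash limits a rule to directories.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of rsync's include/exclude rules;
 * this is not the real exclude.c code. */
struct rule {
    int include;          /* 1 for a "+" rule, 0 for a "-" rule */
    const char *pattern;  /* pattern with any trailing slash removed */
    int dir_only;         /* 1 if the pattern had a trailing slash */
};

/* Decide whether `name` (a path relative to the transfer root, such as
 * "1.5.2/i386cd.iso") is transferred.  Rules are tried in order and the
 * first matching rule wins. */
static int is_included(const struct rule *rules, size_t n,
                       const char *name, int is_dir)
{
    for (size_t i = 0; i < n; i++) {
        const char *pat = rules[i].pattern;
        const char *subject = name;

        if (rules[i].dir_only && !is_dir)
            continue;                     /* "dir/" rules never match files */
        if (pat[0] == '/') {
            pat++;                        /* anchored: match the whole path */
        } else if (strchr(pat, '/') == NULL) {
            const char *slash = strrchr(name, '/');
            if (slash != NULL)
                subject = slash + 1;      /* unanchored: last component only */
        }
        /* FNM_PATHNAME keeps '*' and '?' from matching a '/'. */
        if (fnmatch(pat, subject, FNM_PATHNAME) == 0)
            return rules[i].include;
    }
    return 1;                             /* no rule matched: transfer it */
}
```

Fed the three rules from "foo" as {1, "/1.5.*", 1}, {1, "/1.5.*/i386*", 0}, {0, "*", 0}, this keeps a directory such as 1.5.2 and a file such as 1.5.2/i386cd.iso (hypothetical names), and rejects 1.5.2/sparccd.iso and 1.4.3.  It also rejects a plain top-level file named 1.5.notes, which is exactly what the trailing slash on the first rule buys you.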

..wayne..





Re: rsync copy speed.

2001-10-10 Thread Wayne Davison

On Wed, 10 Oct 2001, Andre Pang wrote:
> ssh is your problem

I believe Hans said that he only uses ssh to start up the samba-using
process, and then transfers all files "locally" with rsync.  So,
the problem is that samba is doing all the data transfer over the
network instead of rsync.

So Hans, if you're updating files on the destination drive (as opposed
to copying them whole onto an empty drive), a much better solution
would be to start up an rsync server (in read-only mode) on the Win98
machine for the duration of the backup.  This allows rsync to optimize
the data transfer.  If you're copying files into an empty destination
drive, you might try using a recursive ftp grab or maybe using
something like this:

 cd /path
 tar cf - . | gzip | ssh backup '(cd /backup/path; gunzip | tar xpf -)'

I know there are tar and gzip utilities available for Win98.

..wayne..





Moving files revisited

2002-01-23 Thread Wayne Davison

I'd like to revisit the topic of moving files from system to system
using rsync.  I've just updated my patch from its 2.5.0 version to
2.5.1, and I'm curious what people think about getting it integrated
into rsync.

The patch comes in two parts.  The first eliminates a potential hang
condition that can happen if the data channel from the receiver to the
generator gets clogged up.  Since my move-files patch is using this
channel to communicate when a file gets successfully written to disk
(from the receiver to the sender via the generator), it needs to ensure
that this hang cannot happen.  The fix is rather complicated (because
the generator is doing a lot of reading and writing of other data), but
I've been using this patch in production conditions for quite a few
months now and haven't encountered any problems yet.  Here's the nohang
patch:

http://www.clari.net/~wayne/rsync-nohang.patch

The second part of the equation actually adds the --move-files option,
the communication of the receiver back to the sender of which file was
successfully finished, and the actual unlinking of the source file:

http://www.clari.net/~wayne/rsync-move-files.patch

Comments?

..wayne..





Re: Rsync 2.5.2 -v too verbose?

2002-01-30 Thread Wayne Davison

On Wed, 30 Jan 2002, Dave Dykstra wrote:
> Martin has put in the below feature in rsync 2.5.2 for using a shell.  I've
> already had one user complain about it.  I think it would be better at the
> -vv level.

Yes, I agree that -vv would be better.  People use -v primarily to see
what files are getting transferred, and seeing what behind-the-scenes
ssh connection is happening is better reserved for a more verbose output
level.

..wayne..





Re: Moving files revisited

2002-01-30 Thread Wayne Davison

On Wed, 23 Jan 2002, Wayne Davison wrote:
> I'd like to revisit the topic of moving files from system to system
> using rsync.

I'm sad that nobody wanted to talk about --move-files yet, but maybe
this will help things along.  I've adapted the patch files to be based
on the latest CVS source:

http://www.clari.net/~wayne/rsync-nohang.patch
http://www.clari.net/~wayne/rsync-move-files.patch

The version for 2.5.1 was renamed:

http://www.clari.net/~wayne/rsync-2.5.1-nohang.patch
http://www.clari.net/~wayne/rsync-2.5.1-move-files.patch

If anyone has any questions, let me know.

..wayne..





Tweak for add_exclude() -vvv output

2002-01-30 Thread Wayne Davison

Here's an improved version of an old patch that I submitted.  It
improves the -vvv output when using --exclude and --include options:

Index: rsync/exclude.c
--- rsync/exclude.c 23 Jan 2002 04:57:18 -  1.39
+++ rsync/exclude.c 30 Jan 2002 18:35:46 -
@@ -201,9 +201,11 @@
if (!*list || !((*list)[len] = make_exclude(pattern, include)))
out_of_memory("add_exclude");

-   if (verbose > 2)
-   rprintf(FINFO,"add_exclude(%s)\n",pattern);
-
+   if (verbose > 2) {
+   rprintf(FINFO,"add_exclude(%s,%s)\n",pattern,
+ include ? "include" : "exclude");
+   }
+
(*list)[len+1] = NULL;
 }


The old output is confusing because an include and an exclude generated
the same text.  This change causes includes to be output with ",include"
and excludes to be output with ",exclude".

..wayne..





configure --with-rsh=CMD and default blocking-IO support

2002-01-30 Thread Wayne Davison

A while back I argued for adding a --with-rsh=CMD option to configure
and got some general agreement that it would be a good thing (especially
for systems that don't have rsh at all).  However, the changes were
never integrated into rsync.

This patch adds the --with-rsh=CMD option to configure and modifies
main.c to improve the blocking-IO setting code.  The old code would set
blocking_io to 1 only if the command string exactly matched RSYNC_RSH
(either "rsh" or "remsh", whichever one was configured into rsync).  The
new code has a slightly modified version of this check (one that still
works even if RSYNC_RSH isn't defined to be "rsh"), but it also adds a
way to force the blocking-IO setting, both at configure time and via the
RSYNC_RSH environment variable.  The idiom I chose was to prefix the
value with '@' to indicate that blocking IO should be used, and with
"@@" to indicate that it should not be used.  This allows the installer
to specify --with-rsh=@@ssh to explicitly request non-blocking IO for
ssh (for the paranoid).  It lets the user specify
RSYNC_RSH=@/local/bin/rsh to get blocking IO when using a path to rsh
(where the old code would force the user to also specify the
--blocking-io option).  It also makes it possible to specify
--with-rsh=@@rsh to get a non-blocking-IO rsh by default (which is
impossible with the old code without specifying a path).
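Here's the same '@'/"@@" rule pulled out into a stand-alone C function, mainly to make the three cases easy to see.  The name parse_rsh_prefix() and its -1 "not forced" result are my own illustration; the patch below does this inline in main.c, and it only checks for "remsh" when HAVE_REMSH is set, whereas this sketch checks both names unconditionally for simplicity.

```c
#include <string.h>

/* Hypothetical helper illustrating the '@' / "@@" prefix idiom.
 * Returns the command with the prefix removed and sets *blocking_io to
 * 1 (force blocking IO), 0 (force non-blocking IO), or -1 (not forced). */
static const char *parse_rsh_prefix(const char *cmd, int *blocking_io)
{
    *blocking_io = -1;
    if (cmd[0] == '@') {
        if (cmd[1] == '@') {
            *blocking_io = 0;     /* "@@ssh": non-blocking IO */
            return cmd + 2;
        }
        *blocking_io = 1;         /* "@/local/bin/rsh": blocking IO */
        return cmd + 1;
    }
    /* No prefix: a bare "rsh" or "remsh" keeps the old blocking default. */
    if (strcmp(cmd, "rsh") == 0 || strcmp(cmd, "remsh") == 0)
        *blocking_io = 1;
    return cmd;
}
```

So "@@ssh" yields "ssh" with blocking IO off, "@/local/bin/rsh" yields "/local/bin/rsh" with blocking IO on, and a plain "ssh" is left unforced.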

I've appended the patch to the end.  Don't forget to run autoconf after
applying it.

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: rsync/config.h.in
--- rsync/config.h.in   15 Jan 2002 09:53:29 -  1.68
+++ rsync/config.h.in   30 Jan 2002 18:45:18 -
@@ -303,6 +303,9 @@
 #undef RETSIGTYPE

 /* */
+#undef RSYNC_RSH
+
+/* */
 #undef RSYNC_PATH

 /* rsync release version */
Index: rsync/configure.in
--- rsync/configure.in  25 Jan 2002 23:19:21 -  1.130
+++ rsync/configure.in  30 Jan 2002 18:45:19 -
@@ -78,6 +78,10 @@
 AC_ARG_WITH(included-popt,
 [  --with-included-poptuse bundled popt library, not from system])

+AC_ARG_WITH(rsh,
+   [  --with-rsh=CMD  set rsh command to CMD (default: \"remsh\" or \"rsh\")],
+   [ AC_DEFINE_UNQUOTED(RSYNC_RSH, "$with_rsh", [ ]) ])
+
 AC_ARG_WITH(rsync-path,
[  --with-rsync-path=PATH  set default --rsync-path to PATH (default: \"rsync\")],
[ RSYNC_PATH="$with_rsync_path" ],
Index: rsync/main.c
--- rsync/main.c   25 Jan 2002 10:07:41 -  1.138
+++ rsync/main.c   30 Jan 2002 18:45:22 -
@@ -209,8 +209,19 @@

server_options(args,&argc);

-
-   if (strcmp(cmd, RSYNC_RSH) == 0) blocking_io = 1;
+   if (*cmd == '@') {
+   if (*++cmd == '@') {
+   cmd++;
+   blocking_io = 0;
+   } else
+   blocking_io = 1;
+   args[0] = cmd;
+   } else if (strcmp(cmd, "rsh") == 0
+#if HAVE_REMSH
+   || strcmp(cmd, "remsh") == 0
+#endif
+   )
+   blocking_io = 1;
}

args[argc++] = ".";
Index: rsync/rsync.h
--- rsync/rsync.h   25 Jan 2002 23:00:21 -  1.121
+++ rsync/rsync.h   30 Jan 2002 18:45:29 -
@@ -85,10 +85,12 @@

 #include "config.h"

+#ifndef RSYNC_RSH
 #if HAVE_REMSH
 #define RSYNC_RSH "remsh"
 #else
 #define RSYNC_RSH "rsh"
+#endif
 #endif

 #include 
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---





Re: Moving files revisited

2002-01-31 Thread Wayne Davison

On Thu, 31 Jan 2002, Dave Dykstra wrote:
> It's up to Martin to decide, but I'm sorry to tell you that I'm opposed to
> a --move-files option.  I think that if somebody wants to do that they
> should do it with an external program after rsync returns a clean exit
> code.  It seems to me that it goes against the purpose of rsync because
> after the files are removed from the sending side there's nothing left to
> sync later.

I use rsync instead of scp to copy all my files from system to system,
even when I'm not going to synchronize anything.  The reason is that it
does so many things the right way that scp doesn't support (e.g. scp
opens a new ssh connection for every file, it has no option to write
data to a temp file outside of the destination dir and move it in place
when complete, it has no include/exclude options, a non-recursive copy
doesn't handle directories as nicely as rsync, etc.).

So, I understand where you're coming from, but I look at rsync as a
general file-copying tool (that is also very efficient at updating
files) rather than just as a tool that keeps files in sync.  I know that
the line has to be drawn somewhere, though, when deciding how much is
too much.

> I see that Tridge liked the idea in general but had some problems with your
> implementation:
>
> http://lists.samba.org/pipermail/rsync/2001-May/004282.html
>
> Have you addressed his concern?

That was my earliest patch from back before I understood the data flow
between all the rsync modules.  It was my work on the move-files option
that prompted me to do all the no-hang work, including the patch that is
required by the move-files option.  You'll find later discussion on the
list where Tridge (I believe) also objected to a buffer that could grow
dynamically.  I then changed my implementation to use a fixed-size
buffer, which is in the current patch.

Here's an overview of what the no-hang patch does, with some move-file
comments as well.

When the receiver process is created, it forks off a generator process
on the same machine with two pipes between them (both flowing from the
receiver to the generator).  The first is an error channel (that is also
used for verbose output) and the second is a redo channel that sends the
numbers of the files that need to be reprocessed.  In the generator, the
first channel is constantly checked for content, even when we're reading
the redo channel or writing out data to the sender (this is necessary to
keep the receiver from blocking trying to send the generator data while
the generator is trying to do something else).  However, the "redo" pipe
is not currently kept clear.  It is assumed that the number of redo
items will fit within a pipe's data buffer.  This assumption is usually
right, but for really large numbers of files it might fill up and cause
rsync to hang.  (Also, my move-files patch uses this channel, so it is
imperative that the redo channel be kept clear for --move-files to work).

My no-hang patch adds an array of flag ints, one for each item in the
list of files that are being sent.  The read process in io.c is then
extended to allow the redo pipe to be monitored, flagging all redo items
that show up into the flag array.  This keeps the channel clear, and
provides a way to regenerate the list of redo items for the generator.
(The move-files patch extends this to flag which items are complete and
can be deleted.)
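A rough C sketch of that bookkeeping, under stated assumptions: redo entries are written to the pipe as native 32-bit ints, the read end has been put into O_NONBLOCK mode, and one flag byte per file is enough.  The names drain_redo_pipe(), make_nonblocking(), and FLAG_REDO are hypothetical; the actual patch does this inside the io.c read loop rather than in a separate function.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define FLAG_REDO 0x01   /* hypothetical flag bit: file needs a redo pass */

/* Put a pipe's read end into non-blocking mode. */
static int make_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl == -1)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Pull every redo entry currently sitting in the pipe and record it in
 * flags[], one byte per file, so the receiver can never block on a full
 * redo pipe.  Returns how many -1 EOF markers were seen, or -2 on a read
 * error or a torn (partial) record. */
static int drain_redo_pipe(int fd, unsigned char *flags, int32_t nfiles)
{
    int eofs = 0;
    int32_t num;

    for (;;) {
        ssize_t n = read(fd, &num, sizeof num);
        if (n == (ssize_t)sizeof num) {
            if (num == -1)
                eofs++;                   /* end-of-phase marker */
            else if (num >= 0 && num < nfiles)
                flags[num] |= FLAG_REDO;  /* remember it for the generator */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                     /* interrupted: just retry */
        if (n == 0 || (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)))
            return eofs;                  /* pipe is empty (or closed) */
        return -2;                        /* error or partial 4-byte record */
    }
}
```

Because each 4-byte write is smaller than PIPE_BUF, it arrives atomically, so a well-behaved writer should never produce the torn-record case.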

The only complicating factor is what happens when we actually read the
redo channel's fd in a blocking manner at the end of the run (when we're
waiting for the -1 EOF flag).  While doing this, we need to be reading
the error channel and also continuing to flush the write channel to the
sender.  If we break away from the redo channel work to read or write
something else, that read function might actually read data from the
redo channel as a side-effect of its primary work (and we can't disable
this, since we need to keep the redo channel from blocking while doing
other read/write work).  My solution makes the read process aware of
when it is reading the redo channel, and has it return a -3 when some
side-effect work has already put data into the flag array (instead of
trying to read even more data that may not be there).  Since the
function calling the redo-read knows to look in the array, this results
in all the data being processed properly and in the correct order (note
that the function also keeps track of how many EOF -1 items it has seen,
which is vital to it working properly).

Once the redo channel has been made non-blocking, it is a very simple
matter to add move-files support.  The receiver sends the numbers of all
the files that have been successfully written over to the generator
process, which forwards them back to the sender via the normal (combo)
data channel, and the sender reads these safe-to-delete messages and
unlinks the corresponding file for each one it gets.
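On the sending side, handling each safe-to-delete message can be about as small as the following C sketch.  It assumes the sender still has the full source path of every file-list entry in a names[] array indexed by file number; the function name and that array are hypothetical, not part of the posted patch.

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical sender-side handler for one "safe to delete" message.
 * names[] holds the source path of each file-list entry, indexed by the
 * file number the receiver reported.  Entries are set to NULL once
 * removed so a duplicate message cannot delete anything twice. */
static int handle_safe_to_delete(int32_t ndx, const char **names, int32_t nfiles)
{
    if (ndx < 0 || ndx >= nfiles || names[ndx] == NULL)
        return -1;                  /* out of range or already handled */
    if (unlink(names[ndx]) != 0) {
        perror(names[ndx]);         /* report it but keep the transfer going */
        return -1;
    }
    names[ndx] = NULL;
    return 0;
}
```

Deleting one file per message keeps the move incremental: if the run dies halfway, everything already removed on the sender is known to be safely on the receiver.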

..wayne..





Re: Moving files revisited

2002-01-31 Thread Wayne Davison

On Thu, 31 Jan 2002, Dave Dykstra wrote:
> Ouch, is that another byte for every file?  Are there no bits free in
> the "flags" field already in file_struct?

Yes, it is an extra byte per file.  An earlier patch of mine did use
bits in the existing flag word in the current per-file structure,
but since that structure is created before the receiver forks off the
generator, I was thinking that any bit-twiddling of the existing flags
would cause a lot of shared memory between the two processes to cease
being shared (on systems that support copy-on-write forks, such as
Linux).  Thus, I think it would be more memory intensive to use the
existing data structure's flags (but I haven't verified this with actual
memory-size testing).

One solution to this would be to use actual shared memory for the file
structure shared by the receiver & generator.

As for the move-files option, I was thinking that I could write a perl
script that would parse the output of rsync -v and delete files that
were successfully transferred by rsync when they show up in the verbose
output.  If I can make that work, the need for my nohang patch isn't as
great, and I could probably come up with a simpler way to keep the redo
channel from filling up (perhaps using a buffer in the receiver process
or looking into how to do some portable shared memory).  Hmm, something
to consider.

..wayne..





Re: rsync-2.5.2 possible buglets

2002-02-01 Thread Wayne Davison

On Fri, 1 Feb 2002, Steve G wrote:
> I don't know if this amounts to much, but did you intend to use a &
> rather than a && at line 739 of flist.c?

Fortunately both items in the "&" expression can only have the value of
1 or 0, so the effect is the same as "&&".  It looks like a typo to me,
though.
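A tiny C illustration of why the typo is harmless here but still worth fixing: for operands that are always 0 or 1 the two operators agree, but "&" always evaluates both sides (no short-circuit) and gives a different answer once an operand can be some other non-zero value.  The helper names are just for illustration.

```c
/* Illustrative helpers only; these are not from flist.c. */
static int bitwise_and(int a, int b) { return a & b; }
static int logical_and(int a, int b) { return a && b; }

/* Records how many times it was called, to expose short-circuiting. */
static int call_count;
static int counted(int v)
{
    call_count++;
    return v;
}
```

With both operands limited to 0 or 1 the results always match, but bitwise_and(2, 1) is 0 while logical_and(2, 1) is 1, and counted(0) && counted(1) calls counted() once where the "&" form calls it twice.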

..wayne..





Re: Moving files revisited

2002-02-05 Thread Wayne Davison

On Thu, 31 Jan 2002, Wayne Davison wrote:
> As for the move-files option, I was thinking that I could write a perl
> script that would parse the output of rsync -v and delete files that
> were successfully transferred by rsync when they show up in the verbose
> output.

I've been meaning to comment on this idea I had.  This perl script idea
only works when running rsync from the sending system, not when pulling
files, so I'd still prefer to have a real --move-files option.

Anyone have any comments on my --move-files implementation?  The current
patch sends a message back from the receiver to the sender, letting it
know when it is OK to delete a file.  An alternate implementation might
be to add a delete pass to the sender to delete all the files en-mass
after the whole process completes successfully.  I personally prefer the
more incremental approach (especially for moving larger numbers of
files).

..wayne..





Re: configure --with-rsh=CMD and default blocking-IO support

2002-02-06 Thread Wayne Davison

On Wed, 6 Feb 2002, Martin Pool wrote:
> OK, I agree --with-rsh should go in, but I think putting magic
> characters into it is needlessly confusing.  I would feel much better
> about a separate configure option to set the default O_NONBLOCK mode.

The complicating factor then becomes: how does the RSYNC_RSH environment
variable interact with this default O_NONBLOCK mode, and how can the
default blocking be changed via the environment?  I came up with the
magic character idea in order to try to keep things simple (using only
one environment variable instead of trying to keep two different ones in
sync).  I admit that it's quirky, though.

So the obvious alternative is something like this:

export RSYNC_RSH=ssh
export RSYNC_BLOCKING_IO=1

Perhaps a better idiom might be to allow RSYNC_RSH to begin with a
command-line option?  If the string begins with "--blocking-io " we
strip it off and twiddle that command-line flag?  If we want to make
this orthogonal we could also add support for the --non-blocking-io
command-line option and allow this string to appear at the start of
the RSYNC_RSH value.  What do you think of something like this?

export RSYNC_RSH='--blocking-io /usr/bin/ssh -l username'
export RSYNC_RSH='--non-blocking-io rsh'

Or can you think of a better way to go?

..wayne..





Re: configure --with-rsh=CMD and default blocking-IO support

2002-02-06 Thread Wayne Davison

On Wed, 6 Feb 2002, Dave Dykstra wrote:
> Of the proposed alternatives, I like this latter the best, changing
> --non-blocking-io to --no-blocking-io.

Cool.  I like that one as well.  Here's an implementation.  This patch
adds the configure option --with(out)-blocking-io and defines a new
variable that gets put into config.h:  DEFAULT_BLOCKING_IO.

The default for configure is just as before:  remsh or rsh gets used
with blocking IO on by default.  If the user specifies --with-rsh=CMD
then the default is --without-blocking-io unless the user also specifies
the --with-blocking-io configure option.

The code in main.c now uses the DEFAULT_BLOCKING_IO value, but only when
we use the default RSYNC_RSH (internal) value.  If the user specifies an
RSYNC_RSH environment variable (or a remote shell via the command-line),
the default is to use non-blocking IO.  (This is a slight change in
behavior if the user had set RSYNC_RSH=rsh in their environment -- is
this acceptable?)

The code now allows the remote shell value to contain a single prefixed
IO-blocking option.  If the string starts with "--" and it has a space
in it, the string must start with "--blocking-io ", "--no-blocking-io ",
or "-- " (the last item allows someone to use a program name that
matches one of our options -- just for completeness).
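For review purposes, here's a stand-alone C model of the rules described in the last two paragraphs.  It mirrors the main.c hunk below but is not that code: a user-supplied shell defaults to non-blocking IO, the compiled-in shell uses DEFAULT_BLOCKING_IO, a recognized prefix overrides either, and an explicit command-line setting wins over everything.  All function and parameter names here are hypothetical.

```c
#include <string.h>

/* Strip an optional "--blocking-io ", "--no-blocking-io ", or "-- " prefix
 * from a remote-shell string.  Sets *io for the first two forms.  Returns
 * NULL if the string starts with any other "--" option followed by a space. */
static const char *strip_io_prefix(const char *cmd, int *io)
{
    const char *space;

    if (strncmp(cmd, "--", 2) != 0 || (space = strchr(cmd, ' ')) == NULL)
        return cmd;                      /* no option prefix */
    if (strncmp(cmd + 2, "blocking-io ", 12) == 0)
        *io = 1;
    else if (strncmp(cmd + 2, "no-blocking-io ", 15) == 0)
        *io = 0;
    else if (cmd[2] != ' ')
        return NULL;                     /* e.g. "--bogus ssh" */
    return space + 1;                    /* "-- name" just escapes the name */
}

/* cli_shell:         the -e/--rsh value, or NULL
 * env_shell:         the RSYNC_RSH environment value, or NULL
 * compiled_shell:    the configured default (RSYNC_RSH in config.h)
 * compiled_blocking: the configured default (DEFAULT_BLOCKING_IO)
 * cli_blocking:      -1 unless blocking IO was set on the command line
 * Returns the shell command to run (NULL on a bad prefix) and stores the
 * final blocking-IO choice in *blocking_io. */
static const char *pick_remote_shell(const char *cli_shell,
                                     const char *env_shell,
                                     const char *compiled_shell,
                                     int compiled_blocking, int cli_blocking,
                                     int *blocking_io)
{
    const char *cmd = cli_shell != NULL ? cli_shell : env_shell;
    int def_io;

    if (cmd != NULL) {
        def_io = 0;                      /* user-chosen shell: non-blocking */
    } else {
        cmd = compiled_shell;
        def_io = compiled_blocking;
    }
    cmd = strip_io_prefix(cmd, &def_io);
    if (cmd == NULL)
        return NULL;                     /* the patch exits with RERR_SYNTAX */
    *blocking_io = cli_blocking >= 0 ? cli_blocking : def_io;
    return cmd;
}
```

For example, RSYNC_RSH='--blocking-io /usr/bin/ssh -l username' yields "/usr/bin/ssh -l username" with blocking IO on, while RSYNC_RSH=ssh alone yields non-blocking IO.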

I also updated the main man page to mention the new RSYNC_RSH syntax,
and to stop implying that rsh is always the default remote shell.  In
the --blocking-io section, it used to say that ssh prefers blocking IO.
I've never used anything but non-blocking IO with ssh, so is this
statement backwards?  I tweaked the statement to say that only some
versions of ssh prefer blocking IO.

Don't forget to run autoconf and autoheader after applying this patch.

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: rsync/configure.in
--- rsync/configure.in  6 Feb 2002 04:37:09 -   1.131
+++ rsync/configure.in  6 Feb 2002 22:45:04 -
@@ -102,6 +102,23 @@
 fi

 AC_DEFINE_UNQUOTED(RSYNC_RSH, "$RSYNC_RSH", [default -e command])
+
+AC_ARG_WITH(blocking-io,
+   AC_HELP_STRING([--with-blocking-io], [set blocking IO for your remote shell]))
+
+case "$with_blocking_io" in
+'')
+if test x"$with_rsh" != x; then
+   IO=0
+else
+   IO=1
+fi
+;;
+no) IO=0 ;;
+*) IO=1 ;;
+esac
+
+AC_DEFINE_UNQUOTED(DEFAULT_BLOCKING_IO, $IO, [default to blocking IO])

 # arrgh. libc in the current debian stable screws up the largefile
 # stuff, getting byte range locking wrong
Index: rsync/main.c
--- rsync/main.c   5 Feb 2002 23:05:32 -   1.139
+++ rsync/main.c   6 Feb 2002 22:45:08 -
@@ -178,10 +178,25 @@
extern int read_batch;

if (!read_batch && !local_server) { /* dw -- added read_batch */
+   int def_io = DEFAULT_BLOCKING_IO;
if (!cmd)
cmd = getenv(RSYNC_RSH_ENV);
if (!cmd)
cmd = RSYNC_RSH;
+   else
+   def_io = 0;
+   if (*cmd == '-' && cmd[1] == '-' && (tok = strchr(cmd, ' '))) {
+   if (strncmp(cmd+2, "blocking-io ", 12) == 0)
+   def_io = 1;
+   else if (strncmp(cmd+2, "no-blocking-io ", 15) == 0)
+   def_io = 0;
+   else if (cmd[2] != ' ') {
+   rprintf(FERROR,"Invalid remote-shell-IO option: %s\n",
+   cmd);
+   exit_cleanup(RERR_SYNTAX);
+   }
+   cmd = tok + 1;
+   }
cmd = strdup(cmd);
if (!cmd)
goto oom;
@@ -207,8 +222,8 @@

args[argc++] = rsync_path;

-   if ((blocking_io == -1) && (strcmp(cmd, RSYNC_RSH) == 0))
-   blocking_io = 1;
+   if (blocking_io < 0)
+   blocking_io = def_io;

server_options(args,&argc);

Index: rsync/options.c
--- rsync/options.c 5 Feb 2002 23:05:32 -   1.78
+++ rsync/options.c 6 Feb 2002 22:45:09 -
@@ -206,7 +206,7 @@
   rprintf(F," --no-whole-file turn off --whole-file\n");
   rprintf(F," -x, --one-file-system   don't cross filesystem boundaries\n");
   rprintf(F," -B, --block-size=SIZE   checksum blocking size (default %d)\n",BLOCK_SIZE);
-  rprintf(F," -e, --rsh=COMMAND   specify rsh replacement\n");
+  rprintf(F," -e, --rsh=COMMAND   specify the remote shell\n");
   rprintf(F," --rsync-path=PATH   specify path to rsync on the remote machine\n");
   rprintf(F," -C, --cvs-exclude   auto ignore files in the same way CVS does\n");
   rprintf(F," --existing  only update files that already exist\n");
Index: rsync/rsync.yo
--- rsync/rsync.yo  5 Feb 2002 23:05:33 - 

Re: configure --with-rsh=CMD and default blocking-IO support

2002-02-06 Thread Wayne Davison

On Thu, 7 Feb 2002, Martin Pool wrote:
> A general-purpose RSYNC_OPTS variable would be more tasteful.  I think
> popt makes supporting this fairly straightforward.

That's a nice idea.  One area we'll want to be careful of is how the two
options interact.

For instance, we want to support old scripts that might set RSYNC_RSH
and then run a bunch of rsync commands.  It would be nice to make this
work without conflicting with a user's also-existing RSYNC_OPTS var.  A
potential solution is to ignore RSYNC_OPTS if RSYNC_RSH is set (which
also serves to wean people away from RSYNC_RSH if they want to be able
to set other default options).

Another potential problem area is how to override already-set options.
If someone wants to put -a into their RSYNC_OPTS variable, how can they
then turn it off?  I suppose we could just say that the user gets what
she deserves in such a case.

So, perhaps I'm trying to solve a problem that isn't really all that
important.  Just having the ability to set the default remote shell and
its IO mode might be good enough for most people, and we let the rest
use shell scripts or aliases, like you said.

I could trim down my last patch to avoid the extra RSYNC_RSH parsing if
you'd like to just apply the other part of it.  Or, feel free to tweak
it yourself -- it should be pretty easy.

..wayne..





Re: Deleting files from source after a successful rsync !

2002-02-07 Thread Wayne Davison

On Thu, 7 Feb 2002, Kapoor, Nishikant X wrote:
> I have a few clients who prepare some reports and put it in their
> outgoing/ directory for me to pick up every morning. Is there a way to
> delete those files from their outgoing/ after I fetch them ?

You can use my --move-files patch for this, which also requires my
no-hang patch.  (You'd have to get them to install this updated rsync on
their end as well as on your end.)  If that is not possible, you'd have
to kludge something together that would keep track of what files got
grabbed and run a separate ssh process with a manual rm command.

If you're running 2.5.2, you can apply the most recent versions of my
patches:

http://www.clari.net/~wayne/rsync-nohang.patch
http://www.clari.net/~wayne/rsync-move-files.patch

(Use "patch -p1" to apply them.)

If you're running 2.5.1, use these versions instead:

http://www.clari.net/~wayne/rsync-2.5.1-nohang.patch
http://www.clari.net/~wayne/rsync-2.5.1-move-files.patch

If you're running an even older rsync, you should be able to hand-patch
the rejected chunks from the 2.5.1 versions.

I use an older rsync with these changes on my production systems (to
move ever-arriving information from box to box), and it works great for
me.

Future versions of rsync will hopefully have some version of the
--move-files option included, though we haven't finished the discussion
of exactly what we want to do for the official release.

..wayne..





Re: problem getting just a single dir !

2002-02-10 Thread Wayne Davison

On Sun, 10 Feb 2002, Nishikant Kapoor wrote:
> I am trying to fetch a single dir using the following command but all I
> get is a empty dir:
>
> rsync -av www.myServer.com::myStuff --include=myDir --exclude=* .

Includes are tricky that way -- you told it to just include the
directory, but you didn't tell it to include anything within the
directory.  This is because --exclude=* excludes everything at every
level that didn't get explicitly mentioned (I'm assuming you protected
the '*' so that it didn't get expanded by the shell).

The easiest way to accomplish what you want is to just name the
directory without using the include/exclude options:

rsync -av www.myServer.com::myStuff/myDir .

If you want to use include & exclude, you could do this:

rsync -av www.myServer.com::myStuff --include=/myDir --exclude=/* .

This tells rsync to only exclude the items at the base of the path that
are not myDir, not all items at all levels.  Alternately, you could do
this:

rsync -av www.myServer.com::myStuff --include=/myDir** --exclude=* .

Where you explicitly include everything within myDir (the "**" matches
slashes, so it includes subdir content as well, and the initial '/' is
required for it to match the whole path).
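The difference between "*" and "**" can be modeled with POSIX fnmatch().  Since "**" matches slashes, one simple way to express that (an assumption about the matching code, so treat this as a sketch) is to match a pattern containing "**" without FNM_PATHNAME, so its wildcards can cross '/', and all other patterns with FNM_PATHNAME.  anchored_match() is a hypothetical helper that handles only anchored patterns.

```c
#include <fnmatch.h>
#include <string.h>

/* Hypothetical helper: match an anchored pattern (leading '/') against a
 * path relative to the transfer root.  A pattern containing "**" is
 * matched without FNM_PATHNAME, so '*' may cross '/'; other patterns use
 * FNM_PATHNAME, so '*' stops at each '/'. */
static int anchored_match(const char *pattern, const char *path)
{
    int flags = strstr(pattern, "**") != NULL ? 0 : FNM_PATHNAME;

    if (pattern[0] == '/')
        pattern++;
    return fnmatch(pattern, path, flags) == 0;
}
```

Here "/*" matches "myDir" but not "myDir/file.txt", which is why --exclude=/* only drops top-level items, while "/myDir**" matches "myDir/sub/file.txt".  Note that "/myDir**" also matches a sibling such as "myDirOld"; if that matters, --include=/myDir --include=/myDir/** should avoid it.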

..wayne..





Re: Exclude directories

2002-02-14 Thread Wayne Davison

On Wed, 13 Feb 2002, Ian Kettleborough wrote:
> ie:
> /usr/src
> or
> /usr/src/

One thing that totally tripped me up at first is that you don't include
the whole path if you're not starting the transfer from the root of the
filesystem.  For instance:

rsync -av /usr/ foobar:/usr

All your excludes would be relative to /usr/, so you'd use /src/ to
exclude /usr/src/.  If you use verbose mode to see the names that rsync
is sending, the names you must put in your include/exclude items need to
match those (with an added starting slash to anchor the match).
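Here's that translation as a small C helper.  It assumes the source directory's contents are being sent, as with "/usr/" in the example above (without the trailing slash, rsync sends names that start with the directory's own name, such as "usr/src", so the rule would be /usr/src/ instead).  make_anchored_rule() is a hypothetical name for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn an absolute path into the anchored
 * include/exclude rule that matches it when the contents of src_root
 * are transferred.  Returns 0 on success, -1 if abs_path is not below
 * src_root or the result does not fit in out. */
static int make_anchored_rule(const char *src_root, const char *abs_path,
                              char *out, size_t outlen)
{
    size_t rootlen = strlen(src_root);

    while (rootlen > 0 && src_root[rootlen - 1] == '/')
        rootlen--;                        /* "/usr/" -> "/usr", "/" -> "" */
    if (strncmp(abs_path, src_root, rootlen) != 0 || abs_path[rootlen] != '/')
        return -1;                        /* not inside the source tree */
    int n = snprintf(out, outlen, "%s", abs_path + rootlen);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With src_root "/usr/", the directory /usr/src/ becomes the rule "/src/"; with src_root "/", it stays "/usr/src/".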

..wayne..





Re: Debian bug #128632 && fork

2002-02-18 Thread Wayne Davison

On Mon, 18 Feb 2002, Martin Pool wrote:
> Why the sleep() call?

Also, why close(fd) twice?

> > +   } else if (pid < 0) {
> > +   rprintf(FERROR, "could not create child process: %s\n",
> > +   strerror(errno));
> > +   close(fd);
> > +   sleep(2);
> > }
> >
> > close(fd);

..wayne..





Re: include exclude help please.

2002-03-19 Thread Wayne Davison

Seems to me that the simplest solution is to name the directory
explicitly:

rsync -a --include "*/" --include "*.tif" --exclude "*" /film/jonah /tmp/film

To accomplish the same thing by selecting jonah with include rules
instead of naming it on the command line, you could do this:

rsync -a --include /jonah --include "/jonah/**/" --include "*.tif" \
 --exclude "*" /film/ /tmp/film

If you want to exclude any empty directories that either of these
commands creates, you'll have to be more specific in the directory path
that is allowed to succeed.  I.e., if there's a "foo/bar" path in between
jonah and sourceimages, you'd need to do something like this:

rsync -a --include /jonah --include /jonah/foo --include /jonah/foo/bar \
 --include /jonah/foo/bar/sourceimages --include "*.tif" \
 --exclude "*" /film/ /tmp/film

I haven't tested any of these, but they look right to me.

..wayne..


-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: transferring individual files question, pull vs. push

2002-03-19 Thread Wayne Davison

On Tue, 19 Mar 2002, Jeff Field wrote:
> rsync -e ssh source-box.x.com:/var/qmail/control/file1 \
> source-box.x.com:/var/qmail/control/file2 \
> source-box.x.com:/var/qmail/control/file3 \
> source-box.x.com:/var/qmail/control/file4 \
> /var/qmail/control

You can't have multiple remote-machine specifications, even if they
refer to the same machine.  The only thing you can do is to use
wildcards that get remote-expanded (by the remote shell) or copy
entire directories.  For instance:

rsync -e ssh source-box.x.com:/var/qmail/control/file\? /var/qmail/control

..wayne..





Re: rsync 2.5.5 --delete-after option bug

2002-04-25 Thread Wayne Davison

On Thu, 25 Apr 2002, Dave Dykstra wrote:
> I think --delete-after should imply --delete.  Would someone like to
> work up the simple patch to the code and the man page?

Sure.  Here's one (note that the OPT_DELETE_AFTER enum was already
defined for some reason).

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: options.c
--- options.c   16 Apr 2002 01:38:21 -  1.92
+++ options.c   25 Apr 2002 21:57:48 -
@@ -306,7 +306,7 @@
   {"delete",   0,  POPT_ARG_NONE,   &delete_mode , 0, 0, 0 },
   {"existing", 0,  POPT_ARG_NONE,   &only_existing , 0, 0, 0 },
   {"ignore-existing",  0,  POPT_ARG_NONE,   &opt_ignore_existing , 0, 0, 0 },
-  {"delete-after", 0,  POPT_ARG_NONE,   &delete_after , 0, 0, 0 },
+  {"delete-after", 0,  POPT_ARG_NONE,   0,  OPT_DELETE_AFTER, 0, 0 },
   {"delete-excluded",  0,  POPT_ARG_NONE,   0,  OPT_DELETE_EXCLUDED, 0, 0 },
   {"force",0,  POPT_ARG_NONE,   &force_delete , 0, 0, 0 },
   {"numeric-ids",  0,  POPT_ARG_NONE,   &numeric_ids , 0, 0, 0 },
@@ -476,7 +479,12 @@
  * non-default setting. */
modify_window_set = 1;
break;
-
+
+   case OPT_DELETE_AFTER:
+   delete_after = 1;
+   delete_mode = 1;
+   break;
+
case OPT_DELETE_EXCLUDED:
delete_excluded = 1;
delete_mode = 1;
Index: rsync.yo
--- rsync.yo    8 Apr 2002 05:30:28 -   1.96
+++ rsync.yo    25 Apr 2002 22:01:47 -
@@ -485,11 +485,12 @@
 dit(bf(--delete-excluded)) In addition to deleting the files on the
 receiving side that are not on the sending side, this tells rsync to also
 delete any files on the receiving side that are excluded (see --exclude).
+Implies --delete.

 dit(bf(--delete-after)) By default rsync does file deletions before
 transferring files to try to ensure that there is sufficient space on
 the receiving filesystem. If you want to delete after transferring
-then use the --delete-after switch.
+then use the --delete-after switch. Implies --delete.

 dit(bf(--ignore-errors)) Tells --delete to go ahead and delete files
 even when there are IO errors.
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---





Updating the docs/help on the default remote shell

2002-05-03 Thread Wayne Davison

Since rsync can now be configured with a different default remote shell
than "rsh", I think the docs should be updated a bit.  Anyone object to
these changes?

(Note that I also fixed the misstatement that ssh prefers blocking IO.)

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: options.c
--- options.c   2002/05/03 22:59:17 1.93
+++ options.c   2002/05/03 23:28:47
@@ -230,7 +230,7 @@
   rprintf(F," --no-whole-file turn off --whole-file\n");
   rprintf(F," -x, --one-file-system   don't cross filesystem boundaries\n");
   rprintf(F," -B, --block-size=SIZE   checksum blocking size (default %d)\n",BLOCK_SIZE);
-  rprintf(F," -e, --rsh=COMMAND   specify rsh replacement\n");
+  rprintf(F," -e, --rsh=COMMAND   specify the remote shell\n");
   rprintf(F," --rsync-path=PATH   specify path to rsync on the remote machine\n");
   rprintf(F," -C, --cvs-exclude   auto ignore files in the same way CVS does\n");
   rprintf(F," --existing  only update files that already exist\n");
Index: rsync.yo
--- rsync.yo    2002/05/03 22:58:01 1.97
+++ rsync.yo    2002/05/03 23:28:48
@@ -77,11 +77,13 @@
 
 See the file README for installation instructions.
 
-Once installed you can use rsync to any machine that you can use rsh
-to.  rsync uses rsh for its communications, unless both the source and
-destination are local.
+Once installed, you can use rsync to any machine that you can access via
+a remote shell (as well as some that you can access using the rsync
+daemon-mode protocol).  For remote transfers, rsync typically uses rsh
+for its communications, but it may have been configured to use a
+different remote shell by default, such as ssh.
 
-You can also specify an alternative to rsh, either by using the -e
+You can also specify any remote shell you like, either by using the -e
 command line option, or by setting the RSYNC_RSH environment variable.
 
 One common substitute is to use ssh, which offers a high degree of
@@ -135,7 +137,7 @@
 
 manpagesection(CONNECTING TO AN RSYNC SERVER)
 
-It is also possible to use rsync without using rsh or ssh as the
+It is also possible to use rsync without a remote shell as the
 transport. In this case you will connect to a remote rsync server
 running on TCP port 873. 
 
@@ -144,7 +146,7 @@
 your web proxy.  Note that your web proxy's configuration must allow
 proxying to port 873.
 
-Using rsync in this way is the same as using it with rsh or ssh except
+Using rsync in this way is the same as using it with a remote shell except
 that:
 
 itemize(
@@ -242,7 +244,7 @@
  --no-whole-file turn off --whole-file
  -x, --one-file-system   don't cross filesystem boundaries
  -B, --block-size=SIZE   checksum blocking size (default 700)
- -e, --rsh=COMMAND   specify rsh replacement
+ -e, --rsh=COMMAND   specify the remote shell to use
  --rsync-path=PATH   specify path to rsync on the remote machine
  -C, --cvs-exclude   auto ignore files in the same way CVS does
  --existing  only update files that already exist
@@ -505,8 +507,8 @@
 
 dit(bf(-e, --rsh=COMMAND)) This option allows you to choose an alternative
 remote shell program to use for communication between the local and
-remote copies of rsync. By default, rsync will use rsh, but you may
-like to instead use ssh because of its high security.
+remote copies of rsync. By default, rsync is typically configured to use
+rsh, but you may like to instead use ssh because of its high security.
 
 You can also choose the remote shell program using the RSYNC_RSH
 environment variable.
@@ -661,7 +663,8 @@
 a remote shell transport.  If -e or --rsh are not specified or are set to
 the default "rsh", this defaults to blocking IO, otherwise it defaults to
 non-blocking IO.  You may find the --blocking-io option is needed for some
-remote shells that can't handle non-blocking IO.  Ssh prefers blocking IO.
+remote shells that can't handle non-blocking IO.  (Note that ssh prefers
+non-blocking IO.)
 
 dit(bf(--no-blocking-io)) Turn off --blocking-io, for use when it is the
 default.
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
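The documented default for --blocking-io can be expressed as a small decision function. This is a hypothetical sketch, not rsync's code: the name `choose_blocking_io()` is invented, `blocking_io == -1` stands for "not set on the command line" (matching `int blocking_io=-1;` in options.c), and it assumes the check is made against the shell actually in effect (the -e value, RSYNC_RSH, or the configured default), ignoring any directory prefix and trailing options.

```c
#include <string.h>

/* Hypothetical sketch of the documented rule: plain "rsh" gets blocking
 * IO, any other remote shell (e.g. ssh) gets non-blocking IO, unless
 * --blocking-io / --no-blocking-io was given explicitly. */
static int choose_blocking_io(int blocking_io, const char *shell_cmd)
{
    if (blocking_io != -1)      /* explicitly set on the command line */
        return blocking_io;

    /* Isolate the program name: stop at the first space (options may
     * follow) and skip any leading directory components. */
    size_t len = strcspn(shell_cmd, " ");
    const char *base = shell_cmd;
    for (size_t i = 0; i < len; i++) {
        if (shell_cmd[i] == '/')
            base = shell_cmd + i + 1;
    }
    size_t base_len = len - (size_t)(base - shell_cmd);

    return base_len == 3 && strncmp(base, "rsh", 3) == 0;
}
```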



A simpler move-files patch

2002-05-04 Thread Wayne Davison

In an effort to get my long-desired move-files functionality into rsync,
I have created a version of my patch that runs as an extra pass at the
end of the processing.  This results in a simpler set of changes to
rsync.

I still think it would be nice to have incremental deletions during
large transfers (as my first patch provides), but acceptance of this
patch would relegate such quibbling to a discussion of future
optimizations.

One thing that this patch does differently than my last one is this:
it removes all synchronized files from the server, even ones that were
already up-to-date.  (I had been meaning to make my previous patch also
include up-to-date files, but hadn't gotten around to it before this.)
As before, directories are not affected.
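A minimal sketch of such an end-of-run pass, assuming a trimmed-down file-list entry (the `file_entry` struct, `move_files_pass()`, and `pretend_remove()` are hypothetical). Only `FLAG_NO_DELETE` comes from the patch, which sets it when the sender fails to open a file; every other non-directory entry, including ones skipped as already up-to-date, is presumed safe to remove.

```c
#include <stddef.h>

#define FLAG_NO_DELETE (1<<8)   /* same bit the patch adds to rsync.h */

/* Hypothetical, trimmed-down file-list entry. */
struct file_entry {
    const char *name;
    int is_dir;                 /* directories are never removed */
    int flags;
};

/* Stand-in remover for dry runs: reports success without touching disk. */
static int pretend_remove(const char *name)
{
    (void)name;
    return 0;
}

/* Extra pass after all transfers finish: remove every source file that
 * was not flagged FLAG_NO_DELETE.  remove_fn returns 0 on success.
 * Returns the number of files removed. */
static int move_files_pass(struct file_entry *files, size_t count,
                           int (*remove_fn)(const char *))
{
    int removed = 0;
    for (size_t i = 0; i < count; i++) {
        if (files[i].is_dir || (files[i].flags & FLAG_NO_DELETE))
            continue;
        if (remove_fn(files[i].name) == 0)
            removed++;
    }
    return removed;
}
```

In real use, unlink() from <unistd.h> has the matching signature and could be passed instead of pretend_remove().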

This patch is for CVS, but the offsets assume that my last patch to
rsync.yo has already been applied.

Let me know what you think.

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: options.c
--- save/options.c  Sat May  4 11:22:22 2002
+++ options.c   Sat May  4 11:27:17 2002
@@ -86,6 +86,7 @@
 int modify_window=0;
 #endif
 int blocking_io=-1;
+int move_files=0;
 
 
 /** Network address family. **/
@@ -240,6 +241,7 @@
   rprintf(F," --delete-after  delete after transferring, not before\n");
   rprintf(F," --ignore-errors delete even if there are IO errors\n");
   rprintf(F," --max-delete=NUM   don't delete more than NUM files\n");
+  rprintf(F," --move-files   remove the synchronized files from the sending side\n");
   rprintf(F," --partial   keep partially transferred files\n");
   rprintf(F," --force force deletion of directories even if not empty\n");
   rprintf(F," --numeric-ids   don't map uid/gid values by user/group name\n");
@@ -290,7 +292,7 @@
   OPT_LOG_FORMAT, OPT_PASSWORD_FILE, OPT_SIZE_ONLY, OPT_ADDRESS,
   OPT_DELETE_AFTER, OPT_EXISTING, OPT_MAX_DELETE, OPT_BACKUP_DIR, 
   OPT_IGNORE_ERRORS, OPT_BWLIMIT, OPT_BLOCKING_IO,
-  OPT_NO_BLOCKING_IO, OPT_WHOLE_FILE, OPT_NO_WHOLE_FILE,
+  OPT_NO_BLOCKING_IO, OPT_WHOLE_FILE, OPT_NO_WHOLE_FILE, OPT_MOVE_FILES,
   OPT_MODIFY_WINDOW, OPT_READ_BATCH, OPT_WRITE_BATCH, OPT_IGNORE_EXISTING};
 
 static struct poptOption long_options[] = {
@@ -365,6 +367,7 @@
   {"hard-links",  'H', POPT_ARG_NONE,   &preserve_hard_links , 0, 0, 0 },
   {"read-batch",   0,  POPT_ARG_STRING, &batch_prefix, OPT_READ_BATCH, 0, 0 },
   {"write-batch",  0,  POPT_ARG_STRING, &batch_prefix, OPT_WRITE_BATCH, 0, 0 },
+  {"move-files",   0,  POPT_ARG_NONE,   &move_files, 0, 0, 0 },
 #ifdef INET6
   {0,'4', POPT_ARG_VAL,&default_af_hint,   AF_INET , 0, 0 },
   {0,'6', POPT_ARG_VAL,&default_af_hint,   AF_INET6 , 0, 0 },
@@ -813,6 +816,9 @@
args[ac++] = "--compare-dest";
args[ac++] = compare_dest;
}
+
+   if (move_files)
+   args[ac++] = "--move-files";
 
*argc = ac;
 }
Index: rsync.h
--- rsync.h 2002/04/11 02:18:51 1.131
+++ rsync.h 2002/05/04 19:20:29
@@ -47,6 +47,7 @@
 #define SAME_NAME SAME_DIR
 #define LONG_NAME (1<<6)
 #define SAME_TIME (1<<7)
+#define FLAG_NO_DELETE (1<<8)
 
 /* update this if you make incompatible changes */
 #define PROTOCOL_VERSION 26
Index: rsync.yo
--- save/rsync.yo   Fri May  3 16:35:18 2002
+++ rsync.yo    Sat May  4 11:53:41 2002
@@ -254,6 +254,7 @@
  --delete-after  delete after transferring, not before
  --ignore-errors delete even if there are IO errors
  --max-delete=NUM   don't delete more than NUM files
+ --move-files   remove the synchronized files from the sending side
  --partial   keep partially transferred files
  --force force deletion of directories even if not empty
  --numeric-ids   don't map uid/gid values by user/group name
@@ -496,6 +497,10 @@
 
 dit(bf(--ignore-errors)) Tells --delete to go ahead and delete files
 even when there are IO errors.
+
+dit(bf(--move-files)) This tells rsync to remove the source files on the
+sending side that are either successfully transferred to the receiving
+side or are already up-to-date (directories are not removed).
 
 dit(bf(--force)) This options tells rsync to delete directories even if
 they are not empty when they are to be replaced by non-directories.  This
Index: sender.c
--- sender.c    2002/04/09 06:03:50 1.17
+++ sender.c    2002/05/04 19:20:29
@@ -26,6 +26,7 @@
 extern int io_error;
 extern int dry_run;
 extern int am_server;
+extern int move_files;
 
 
 /**
@@ -184,6 +185,7 @@
rprintf(FERROR,"send_files failed to open %s: %s\n",
fname,strerror(errno));
free_sums(s);
+   file->flags |= FLAG_NO_DELETE;
  

Re: Send Password with RSYNC_PASSWORD ore --password-file

2002-05-04 Thread Wayne Davison

On Sat, 4 May 2002, Manfred Gnaedig wrote:
> If i use this
> rsync -varpog -e ssh --stats /home/www/web6
> 217.172.xxx.xxx:/home/www/web6 --password-file=host1.pwd
> the Server is asking me too fore Passwort.

Ssh is asking you for the password.  However, the --password-file option
(as well as the RSYNC_PASSWORD environment variable) only affects
transfers to an rsync daemon, which you are not using (the rsync daemon
syntax requires 2 colons after the hostname).
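To illustrate the distinction, here is a simplified classifier for a command-line argument (the enum and `classify_arg()` are hypothetical names). It looks only at the first colon: one colon selects a remote shell such as ssh, two select an rsync daemon, which is the only case where --password-file and RSYNC_PASSWORD apply. It deliberately ignores other rules, such as a slash before the colon marking a local filename.

```c
#include <string.h>

enum transport { LOCAL_PATH, REMOTE_SHELL, RSYNC_DAEMON };

/* Simplified sketch: "host:path" uses a remote shell (rsh/ssh),
 * "host::module/path" talks to an rsync daemon, no colon is local. */
static enum transport classify_arg(const char *arg)
{
    const char *colon = strchr(arg, ':');
    if (colon == NULL)
        return LOCAL_PATH;
    return colon[1] == ':' ? RSYNC_DAEMON : REMOTE_SHELL;
}
```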

So, you either need to switch over to using an rsync daemon (and leave
the "-e ssh" option off), or you need to set up ssh so that it doesn't
prompt you for a password (testing it without rsync first is easiest).

One way to set up ssh is to enable an RSA authorized key on the server
you're connecting to.  In the ssh documentation, look for the discussion
of the files identity, identity.pub, and authorized_keys.

..wayne..



Re: Failure to update differing file

2002-05-04 Thread Wayne Davison

On Sat, 4 May 2002, Michael Fischer wrote:
> 1.  If I touched only the corrupted file, so the file times differed,
> then rsync did update the destination file.
> 
> 2.  If I used the --checksum flag, then it updated correctly.
> 
> But just a plain rsync failed to notice that the files were different.

Then it sounds like rsync was behaving exactly as it should.  By default
it just compares the files' modification times and sizes and skips
anything that appears to be up-to-date by that standard.  The --checksum
option tells it to go a step further and check whether the checksums
match before deciding if the files are really the same (which is
extremely slow and not usually needed, so it's not on by default).

..wayne..



Re: Send Password with RSYNC_PASSWORD ore --password-file

2002-05-04 Thread Wayne Davison

On Sat, 4 May 2002, Manfred Gnaedig wrote:
> mkdir 217.172.xxx.xxx/home/www/web10 : No such file or directory (1)

You left out the "::".  Also, the syntax for server mode is slightly
different -- you need to refer to a module name on the server.  So, if
you have an rsync daemon configured and running on your 217.172.* host,
you could use this to see the module names:

rsync 217.172.xxx.xxx::

See the rsyncd.conf man page for how to configure a module, give
it a password, etc.

If you don't want to run an rsync daemon, you need to work on the angle
of getting ssh to let you connect without a password instead.  See the
ssh-keygen man page for the easy way to go.

..wayne..



Prevent infinite recursion in rwrite()

2002-05-06 Thread Wayne Davison

Here's a resend of an old patch that is intended to avoid an infinite
recursion (ending in a stack overflow) of the rwrite() function getting
an error that calls rwrite(), ad nauseam.  I've only seen this happen
when one of the sides dies due to a program error -- in that case, the
connection is closed, and when we try to send an error to the other
side and it generates an error, the error generates an error, etc.

My solution is to use a simple static variable as a semaphore.  If we
get back to rwrite() with a non-zero value, we never again try to send
a message over the socket.  This results in the error going out to
stderr.  In the problem case I saw, this resulted in an error message
being displayed on my terminal (2 actually) instead of a weird crash.
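The guard idea can be shown in isolation. In this self-contained sketch, `log_msg()` stands in for rwrite() and `send_to_peer()` for io_multiplex_write() (both names are hypothetical); the peer is assumed dead, so sending fails and reports that failure by logging again. A plain flag is used for clarity where the patch uses a counter named `semaphore`, and unlike the patch as described, this sketch resets the flag after each send instead of giving up on the socket for good.

```c
#include <stdio.h>

static int remote_link_up = 0;  /* hypothetical: 0 simulates a dead peer */
static int local_lines = 0;     /* counts messages that fell back to stderr */

static void log_msg(const char *msg);

/* Stand-in for io_multiplex_write(): on a dead link, reporting the
 * write failure calls back into the logger. */
static int send_to_peer(const char *msg)
{
    (void)msg;
    if (!remote_link_up) {
        log_msg("write error: connection closed");
        return 0;
    }
    return 1;
}

/* Stand-in for rwrite(): the static flag blocks re-entry into the
 * remote-send path, so a nested error goes to stderr instead of
 * recursing until the stack overflows. */
static void log_msg(const char *msg)
{
    static int sending = 0;     /* nonzero while a remote send is active */

    if (!sending) {
        sending = 1;
        int sent = send_to_peer(msg);
        sending = 0;
        if (sent)
            return;
    }
    fprintf(stderr, "%s\n", msg);
    local_lines++;
}
```

With the link down, one call produces two local messages (the nested write error, then the original message), consistent with the two messages described above.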

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: log.c
--- log.c   2002/04/08 09:10:50 1.61
+++ log.c   2002/05/07 00:32:30
@@ -215,6 +215,7 @@
 void rwrite(enum logcode code, char *buf, int len)
 {
FILE *f=NULL;
+   static char semaphore = 0;
extern int am_daemon;
extern int am_server;
extern int quiet;
@@ -243,8 +244,11 @@
 * io_multiplex_write can fail if we do not have a multiplexed
 * connection at the moment, in which case we fall through and
 * log locally instead. */
-   if (am_server && io_multiplex_write(code, buf, len)) {
-   return;
+   if (am_server && (!semaphore++)) {
+   int ret = io_multiplex_write(code, buf, len);
+   semaphore--;
+   if (ret)
+   return;
}
 
if (am_daemon) {
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---



Re: A simpler move-files patch

2002-05-09 Thread Wayne Davison

On Thu, 9 May 2002, Dave Dykstra wrote:
> Maybe I'm dense, but I don't see how that's any different from turning
> on a flag (with the opposite meaning) at the end.

The reason this makes a difference is that not all the files get into
that code.  Any files that are identical just get skipped over on the
generator side, so the sender never sees them (in that loop).  So, the
sender needs to assume that we can delete all the files in the list
until we're told what files are not identical.

An alternate way to implement this is to modify the generator process to
send a special "this file is identical" sequence when we're in file-
moving mode.  That would allow the sending process to remove identical
files immediately on the sending side, and then we could just mark the
differing files with a "delete me" flag after we finish sending out all
the updates.

Another thought just occurred to me on how to implement this without
resorting to a post-processing pass.  It might be possible to have the
receiver send the "delete me" events over the error message pipe (rather
than the redo pipe), and since the generator already keeps this pipe
unblocked, that would allow the code to work without first fixing the
redo pipe's blockability.  I can check into this.

..wayne..



Re: wildcards (was Re: a problem I'm having with rsync-4.5.4)

2002-05-09 Thread Wayne Davison

On Thu, 9 May 2002, Dave Dykstra wrote:
> I would say it's definitely too risky for 2.5.6.

What would you say to adding a (simple) loop to the fnmatch() code that
would cause unanchored things like "foo/*/bar" to not be bound to the
start of the filename?  This would make it work in an equivalent way to
the unanchored non-wildcard strings.
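A minimal sketch of that loop, written as a wrapper around the POSIX fnmatch() rather than as a change to fnmatch() itself. The function name is hypothetical, and FNM_PATHNAME is assumed so that `*` never matches a slash.

```c
#include <fnmatch.h>
#include <string.h>

/* Try an unanchored wildcard pattern against the whole name, then
 * against the remainder after each '/', so "foo/*" can match
 * "a/b/foo/x" as well as "foo/x". */
static int unanchored_match(const char *pattern, const char *name)
{
    if (fnmatch(pattern, name, FNM_PATHNAME) == 0)
        return 1;
    while ((name = strchr(name, '/')) != NULL) {
        name++;                 /* step past the slash */
        if (fnmatch(pattern, name, FNM_PATHNAME) == 0)
            return 1;
    }
    return 0;
}
```

For example, "foo/*" matches "a/b/foo/x" on the third attempt but still rejects "a/foo/x/y", since the `*` cannot span the final slash.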

..wayne..



Re: wildcards (was Re: a problem I'm having with rsync-4.5.4)

2002-05-09 Thread Wayne Davison

On Thu, 9 May 2002, Wayne Davison wrote:
> What would you say to adding a (simple) loop to the fnmatch() code that

Just to clarify (since the above is poorly worded) -- I meant adding the
loop to the rsync code that calls fnmatch(), not trying to modify the
fnmatch() code directly.

..wayne..



Re: wildcards (was Re: a problem I'm having with rsync-4.5.4)

2002-05-09 Thread Wayne Davison

On Thu, 9 May 2002, Dave Dykstra wrote:
> How many times would you have to call fnmatch for every file?

We'd call fnmatch() an extra time for every slash in the path.  However,
the performance hit of this new loop on the pattern "foo/*" would be the
same as using the two patterns "/**/foo/*" & "/foo/*" (_except_ that the
trailing '*' would work right in the first pattern) -- this is because
"**" already has to do a recursive match iteration, and that's kind of
what our new loop would be doing outside of fnmatch() (we'd actually be
making fewer recursive calls, since fnmatch() would call itself an extra
time for every character in the path, but our loop would only make a
call after each slash).

So yes, this is slightly less efficient for unanchored patterns.  It
would make the code work as advertised, though, and any pattern that was
anchored with a leading slash would be entirely unaffected.  On the
downside, it could cause some people who use unanchored patterns as if
they were actually anchored to be surprised by the change in behavior.

..wayne..



Re: wildcards

2002-05-10 Thread Wayne Davison

On Fri, 10 May 2002, Dave Dykstra wrote:
> If you dynamically created a */*/*/foo/* pattern with the number of */
> to match the current path it would only have to call fnmatch once.

That's assuming the pattern doesn't contain an interior/trailing "**"
(which could only use the try-after-each-slash loop).  Also, there's no
need to tweak the pattern -- it would be the same amount of work to just
figure out which position in the filename your "*/*/*/" prefix would
reach and match at that position (since we'd have to count slashes
anyway).  We'd also have to be careful to ensure that there aren't any
exceptional patterns that could lead to problematic positioning.
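For comparison, the slash-counting approach can be sketched as follows. The function name is hypothetical, and it assumes a POSIX fnmatch(), a pattern without "**", and that every '/' in the pattern is a real separator (a '/' inside a bracket expression would be miscounted). It counts the pattern's slashes, keeps that many plus one trailing components of the name, and calls fnmatch() once.

```c
#include <fnmatch.h>
#include <string.h>

/* Match an unanchored pattern against only the last (slashes + 1)
 * components of the name, using a single fnmatch() call. */
static int tail_match(const char *pattern, const char *name)
{
    int cnt = 1;                /* components to keep */
    for (const char *cp = pattern; (cp = strchr(cp, '/')) != NULL; cp++)
        cnt++;

    /* Scan backwards; start just after the cnt-th slash from the end.
     * If the name has fewer components, match the whole name. */
    size_t start = 0;
    for (size_t i = strlen(name); i > 0; i--) {
        if (name[i - 1] == '/' && --cnt == 0) {
            start = i;
            break;
        }
    }
    return fnmatch(pattern, name + start, FNM_PATHNAME) == 0;
}
```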

A useful question at this point would be:  Does the extra complexity
make a big enough difference to be worth it?  With all of the file I/O
going on, I'm wondering if it would even be noticed.

..wayne..



Re: bug report

2002-05-10 Thread Wayne Davison

On Fri, 10 May 2002, terrell Larson wrote:
> If rsync is directed to copy a directory tree into another machine and
> the target directory does not exist then rsync will not create the
> required path

Dave Dykstra just recently responded to another user that this is the
intended behavior of rsync.  It will create one level of new directory
in the destination, but no more.  You could make the command you cited
above work by specifying "etc" rather than "etc/":

rsync -av --progress -e "ssh -1" /etc $1:/altsync/$HOSTNAME

This will create the $HOSTNAME dir, if needed, but you can't use
anything deeper than one directory in the source path.

The other way to go is to use the --relative option:

rsync -avP --relative -e "ssh -1" /any/path/at/all $1:/altsync/$HOSTNAME

This will create the $HOSTNAME dir and all the /any/path/at/all dirs, as
needed.

> The [option-specifying form] of the -e option is not documented.
> IMHO it should be.

I agree.  I've whipped up the following patch for rsync.yo, which I
will commit to CVS in a moment:

Index: rsync.yo
--- rsync.yo    2002/05/09 21:44:46 1.99
+++ rsync.yo    2002/05/10 19:47:05
@@ -515,6 +515,13 @@
 remote copies of rsync. Typically, rsync is configured to use rsh by
 default, but you may prefer to use ssh because of its high security.
 
+Feel free to include options in the COMMAND.  For instance:
+
+quote(-e "ssh -1 -l joe")
+
+(Note that ssh users can alternately store off site-specific connect
+options in their .ssh/config file.)
+
 You can also choose the remote shell program using the RSYNC_RSH
 environment variable.

..wayne..



Re: bug report

2002-05-11 Thread Wayne Davison

On Fri, 10 May 2002, jw schultz wrote:
> Also the example is an odd one.

It doesn't seem odd to me since the -l option is the one that I've used
most in ssh (when I don't use the config file to avoid all options).
The important part of the example is showing how it's quoted, so what's
in it could certainly be tweaked.  I like the addition of your
"presented as a single argument" caveat to the text.

I had added the extra "chattiness" because, even though it is possible
to pass ssh options via -e, doing this is really a less desirable solution
than using the .ssh/config file.  I thought it might be helpful to point
people at the better solution so they can avoid having to use the -e
option at all.  If others don't like this text, it could be removed.

As for the -1 option, it just forces the ssh1 protocol.  I left it there
since it was the option that started the discussion.
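The "presented as a single argument" caveat exists because rsync, not the local shell, breaks COMMAND into words. A minimal sketch of that splitting, assuming plain space-separated words with no quote handling (`split_rsh_command()` is a hypothetical name):

```c
#include <string.h>

/* Split a remote-shell COMMAND such as "ssh -p 2234" into an
 * argv-style, NULL-terminated array by breaking on spaces.
 * buf is modified in place; returns the number of words stored. */
static int split_rsh_command(char *buf, char *argv[], int max_args)
{
    int argc = 0;
    for (char *tok = strtok(buf, " ");
         tok != NULL && argc < max_args - 1;
         tok = strtok(NULL, " "))
        argv[argc++] = tok;
    argv[argc] = NULL;
    return argc;
}
```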

..wayne..



Re: bug report

2002-05-11 Thread Wayne Davison

OK, I just checked in a change that uses some of your suggested text to
remove a bit of the chattiness.  I also improved the RSYNC_RSH section
to mention the legality of command-line options.  See if you like it
better.

--- rsync.yo    2002/05/09 21:44:46 1.99
+++ rsync.yo    2002/05/11 08:31:55 1.101
@@ -515,8 +515,16 @@
 remote copies of rsync. Typically, rsync is configured to use rsh by
 default, but you may prefer to use ssh because of its high security.

+Command-line arguments are permitted in COMMAND provided that COMMAND is
+presented to rsync as a single argument.  For example:
+
+quote(-e "ssh -p 2234")
+
+(Note that ssh users can alternately customize site-specific connect
+options in their .ssh/config file.)
+
 You can also choose the remote shell program using the RSYNC_RSH
-environment variable.
+environment variable, which accepts the same range of values as -e.

 See also the --blocking-io option which is affected by this option.

@@ -982,8 +990,8 @@
 more details.

 dit(bf(RSYNC_RSH)) The RSYNC_RSH environment variable allows you to
-override the default shell used as the transport for rsync. This can
-be used instead of the -e option.
+override the default shell used as the transport for rsync.  Command line
+options are permitted after the command name, just as in the -e option.

 dit(bf(RSYNC_PROXY)) The RSYNC_PROXY environment variable allows you to
 redirect your rsync client to use a web proxy when connecting to a

..wayne..



Re: Problems with the rsync command line syntax for multiple files

2002-05-13 Thread Wayne Davison

On Mon, 13 May 2002, Peter Møller Neergaard wrote:
> types:/<3>tmp/wwwreports-dont-edit > echo *.html
> [...lots of files with colons in them...]

Rsync treats a colon on the command line as a separator between a machine
name and the filename, so you can't use *.html if it expands to one or
more names that include a colon UNLESS the colon is preceded by
something, such as a slash, that is illegal in a hostname.  So, try
using "./*.html" instead.
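The rule can be sketched like this (the function name is hypothetical): the first colon separates host from path unless a slash appears earlier in the argument, which is why the "./" prefix works.

```c
#include <string.h>

/* Return a pointer to the colon that separates "host:path", or NULL if
 * the argument should be treated as a local filename. */
static const char *find_host_colon(const char *arg)
{
    const char *colon = strchr(arg, ':');
    if (colon == NULL)
        return NULL;
    const char *slash = strchr(arg, '/');
    if (slash != NULL && slash < colon)
        return NULL;            /* e.g. "./report:1.html" is a local file */
    return colon;
}
```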

..wayne..



Re: wildcards

2002-05-13 Thread Wayne Davison

On Mon, 13 May 2002, Dave Dykstra wrote:
> I suggest you go ahead and code it in the way you
> think would be simplest and then we can evaluate it more concretely.

OK.  Here's the simple patch.  It optimizes the loop away if the pattern
starts with "**" (since the loop would be superfluous), but otherwise it
just loops over all the slashes in the name when the pattern is an
unanchored path (i.e. contains at least one interior slash).

I'll post another version where I implemented your suggested
optimization in a moment.

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: exclude.c
--- exclude.c   2002/04/11 02:25:53 1.44
+++ exclude.c   2002/05/13 19:43:43
@@ -66,6 +66,8 @@
}
}
ret->fnmatch_flags = 0;
+   if (strncmp(pattern, "**", 2) == 0)
+   ret->regular_exp = -1;
}
}
 
@@ -110,6 +112,13 @@
if (ex->regular_exp) {
if (fnmatch(pattern, name, ex->fnmatch_flags) == 0) {
return 1;
+   }
+   if (!match_start && !ex->local && ex->regular_exp > 0) {
+   while ((name = strchr(name, '/')) != NULL) {
+   name++;
+   if (fnmatch(pattern, name, ex->fnmatch_flags) == 0)
+   return 1;
+   }
}
} else {
int l1 = strlen(name);
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---



Re: wildcards

2002-05-13 Thread Wayne Davison

Here's a more complex version of the wildcard change that attempts to
count slashes in the pattern (if it does not contain "**" anywhere) and
to match at the appropriate level.

In trying to think up patterns where this might mess up, the only thing
I thought of was something like this:

 foo/b[^/]r/baz

My code would mess this up by counting 3 slashes.

This patch is not based on the previous one, but on CVS.

Note that neither this patch nor my previous one makes "**/foo" match
the file matched by "/foo" (which one might expect it to do).  We could
add some extra code to make this happen, if desired.

Optimization note:  I noticed that both this patch and my previous one
were only checking for "**" at the start of the pattern to trigger the
loop-skipping optimization.  I should really change that to check for
any leading "*" because of the code's limitation of treating "*" like
"**" when "**" is on the line somewhere.

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: exclude.c
--- exclude.c   2002/04/11 02:25:53 1.44
+++ exclude.c   2002/05/13 20:30:33
@@ -35,6 +35,7 @@
 static struct exclude_struct *make_exclude(const char *pattern, int include)
 {
struct exclude_struct *ret;
+   char *cp;
 
ret = (struct exclude_struct *)malloc(sizeof(*ret));
if (!ret) out_of_memory("make_exclude");
@@ -55,7 +56,7 @@
if (!ret->pattern) out_of_memory("make_exclude");
 
if (strpbrk(pattern, "*[?")) {
-   ret->regular_exp = 1;
+   ret->wild_exp = 1;
ret->fnmatch_flags = FNM_PATHNAME;
if (strstr(pattern, "**")) {
static int tested;
@@ -66,6 +67,8 @@
}
}
ret->fnmatch_flags = 0;
+   if (strncmp(pattern, "**", 2) == 0)
+   ret->wild_exp = -1;
}
}
 
@@ -74,9 +77,8 @@
ret->directory = 1;
}
 
-   if (!strchr(ret->pattern,'/')) {
-   ret->local = 1;
-   }
+   for (cp = ret->pattern; (cp = strchr(cp, '/')) != NULL; cp++)
+   ret->slash_cnt++;
 
return ret;
 }
@@ -95,7 +97,7 @@
int match_start=0;
char *pattern = ex->pattern;
 
-   if (ex->local && (p=strrchr(name,'/')))
+   if (!ex->slash_cnt && (p=strrchr(name,'/')))
name = p+1;
 
if (!name[0]) return 0;
@@ -107,9 +109,24 @@
pattern++;
}
 
-   if (ex->regular_exp) {
+   if (ex->wild_exp) {
+   if (!match_start && ex->slash_cnt && ex->fnmatch_flags != 0) {
+   int cnt = ex->slash_cnt + 1;
+   for (p = name + strlen(name) - 1; p >= name; p--) {
+   if (*p == '/' && !--cnt)
+   break;
+   }
+   name = p+1;
+   }
if (fnmatch(pattern, name, ex->fnmatch_flags) == 0) {
return 1;
+   }
+   if (!ex->fnmatch_flags && !match_start && ex->wild_exp > 0) {
+   while ((name = strchr(name, '/')) != NULL) {
+   name++;
+   if (fnmatch(pattern, name, ex->fnmatch_flags) == 0)
+   return 1;
+   }
}
} else {
int l1 = strlen(name);
Index: rsync.h
--- rsync.h 2002/04/11 02:18:51 1.131
+++ rsync.h 2002/05/13 20:30:34
@@ -392,11 +392,11 @@
 
 struct exclude_struct {
char *pattern;
-   int regular_exp;
+   int wild_exp;
int fnmatch_flags;
int include;
int directory;
-   int local;
+   int slash_cnt;
 };
 
 struct stats {
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---



Re: Status Query - Please respond - Re: Patch to avoid 'Connectionreset by peer' error for rsync on cygwin

2002-05-15 Thread Wayne Davison

Here's an idea which I haven't had a chance to investigate:

Would it be possible to use atexit() to register a call to shutdown()
for cygwin (or a call to a custom function that would call shutdown()
for the appropriate socket fds)?  This should allow cygwin's broken
socket code to get properly cleaned up without having to sprinkle a
bunch of cygwin-specific code all over the source (as long as the
socket fds don't get closed before we start the exit handling).
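A rough sketch of the idea, assuming POSIX sockets; the fixed-size registry and both function names are hypothetical, and (as noted above) whether this actually avoids the cygwin reset is uninvestigated. Each socket is registered once when opened, and a single atexit() handler shuts them all down.

```c
#include <stdlib.h>
#include <sys/socket.h>

#define MAX_TRACKED_FDS 16      /* hypothetical fixed-size registry */

static int tracked_fds[MAX_TRACKED_FDS];
static int tracked_count = 0;

/* atexit() handler: shut down every registered socket. */
static void shutdown_tracked_sockets(void)
{
    for (int i = 0; i < tracked_count; i++)
        shutdown(tracked_fds[i], SHUT_RDWR);
    tracked_count = 0;
}

/* Register a socket fd, installing the handler on first use.
 * Returns 0 on success, -1 if the handler or registry is unavailable. */
static int track_socket(int fd)
{
    static int handler_installed = 0;
    if (!handler_installed) {
        if (atexit(shutdown_tracked_sockets) != 0)
            return -1;
        handler_installed = 1;
    }
    if (tracked_count >= MAX_TRACKED_FDS)
        return -1;
    tracked_fds[tracked_count++] = fd;
    return 0;
}
```

As the message points out, this only helps if the fds are still open when the exit handlers run.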

..wayne..



Re: Problems getting rsync working...

2002-05-15 Thread Wayne Davison

On Thu, 16 May 2002, Brad wrote:
> The command which is run on the client:
> rsync -avt /var/spool/mail StorageServer::email

Did you either start up an "rsync --daemon" manually on the server or
set up [x]inetd to spawn "rsync --daemon" when someone connects to the
rsync port?  When you use the "::" syntax, there needs to be an rsync
daemon to handle the connection.  The alternative is to use just one ":"
instead of "::" and let ssh handle the connection.

..wayne..



Re: Status Query - Please respond - Re: Patch to avoid 'Connectionreset by peer' error for rsync on cygwin

2002-05-16 Thread Wayne Davison

On Thu, 16 May 2002, Max Bowsher wrote:
> That just moves the shutdown call from where you finish with the fd to
> where you start using the fd - that's got to be less intuitive.

Being more or less intuitive is not the point.  The idea was to have as
little cygwin kludge code as possible.  Thus, we'd just have one call to
atexit() during startup, with the single cleanup function being able to
handle any and all opened sockets, and we're done (if this is even
feasible -- I haven't looked into it).  This was prompted by Martin's
statement that he considers this a cygwin bug -- I was assuming that he
didn't want to make sweeping changes to all the cleanup code in rsync.
Whether he wants to handle this in a more invasive manner is up to him.

..wayne..



Improving the rsync protocol (RE: Rsync dies)

2002-05-17 Thread Wayne Davison

On Fri, 17 May 2002, Allen, John L. wrote:
> In my humble opinion, this problem with rsync growing a huge memory
> footprint when large numbers of files are involved should be #1 on
> the list of things to fix.

I have certainly been interested in working on this issue.  I think it
might be time to implement a new algorithm, one that would let us
correct a number of flaws that have shown up in the current approach.

Toward this end, I've been thinking about adding a 2nd process on the
sending side and hooking things up in a different manner:

The current protocol has one sender process on the sending side, while
the receiving side has both a generator process and a receiver process.
There is only one bi-directional pipe/socket that lets data flow from
the generator to the sender in one direction, and from the sender to the
receiver in the other direction.  The receiver also has a couple of pipes
connecting itself to the generator in order to get data to the sender.

I'd suggest changing things so that a (new) scanning process on the
sending side would have a bi-directional link with the generator process
on the receiving side.  This would let both processes descend through
the tree incrementally and simultaneously (working on a single directory
at a time) and figure out what files were different.  The list of files
that need to be transferred PLUS a list of what files need to be
deleted (if any) would be piped from the scanner process to the sender
process, which would have a bi-directional link to the receiver process
(perhaps using ssh's multi-channel support?).  There would be no link
between the receiver and the generator.

The advantage of this is that the sender and the receiver are really
very simple.  There is a list of file actions that is being received on
stdin by the sending process, and this indicates what files to update
and which files to delete.  (It might even be possible to make sender be
controlled by other programs.)  These programs would not need to know
about exclusion lists, delete options, or any of the more esoteric
options, but would get told things like the timeout settings via the
stdin pipe.  In this scenario, all error messages would get sent to the
sender process, which would output them on stdout (flushed).
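To make the "list of file actions on stdin" idea concrete, here is a minimal C sketch of how a stripped-down sender might read such a list.  The command words (update, delete, timeout), the struct, and the function names are all hypothetical; the post does not define a syntax.  The point is that the sender only dispatches actions and needs no exclude or delete-option logic.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical action kinds that a scanner might pipe to a sender. */
enum action_kind { ACT_UPDATE, ACT_DELETE, ACT_TIMEOUT, ACT_INVALID };

struct action {
    enum action_kind kind;
    char arg[1024];   /* a file name, or the timeout value as text */
};

/* Parse one line such as "update dir/file.c\n" into an action. */
struct action parse_action(const char *line)
{
    struct action a = { ACT_INVALID, "" };
    const char *sp;
    size_t cmdlen, arglen;

    assert(line != NULL);
    sp = strchr(line, ' ');
    if (sp == NULL)
        return a;
    cmdlen = (size_t)(sp - line);
    arglen = strcspn(sp + 1, "\n");
    if (arglen == 0 || arglen >= sizeof a.arg)
        return a;

    if (cmdlen == 6 && strncmp(line, "update", 6) == 0)
        a.kind = ACT_UPDATE;
    else if (cmdlen == 6 && strncmp(line, "delete", 6) == 0)
        a.kind = ACT_DELETE;
    else if (cmdlen == 7 && strncmp(line, "timeout", 7) == 0)
        a.kind = ACT_TIMEOUT;
    else
        return a;

    memcpy(a.arg, sp + 1, arglen);
    a.arg[arglen] = '\0';
    return a;
}

/* Sender main loop: act on each line until EOF; no exclude logic needed. */
void run_sender(FILE *in)
{
    char line[1100];

    while (fgets(line, sizeof line, in) != NULL) {
        struct action a = parse_action(line);

        /* Placeholder output: a real sender would start the transfer here. */
        switch (a.kind) {
        case ACT_UPDATE:  printf("would send %s\n", a.arg); break;
        case ACT_DELETE:  printf("would delete %s\n", a.arg); break;
        case ACT_TIMEOUT: printf("timeout is now %d\n", atoi(a.arg)); break;
        default:          fprintf(stderr, "bad action: %s", line); break;
        }
        fflush(stdout);   /* messages go out flushed, as described above */
    }
}
```

A scanner would then drive it by writing lines such as "update src/main.c" or "delete old/notes.txt" into the sender's stdin.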

The scanner/generator process would be the thing that parses the
commandline, communicates the exclude list to its opposite process, and
figures out exactly what to do.  The scanner would spawn the sender, and
field all the error messages that it generates.  It would then either
output the errors locally or send them over to the generator for output
(depending on whether we're pushing or pulling files).

As for who spawns the receiver, it would be nice if this was done by the
sender (so they could work alone), but an alternative would be to have
the generator spawn the receiver and then let the receiver hook up
with the sender via the existing ssh connection.

This idea is still in its early stages, so feel free to tell me exactly
where I've missed the boat.

..wayne..





Re: Improving the rsync protocol (RE: Rsync dies)

2002-05-17 Thread Wayne Davison

On Fri, 17 May 2002, Wayne Davison wrote:
> so feel free to tell me exactly where I've missed the boat.

[Replying to myself...  hmmm...]

In my description of the _new_ protocol, my references to a generator
process are not really accurate.  The current generator process is
forked off after the initial file-list session figures out what files
need to be checked for differences, and it then churns out rolling
checksums for the sender process.  The "generator" in my previous
description is really just a receiver-side scanner process (that looks
for files that need to be check-summed).  So, either the new receiver
process would handle the checksum generation itself, or we'd need a 3rd
process on the receiver side to generate the checksum data (and it would
need a pipeline into the sender).
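For reference, the weak rolling checksum that the generator computes can be sketched as follows.  This follows the a/b formulation from the rsync technical report, with both halves kept modulo 2^16; rsync's own checksum routine differs in some details (such as adding a character offset), so treat this as illustrative rather than as rsync's code.  The useful property is that sliding the window by one byte is an O(1) update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Weak checksum of a `len`-byte window; a and b are kept mod 2^16. */
struct rolling {
    uint32_t a, b;
    size_t len;
};

/* Compute the checksum of buf[0..len-1] from scratch: O(len). */
void roll_init(struct rolling *r, const unsigned char *buf, size_t len)
{
    size_t i;

    assert(r != NULL && buf != NULL);
    r->a = r->b = 0;
    r->len = len;
    for (i = 0; i < len; i++) {
        r->a += buf[i];                          /* a = sum of bytes */
        r->b += (uint32_t)(len - i) * buf[i];    /* b = position-weighted sum */
    }
    r->a &= 0xffff;
    r->b &= 0xffff;
}

/* Slide the window one byte: drop `out`, append `in`.  O(1). */
void roll_step(struct rolling *r, unsigned char out, unsigned char in)
{
    r->a = (r->a - out + in) & 0xffff;
    r->b = (r->b - (uint32_t)r->len * out + r->a) & 0xffff;
}

/* Combine the halves into one 32-bit value, b in the high 16 bits. */
uint32_t roll_digest(const struct rolling *r)
{
    return r->a | (r->b << 16);
}
```

The generator computes this once per block of the basis file; the other side slides it across its own file a byte at a time and only computes a strong checksum where the weak one matches.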

As a first step in investigating this further, I'm looking into librsync
to see if it might be easy to create a simple sender/receiver duo using
this library.  If anyone knows where some decent documentation on
librsync is, please let me know (I'm looking for it now, but the tar
doesn't appear to come with any decent docs).  I was wondering if 
librsync manages to implement the protocol without forking off a 
separate generator process...

..wayne..





Re: Rsync hanging

2002-05-24 Thread Wayne Davison

On Fri, 24 May 2002, Mike Rogers wrote:
> RSYNC maybe once a day or so will just hang and sit there...

You don't mention what version of rsync you're using.  Version 2.4.6
would often hang when the -v option was used, so if you're using that,
you'd do well to upgrade.

> strace attached to the process produces the following...

The system calls you cite are normal for that process.  It's the loop in
the wait_process() function, which is a normal end-of-run occurrence.
One of the other 2 processes is probably hung up on a read or write, and
you'd have to look at those processes with strace to see what is going
wrong.

..wayne..





Testing a transfer-only rsync tool

2002-06-01 Thread Wayne Davison

I found some time in the past week to work on a simple test app that
would hopefully help to answer a few questions that came up recently:

1. Can a single-process generator+receiver work well?  (Looks good so far,
   but I haven't run any multi-processor timing tests yet.)

2. How easy is it to use librsync?  (Pretty easy.)

3. How small would a transfer-only tool be?  (It's currently around 1400
   lines of C code, not counting the librsync code.  It was around 900
   lines when I first considered releasing a simple working version, but
   it keeps growing as I flesh out the more advanced features.)

4. Should rsync be separated into a scanning tool and a transfer tool?
   Or should it contain both bits but also allow the user to override
   the scanner to fully control what gets transferred?  Or should we
   just try to optimize the current protocol? (No answers yet, but I'm
   leaning toward the 2nd option above.)

My test tool takes in commands on stdin and outputs messages on stdout.
It forks a second process as specified via the commandline (which can be
any command that runs another rsync_xfer, either locally or remotely).
It then allows you to send AND/OR receive any files you specify (as well
as delete files, mkdir directories, etc.).  Keep in mind that this tool
does not attempt to do any of the "scan both systems, looking for files
that differ" task.

The code is still fairly young and while some of it is pretty good,
other bits show signs of being written in haste.  I've tested it on a
small number of scenarios so far, but nothing exhaustive.

Commands accepted by the tool on stdin (* means not yet tested):

cd REMOTE_DIR [LOCAL_DIR]                    chdir both sides at once
tmpdir REMOTE_PATH [LOCAL_PATH]              where temp-files go
get REMOTE_FILE [LOCAL_FILE [BASIS_FILE]]    rsync to the local system
put LOCAL_FILE [REMOTE_FILE [BASIS_FILE]]    rsync to the remote system
mvget REMOTE_FILE [LOCAL_FILE [BASIS_FILE]]  get, then delete REMOTE_FILE
mvput LOCAL_FILE [REMOTE_FILE [BASIS_FILE]]  put, then delete LOCAL_FILE
del FILE                                     delete a remote file
ldel FILE                                    delete a local file
md DIR                                       create a remote directory
lmd DIR                                      create a local directory
ln OLDNAME NEWNAME                           create a remote hard link
lln OLDNAME NEWNAME                          create a local hard link
sln OLDNAME NEWNAME                          create a remote symlink
lsln OLDNAME NEWNAME                         create a local symlink
mkdev NAME NUMBER                            create a remote device*
lmkdev NAME NUMBER                           create a local device*
quit                                         quit

Spaces in filenames need to be backslash-quoted, as do backslash
characters.  E.g. get This\ File.txt That\ File.txt
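A minimal sketch of that quoting rule, assuming a backslash makes the next character literal and unescaped spaces, tabs, or newlines separate words.  The function name split_quoted() is hypothetical and not taken from rsync_xfer.c.  The line is rewritten in place, which works because unquoting never makes a word longer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split `line` in place into whitespace-separated words, honoring
 * backslash quoting: "\ " is a literal space and "\\" a literal backslash.
 * Stores up to `max` word pointers in `argv`; returns the word count.
 */
int split_quoted(char *line, char **argv, int max)
{
    char *src = line, *dst = line;
    int argc = 0;

    assert(line != NULL && argv != NULL);
    while (*src) {
        while (*src == ' ' || *src == '\t' || *src == '\n')
            src++;                      /* skip separators */
        if (!*src || argc == max)
            break;
        argv[argc++] = dst;             /* word starts at the write position */
        while (*src && *src != ' ' && *src != '\t' && *src != '\n') {
            if (*src == '\\' && src[1])
                src++;                  /* take the next char literally */
            *dst++ = *src++;
        }
        if (*src)
            src++;                      /* consume the separator */
        *dst++ = '\0';                  /* dst never passes src, so this is safe */
    }
    return argc;
}
```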

You run the program like this:

rsync_xfer -vv ssh remote.com rsync_xfer -s

This starts up a local rsync_xfer process in double-verbose mode, and
tells it to run the "ssh remote.com rsync_xfer -s" command.  You can
make this latter command anything you like, as long as it starts up an
rsync_xfer with the slave (-s) option.
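Running the slave command and talking to it over its stdin/stdout can be done with the usual pipe(), fork(), dup2(), and execvp() sequence.  The C sketch below is generic POSIX code, not code from rsync_xfer.c, and spawn_slave() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start argv[0] with its stdin/stdout connected to us.  On success,
 * *to_child is a write fd feeding the child's stdin and *from_child a
 * read fd carrying its stdout.  Returns the child's pid, or -1.
 */
pid_t spawn_slave(char *const argv[], int *to_child, int *from_child)
{
    int in_pipe[2], out_pipe[2];   /* [0] = read end, [1] = write end */
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    if (pipe(in_pipe) < 0)
        return -1;
    if (pipe(out_pipe) < 0) {
        close(in_pipe[0]);
        close(in_pipe[1]);
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(in_pipe[0]); close(in_pipe[1]);
        close(out_pipe[0]); close(out_pipe[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wire the pipes to stdin/stdout, then run the command. */
        dup2(in_pipe[0], STDIN_FILENO);
        dup2(out_pipe[1], STDOUT_FILENO);
        close(in_pipe[0]); close(in_pipe[1]);
        close(out_pipe[0]); close(out_pipe[1]);
        execvp(argv[0], argv);
        _exit(127);                /* only reached if exec failed */
    }
    /* Parent: keep only our ends of the two pipes. */
    close(in_pipe[0]);
    close(out_pipe[1]);
    *to_child = in_pipe[1];
    *from_child = out_pipe[0];
    return pid;
}
```

For the example above, argv would be {"ssh", "remote.com", "rsync_xfer", "-s", NULL}; the local side then writes commands to to_child and reads replies from from_child.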

If you're feeling brave and you'd like to try it out, feel free, but
treat it like the pre-alpha code that it is.  Also keep in mind that
every time you tell the program to switch from get to put, all the
current outstanding get/put jobs must run to completion before any new
jobs start (which can slow down the transfer by reducing the pipe-lining
of data).

Some of the things yet to do:

- Need a way to override the per-file attributes on files we send (it
  currently preserves the attributes on each source file).
- Need a way to specify/set attributes for non-transferred files, new
  directories, and devices.
- It needs a way to output transfer statistics.
- There needs to be a timeout option.
- It needs some of the error-checking to be polished up.
- Some fatal errors might be better as warnings (if done right).
- There's no retry check if the file changes during the send.
- We need to catch SIGPIPE.
- There needs to be better handling of partially-transferred files.
- The code needs to be broken up into multiple files.
- There's no configuration support (it currently compiles on a modern
  Linux system).
- We might want an option that tells us to connect via socket to a
  particular hostname (instead of running a command).
- The code could use some more verbose-output messages.
- It needs more comments.

The code is here:

http://www.clari.net/~wayne/rsync_xfer.c

You need to have librsync installed or available.  You compile the code
as you would expect:

gcc -g -Wall -o rsync_xfer rsync_xfer.c -lrsync

..wayne..





Re: Rsync'ing lists of files

2002-06-10 Thread Wayne Davison

On Fri, 7 Jun 2002, Stephane Paltani wrote:
> I have 5 million files on one side of the ocean, 10 of which must
> be copied to the other side.

This is the sort of problem that would benefit from the rsync_xfer.c
program I'm working on (I mentioned an early version on the list a week
or so ago).  It lets an external program fully control what gets sent,
so there's no directory scan and no include/exclude processing.
I could imagine writing a simple perl script that would take a list of
files and turn it into a series of "cput" commands followed by any
needed "del" commands to remove the names that vanished from the list
after the last run.  Unfortunately, the code is still at a very early
stage, so it's not yet ready for use in a production environment.

I've been working on a new version of the program that is able to
transfer trees of files and will also have an improved socket protocol.
It works through the tree incrementally, and thus it shouldn't use as
much memory as the current rsync implementation.  After I get the code
in a little better shape, I'm planning to compare its performance with
the current implementation and try to figure out if rsync might best
benefit from adding support for a new (internal) protocol, or if it just
needs some tweaks to the current one.

..wayne..





Re: problem and a question

2002-06-11 Thread Wayne Davison

On Tue, 11 Jun 2002, Simison, Matthew wrote:
> > c:Connection refused
> rsh: can't establish connection

Did you use to have an RSYNC_RSH variable in your environment?  Perhaps
one that was set to use ssh?  You could run "echo $RSYNC_RSH" on one of
your Unix boxes to see what they're set to use.

..wayne..





2nd release of my new-protocol testing app

2002-06-13 Thread Wayne Davison

I've been having a lot of fun improving my new-protocol testing app.
It seems to be in pretty good shape (for test code), so I figured I'd
announce another release for those brave souls that may want to help me
in my thinking about a (potential) new rsync protocol.  It's a tar.gz
file this time because I broke up the code into multiple files.  I named
it "rzync" just for fun (a very confusing name, no?):

http://www.clari.net/~wayne/rzync.tar.gz

The new stuff in this release is that it can get/put an entire directory
tree of files via getd/putd, and it has conditional get/put commands
that handle both files and directories (cget/cput).  (For those that
missed the first announcement, the program can be totally controlled by
an external application via a simple set of commands on stdin.)

I've included a perl script named "rs" that will take an rsync-like
command line (as long as the destination is a directory and not a file)
and drive rzync with it.  Keep in mind that rzync still has the -a
option hard-wired to on, so "rs -v /path/foo remote:/path" works like
"rsync -av /path/foo remote:/path".

Things I've noticed so far:

- My single-proc generator/receiver seems to perform well when I send
  data over my DSL connection, but it goes much slower than rsync when
  sending data over a local pipe.  I'm guessing that this is because a
  multi-process setup can keep the generator pipeline filled to a
  greater degree.  If this is true, one solution would be to add a
  thread that would be responsible for handling all the generator tasks
  (and perhaps using the GNU portable thread library if we want to be
  compatible with systems that don't support process threads).

- The deltas produced by librsync are sometimes considerably larger than
  those produced by rsync, so the speedup of rzync sometimes suffers
  compared to rsync.  I believe that this is because (even without -z)
  rsync does some compression of the delta data that librsync does not
  do.

- The incremental directory scanning seems to work quite well.  I have
  not fleshed out all the areas that would need to grow dynamically for
  _really_ large jobs, so if someone wants to try to send some huge
  directory trees, we'll have to flesh out some more of the code first.

- My directory-scanning code does not attempt to handle symlinks,
  devices, or named sockets yet (it just skips them).

- Since the directory-scan data is shared between the two sides using
  the rsync algorithm, it has the potential to save a lot of transfer
  bytes when the directories on each side are similar.

Feel free to let me know what you think.

..wayne..





Re: rsync: error writing NNNN unbuffered bytes - exiting: Connection reset by peer

2002-06-13 Thread Wayne Davison

On 13 Jun 2002, Bill Geddes wrote:
> Suggestions on how to proceed would be greatly appreciated.

It is possible that one side of the connection is seg-faulting and
dying.  If you ensure that core files are not disabled (check your
ulimit setting), you may find that there is a core file that you could
use to figure out where the program is dying.  Alternatively, you could
attach to one or both processes with a debugger after it starts running
(e.g. "gdb /usr/bin/rsync 12345" where "12345" is the already-running
process ID), tell it to 'c'ontinue to run, and you'll see any abnormal
signals that may pop up.

..wayne..





Re: 2nd release of my new-protocol testing app

2002-06-13 Thread Wayne Davison

On Thu, 13 Jun 2002, Wayne Davison wrote:
> http://www.clari.net/~wayne/rzync.tar.gz

I forgot to mention that I changed the order of the local/remote args
to the 2-arg version of the "cd" command to be "cd LOCAL REMOTE" (the
command "cd DIR" still changes both the local and remote sides).  This
only affects someone who had written a script or an input file to drive
my earlier rsync_xfer release.  I hope that didn't trip anyone up.

..wayne..





Re: Rsync 2.4.6 and Hammerd CPU's

2002-06-18 Thread Wayne Davison

On Mon, 17 Jun 2002, Sandy Ganz wrote:
> Any ideas on how to keep rsync from using all the cpu on the webservers? I

Have you tried running rsync under "nice"?  Start it up on the webserver
side with the same command as before, just put "nice " at the start and
see if that relieves the pressure on your CPU.

..wayne..





Release 3 of "rzync" new-protocol test

2002-06-21 Thread Wayne Davison

For anyone who'd like to check out my "rzync" [sic] test app, I've just
released a new version.  For those that might
not have time to look at the code but could provide some feedback based
on a rough description, I've created the following simple web page:

http://www.clari.net/~wayne/new-protocol.html

Here's the tar file of the new release:

http://www.clari.net/~wayne/rzync-0.03.tar.gz

Changes in this version:

I've optimized the protocol to make the transferred-byte overhead
smaller; I've used an rsync-like file-list compression to make the
directory data smaller; I've gotten rid of some previous limitations
(such as the 4-byte file-size limit and the lack of reallocating various
buffers for really large file-count transfers); I've re-enabled the
"move" versions of the various get/put commands (which were disabled in
the last release); and I've fixed several bugs.  The resulting program
seems to be working quite well in my limited testing.

The count of transferred bytes in the latest protocol is now below what
rsync sends for many commands -- both a start-from-scratch update and a
fully-up-to-date update are usually smaller, for instance.  This is
mainly because my file-list data is smaller, but it's also because I
reduced the protocol overhead quite a bit.  Transferred bytes for
partially-changed files are still bigger than rsync because librsync
creates unusually large delta sizes (though there's a patch that makes
it work much better, it's still not as good as rsync).

In my speed testing, one test was sending around 8.5 meg of data on a
local system, and while rsync took only 0.5 seconds, my rzync app took
around 2 seconds.  A quick gprof run reveals that 98% of the runtime is
being spent in 2 librsync routines, so it looks like librsync needs to
be optimized a bit.

One potential next step might be optimizing rsync to make the
transferred file-list size a little smaller (e.g. sending the "size"
attribute in only as many bytes as are needed to store the number would
save ~4-5 bytes per file entry on typical files).
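The size-field saving can be sketched as a one-byte count followed by only the significant bytes of the 64-bit value.  The layout (count byte, then little-endian data) is an assumption for illustration; it is not rsync's or rZync's actual wire format, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode `v` as one count byte followed by that many little-endian
 * bytes, dropping high-order zero bytes.  Returns bytes written (1..9).
 */
size_t put_size(unsigned char *out, uint64_t v)
{
    size_t n = 0, i;

    assert(out != NULL);
    while (n < 8 && (v >> (8 * n)) != 0)
        n++;                              /* count significant bytes */
    out[0] = (unsigned char)n;
    for (i = 0; i < n; i++)
        out[1 + i] = (unsigned char)(v >> (8 * i));
    return 1 + n;
}

/* Decode a value written by put_size(); stores bytes consumed in *used. */
uint64_t get_size(const unsigned char *in, size_t *used)
{
    size_t n = in[0], i;
    uint64_t v = 0;

    assert(n <= 8);
    for (i = 0; i < n; i++)
        v |= (uint64_t)in[1 + i] << (8 * i);
    *used = 1 + n;
    return v;
}
```

With this layout a file under 64 KB costs 3 bytes instead of a fixed 8, a saving of 5 bytes per entry.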

It looks like work needs to be done on making librsync more efficient.

Until I can get some better speed tests, I'm unsure if I should attempt
to make rsync talk my new protocol.  Opinions welcomed.

..wayne..





rZync 0.04 -- a faster next-generation protocol test app

2002-06-21 Thread Wayne Davison

FYI, I decided to release a new version of my next-generation protocol
test app because I created an optimized transfer mode when files are
being sent whole (it bypasses all calls to librsync).  This makes my
"rZync" test app faster than rsync for sending whole files (rather than
4x slower, like it was).  This is significant because it helps to assure
me that my single-process generator/receiver will be able to keep up
with rsync's dual process implementation.  A full-file transfer appears
to be faster than rsync, even on a dual processor system.  For instance,
this test was 775 files in 126 directories:

-- rsync --
wrote 32920749 bytes  read 12420 bytes  9409476.86 bytes/sec
total size is 32869747  speedup is 1.00
rsync -av foo /tmp  2.23s user 1.54s system 162% cpu 2.314 total

wrote 32920749 bytes  read 12420 bytes  7318482.00 bytes/sec
total size is 32869747  speedup is 1.00
rsync -av foo /tmp  2.23s user 1.55s system 105% cpu 3.588 total

-- rZync --
wrote 32900189 bytes (16813)  read 5534 bytes (5534)  13162289.20 bytes/sec
total size is 32869700  speedup is 1.00
rs -av foo /tmp  0.34s user 0.56s system 39% cpu 2.274 total

wrote 32900064 bytes (16688)  read 5534 bytes (5534)  13162239.20 bytes/sec
total size is 32869700  speedup is 1.00
rs -av foo /tmp  0.42s user 0.69s system 58% cpu 1.910 total
---

I've also updated my new-protocol web page to explain what I'm trying to
accomplish (which some folks probably missed the first-time around):

http://www.clari.net/~wayne/new-protocol.html

Here's the tar file of the new release:

http://www.clari.net/~wayne/rzync-0.04.tar.gz

For those that want to try this out, use the "rs" perl script to control
rZync in an rsync-like manner (a temporary, test-mode situation), or
control it yourself by sending it commands on stdin.

..wayne..





Re: RegExpr in ---exclude

2002-06-24 Thread Wayne Davison

On Mon, 24 Jun 2002, J.Strohschnitter wrote:
> is it possible to use regular expressions in the exclude-paramter of rsync ?
> For example:
> 
> rsync --exclude "/path/to/*/[Ff][Oo][Ll][Dd][Ee][Rr]"

That's still a valid match pattern (and a poor regular expression --
"/*" would match zero or more slashes as a regex, so that would have to
turn into "/.*").

What you're trying to specify is probably failing due to one or more of
these problem areas:

- You have to use "**" to match any depth of subdirs in between the path
  and the name parts.  I.e.  "/path/to/**/[Ff][Oo][Ll][Dd][Ee][Rr]".

- Excludes anchor starting at the root of the transfer, not the root of
  the file system.  In other words, if you're sending "/path/*", you'd
  have to leave off the "/path" in the exclusion.

- You might want to leave off the path altogether.  Using just the name
  "[Ff][Oo][Ll][Dd][Ee][Rr]" would exclude that name at any point in the
  tree.  This is like specifying "**/[Ff][Oo][Ll][Dd][Ee][Rr]", but it
  also matches in the root dir of the transfer, and is more efficient.
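The effect of "*" versus "**" can be approximated with the C library's fnmatch().  This sketch assumes that a pattern containing "**" is matched without FNM_PATHNAME (so wildcards may cross "/") and that any other pattern is matched with it; rsync's own matching code may differ in edge cases, and the function names are hypothetical.  It also shows a slash-free pattern being compared against only the last path component.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/*
 * Approximate rsync-style wildcard matching (an assumption about rsync's
 * behavior): a single "*" stops at "/" because of FNM_PATHNAME, while a
 * pattern containing "**" is matched without it so wildcards may span
 * directory levels.  Returns 1 on a match, 0 otherwise.
 */
int excl_match(const char *pattern, const char *path)
{
    int flags;

    assert(pattern != NULL && path != NULL);
    flags = strstr(pattern, "**") ? 0 : FNM_PATHNAME;
    return fnmatch(pattern, path, flags) == 0;
}

/*
 * A pattern with no "/" is compared with the last path component only,
 * so it matches that name at any depth, including the transfer root.
 */
int excl_match_any_depth(const char *pattern, const char *path)
{
    const char *base = strrchr(path, '/');

    if (strchr(pattern, '/') == NULL)
        return excl_match(pattern, base ? base + 1 : path);
    return excl_match(pattern, path);
}
```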

..wayne..





RE: RegExpr in ---exclude

2002-06-24 Thread Wayne Davison

On Mon, 24 Jun 2002, Bernard A Badger wrote:
> Just a comment on shell glob usage [...]
> Shell globbing is done before the program is invoked, so
> the shell globs on "--exclude=/path/to/*/[Ff][Oo][Ll][Dd][Ee][Rr]", but
> unless you have a directory "--exclude=", it won't find anything.

Quite so.  Plus, what happens next is shell (and shell-option)
dependent.  Some shells always expand their args, so expanding a
non-matching arg causes the entire string to vanish (a very useful thing
in a script's "for" loop, but not on a command-line).  Other shells
complain about there being "no match" and refuse to run the command (I
have my interactive shell set to do that because it helps guard against
mistyped args).  Just FYI.
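The shell behaviors described here roughly correspond to how a program can treat the result of POSIX glob(): with no flags, a non-matching word yields GLOB_NOMATCH, which a shell can treat either as "the word vanishes" or as a "no match" error, while GLOB_NOCHECK passes the word through unchanged (the traditional sh behavior, which is how an unquoted --exclude=... argument usually reaches rsync intact).  expand_word() and the policy names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <glob.h>
#include <string.h>

/* How a shell might treat a word that matches no files (hypothetical). */
enum nomatch_policy { VANISH, COMPLAIN, PASS_THROUGH };

/*
 * Expand `word` the way a shell using `policy` would.  Returns the number
 * of resulting arguments (0 if the word vanished), or -1 for a "no match"
 * complaint or a glob() failure.  Call globfree(g) only when the return
 * value is greater than 0.
 */
int expand_word(const char *word, enum nomatch_policy policy, glob_t *g)
{
    int flags, rc;

    assert(word != NULL && g != NULL);
    flags = (policy == PASS_THROUGH) ? GLOB_NOCHECK : 0;
    rc = glob(word, flags, NULL, g);
    if (rc != 0)
        globfree(g);                          /* release any partial results */
    if (rc == GLOB_NOMATCH)
        return policy == COMPLAIN ? -1 : 0;   /* VANISH: the argument disappears */
    if (rc != 0)
        return -1;                            /* GLOB_ABORTED or GLOB_NOSPACE */
    return (int)g->gl_pathc;
}
```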

..wayne..





Latest rZync release: 0.06

2002-06-26 Thread Wayne Davison

For the small number of people who are checking this out, I released
version 0.05 a couple days ago (and only mentioned it on my new-protocol
web page) followed today by 0.06.  Some highlights of the two releases:

- We handle symlinks now in our recursive synchronization mode.

- Directory scanning is no longer limited to one active directory at a
  time (which was sorely needed when all the directories were up-to-date).

- Improved the "rs" control script, including the addition of the
  ability to specify a different destination name (previously only
  existing destination directories could be specified).

- Added a README with the latest command syntax for controlling rzync.

- Some much-needed cleanup of internal structures.

- Fixed several bugs.

Web resources:

http://www.clari.net/~wayne/rZync-0.06.tar.gz
http://www.clari.net/~wayne/new-protocol.html

There are still unsquashed bugs lurking, so be careful.  For instance, I
tried to copy my .mozilla dir, and the huge Cache hierarchy is currently
giving it grief.  I'll debug this problem next.

..wayne..





Re: Latest rZync release: 0.06

2002-06-26 Thread Wayne Davison

On Wed, 26 Jun 2002, Wayne Davison wrote:
> There are still unsquashed bugs lurking, so be careful.  For instance, I
> tried to copy my .mozilla dir, and the huge Cache hierarchy is currently
> giving it grief.  I'll debug this problem next.

Turned out to be a silly oversight on a realloc of some directory data.
Applying the following patch fixes things right up.

..wayne..

---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
Index: flist.c
--- flist.c 26 Jun 2002 08:45:21 -  1.15
+++ flist.c 26 Jun 2002 17:41:45 -
@@ -52,10 +52,14 @@
         }
         len = strlen(fn);
         if (bp-flist_data + len + 1 > flist_data_size) {
-            int blen = bp - flist_data;
+            uchar *old_data = flist_data;
             flist_data_size *= 2;
             flist_data = do_realloc(flist_data, flist_data_size);
-            bp = flist_data + blen;
+            if (flist_data != old_data) {
+                for (j = 0; j < cnt; j++)
+                    flist_ptrs[j] += flist_data - old_data;
+                bp += flist_data - old_data;
+            }
         }
         memcpy(bp, fn, len + 1);
         flist_ptrs[cnt++] = bp;
@@ -95,8 +99,10 @@
             continue; // XXX ignore devices for now!
         }
         if (bp - compressed_data + PATH_MAX*2 > compressed_data_size) {
+            int blen = bp - compressed_data;
             compressed_data_size += 4*1024;
             compressed_data = do_realloc(compressed_data, compressed_data_size);
+            bp = compressed_data + blen;
         }
         len = strlen(fn);
         populate_attrs(&attrs, &sb);
---8<--8<--8<--8<---cut here--->8-->8-->8-->8---
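The bug fixed by this patch comes from keeping pointers into a buffer that realloc() may move.  An alternative technique, shown here as a sketch and not what the patch does, is to store offsets instead of pointers so nothing needs rebasing after the buffer grows.  The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct name_pool {
    char *data;        /* names packed back to back, NUL-terminated */
    size_t used, size;
    size_t *offs;      /* offs[i] = offset of name i within data */
    size_t cnt, max;
};

/* Append a name; returns its index, or -1 if memory runs out. */
long pool_add(struct name_pool *p, const char *fn)
{
    size_t len;

    assert(p != NULL && fn != NULL);
    len = strlen(fn) + 1;
    if (p->used + len > p->size) {
        size_t nsize = p->size ? p->size : 64;
        char *nd;

        while (p->used + len > nsize)
            nsize *= 2;
        nd = realloc(p->data, nsize);   /* may move the whole buffer... */
        if (nd == NULL)
            return -1;
        p->data = nd;                   /* ...but stored offsets stay valid */
        p->size = nsize;
    }
    if (p->cnt == p->max) {
        size_t nmax = p->max ? p->max * 2 : 16;
        size_t *no = realloc(p->offs, nmax * sizeof *no);

        if (no == NULL)
            return -1;
        p->offs = no;
        p->max = nmax;
    }
    memcpy(p->data + p->used, fn, len);
    p->offs[p->cnt] = p->used;
    p->used += len;
    return (long)p->cnt++;
}

/* Pointers are computed on demand, so they cannot go stale after a grow
 * (as long as callers don't hold one across pool_add()). */
const char *pool_name(const struct name_pool *p, size_t i)
{
    assert(i < p->cnt);
    return p->data + p->offs[i];
}

void pool_free(struct name_pool *p)
{
    free(p->data);
    free(p->offs);
    memset(p, 0, sizeof *p);
}
```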





Re: strip setuid/setgid bits on backup (was Re: small security-related rsync extension)

2002-07-10 Thread Wayne Davison

On Mon, 8 Jul 2002, Eric Horst wrote:
> Not to mention, is it a real long-term goal is to redesign rsync to deal
> with large numbers of files by not building the entire file list up front?

That is something that I'm working on with my rZync application.  It
implements a new protocol that can begin transferring files as soon as
the first directory has been transferred and compared.  The program is
not yet ready for someone with millions of files to test, though -- I
need to change the implementation of the name-cache to handle really
large numbers of files.  I have a new design that I'll be coding up in
the next few days.  Once that's done, I hope to get more people to try
the code out and let me know how it performs.

> If rsync is ever rewritten work directory by directory (or whatever)
> building small file lists instead of building the mega filelist then when
> do you run the post-process script?  After each small batch of files?  Or
> store up the disposition list till the end effectively building a huge
> filelist again?

My initial reaction is that it would be best to start a pipe to the
application at the start of the transfer and incrementally put data into
it as you go along.
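A minimal sketch of that approach, assuming a hypothetical one-line-per-file format such as "sent docs/a.txt": the pipe is opened with popen() when the transfer starts, each disposition is written and flushed as soon as it is known, and pclose() waits for the post-processor at the end.  The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Open the post-processing pipe once, when the transfer starts. */
FILE *start_postprocess(const char *cmd)
{
    assert(cmd != NULL);
    return popen(cmd, "w");
}

/*
 * Write one disposition line such as "sent docs/a.txt" and flush it, so
 * the post-processor sees each file as it is handled, not at the end.
 * Returns 0 on success, -1 on a write error.
 */
int report_disposition(FILE *pp, const char *action, const char *name)
{
    assert(pp != NULL && action != NULL && name != NULL);
    if (fprintf(pp, "%s %s\n", action, name) < 0)
        return -1;
    return fflush(pp) == 0 ? 0 : -1;
}

/* After the last file, close the pipe and return the command's status. */
int finish_postprocess(FILE *pp)
{
    return pclose(pp);
}
```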

..wayne..





Re: Patch to update the included popt to 1.6.4

2002-07-12 Thread Wayne Davison

I'm wondering if we shouldn't just remove popt from the rsync source and
rely on the user to install the popt package on their system prior
to compiling rsync.  Configure already uses the installed popt in
preference to the included popt, so it wouldn't be hard to change this
to not have a popt fallback.

..wayne..





rZync 0.08 released

2002-07-13 Thread Wayne Davison

I've released the next version of my rZync test app.  You can find a
link to it here:

http://www.clari.net/~wayne/new-protocol.html

You should also snag the referenced librsync source, as some important
bugfixes in librsync are needed to compile rzync.

For those that don't know, rZync is my new-protocol test app that I'm
using to try out some ideas on how to improve the rsync protocol.  It
transfers directory information incrementally, so it should have a much
lower memory overhead than rsync.

The most important change in this release is that I've replaced the
name-cache code with something that will be more robust and should work
great with really large file transfers.  I also changed the command-line
syntax, and it now parses several new options, such as -r, -p, and -t
(i.e. -a is no longer hard-wired to "on"), along with a few other
things.  Another important bug fix
closes a neglected file handle so we don't overflow the open file limit.

** Be sure to use the new "rs" controlling script and not the old one. **

I've tried the code out on a fairly large data set (~4000 files in ~500
directories), but nothing close to some of you million-file folks.  I
would not yet recommend trying rZync in a production environment, but if
you can run some large file-count tests, please let me know how things
go.

..wayne..





Re: superlifter design notes and rZync feedback

2002-07-18 Thread Wayne Davison

Martin Pool <[EMAIL PROTECTED]> wrote:
> I've put a cleaned-up version of my design notes up here
> http://samba.org/~mbp/superlifter/design-notes.html

I'll start with some feedback on your rzync comments:

Re: rzync's name:  I currently consider rZync to be a test app to
allow me (and anyone else who wants to fiddle with it) to try out some
ideas in protocol design.  Integrating the ideas from this back into
rsync or into superlifter would be ideal.  If I ever decide to release
my own file transfer utility, I'll name it something useful at that
time (definitely NOT rzync).

Re: rzync's variable-length fields:  Note that my code allows more
variation than just 2 or 4 bytes -- e.g., I size the 8-byte file-size
value to only as many bytes as needed to actually store the length.  I
agree that we should question whether this complexity is needed, but I
don't agree that it is wrong on principal.  There are two areas where
field-sizing is used:  in the directory-info compression (which is very
similar to what rsync does, but with some extra field-sizing thrown in
for good measure), and in the transmission protocol itself:

I still have questions about how best to handle the transfer of
directory info.  I'm thinking that it might be better to remove the
rsync-like downsizing of the data and to use a library like zlib to
remove the huge redundancies in the dir data during its transmission.

In the protocol itself, there are only two variable-size elements that
go into each message header.  While this increases complexity quite a
bit over a fixed-length message header, it shouldn't be too hard to
automate a test that ensures that the various header combinations
(particularly boundary conditions) encode and decode properly.  I don't
know if this level of message header complexity is actually needed (this
is one of the things that we can use the test app to check out), but if
we decide we want it, I believe we can adequately test it to ensure that
it will not be a sinkhole of latent bugs.
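As a sketch of such an automated test, assume a hypothetical header layout (not rZync's actual format): a flags byte records whether each of two fields, a message id and a payload length, is stored in 1, 2, or 4 bytes.  hdr_selftest() round-trips every pair of values on either side of those size boundaries and counts failures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to hold v: 1, 2, or 4. */
static size_t width_of(uint32_t v)
{
    return v <= 0xff ? 1 : v <= 0xffff ? 2 : 4;
}

/* Two-bit width codes: 0 -> 1 byte, 1 -> 2 bytes, 2 -> 4 bytes. */
static unsigned code_of(size_t w)
{
    return w == 1 ? 0 : w == 2 ? 1 : 2;
}

static void put_le(unsigned char *out, uint32_t v, size_t w)
{
    size_t i;
    for (i = 0; i < w; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

static uint32_t get_le(const unsigned char *in, size_t w)
{
    uint32_t v = 0;
    size_t i;
    for (i = 0; i < w; i++)
        v |= (uint32_t)in[i] << (8 * i);
    return v;
}

/* Encode a header (flags, id, len); returns its length, 3 to 9 bytes. */
size_t hdr_encode(unsigned char *out, uint32_t id, uint32_t len)
{
    size_t wi = width_of(id), wl = width_of(len);

    assert(out != NULL);
    out[0] = (unsigned char)(code_of(wi) | code_of(wl) << 2);
    put_le(out + 1, id, wi);
    put_le(out + 1 + wi, len, wl);
    return 1 + wi + wl;
}

/* Decode a header; returns bytes consumed, or 0 for a bad flags byte. */
size_t hdr_decode(const unsigned char *in, uint32_t *id, uint32_t *len)
{
    unsigned ci = in[0] & 3, cl = (in[0] >> 2) & 3;
    size_t wi, wl;

    if (ci == 3 || cl == 3 || (in[0] >> 4) != 0)
        return 0;
    wi = (size_t)1 << ci;
    wl = (size_t)1 << cl;
    *id = get_le(in + 1, wi);
    *len = get_le(in + 1 + wi, wl);
    return 1 + wi + wl;
}

/* Round-trip every pair of boundary values; returns the failure count. */
int hdr_selftest(void)
{
    static const uint32_t edge[] = {
        0, 1, 0xff, 0x100, 0xffff, 0x10000, 0xffffffffu
    };
    size_t n = sizeof edge / sizeof edge[0], i, j;
    unsigned char buf[9];
    int failures = 0;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            uint32_t id = 0, len = 0;
            size_t enc = hdr_encode(buf, edge[i], edge[j]);
            size_t dec = hdr_decode(buf, &id, &len);

            if (dec != enc || id != edge[i] || len != edge[j])
                failures++;
        }
    return failures;
}
```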

Re: rzync's name cache.  I've revamped it to be a very dependable design
that no longer depends on lock-step synchronization in the expiration of
old items (just in the creation of new items, which is easy to achieve).

Some comments on your registers:

You mention having something like 16 registers to hold names.  I think
you'll find this to be inadequate, but it does depend on exactly how
much you plan to cache names outside of the registers, how much
retransmission of names you consider to be acceptable, and whether you
plan to have a "move mode" where the source file is deleted.

My first test app had no name-cache whatsoever.  It relied on external
commands to drive it, and it sent the source/destination/basis trio of
names from side to side before every step of the file's progress.  While
this was simple, the increased bandwidth necessary to retransmit the
names was not acceptable to me.

If we just register the active items that are currently being sent over
the wire, the name will need to live through the entire sig, delta,
patch, and (optionally) source-side-delete steps.  When the files are
nearly up-to-date, having only 16 of them will, I believe, be overly
restrictive.  Part of the problem is that the buffered data on the
sig-generating side delays the source-side-delete messages quite a bit.
If we had a high-priority delete channel, that would help to alleviate
things, but I think you'll find that having several hundred active names
will be a better lower limit in your design thinking.

Another question is whether names are sent fully-qualified or relative
to some directory.  My protocol caches directory names in the name cache
and allows you to send filenames relative to a cached directory.  Just
having a way to "chdir" each side (even if the chdir is just virtual)
and send names relative to the current directory should help a lot.

An additional source of cached names is in the directory scanning when
doing a recursive transfer.  My protocol has specific commands that
refer to a name index within a specified directory so that the receiving
side can request changed files using a small binary value instead of a
full pathname.
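A sketch of how a name sent relative to a cached directory might be resolved on the receiving side. The fixed-size array, MAX_DIRS, struct dir_cache and resolve_name() are hypothetical; the point is only that a small index plus a short relative name can stand in for a full pathname.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_DIRS 256   /* assumed cache size, for illustration only */

/* Hypothetical directory-name cache.  Both sides store each directory
 * name under the same small index, so a later message can name a file
 * as (dir_index, relative_name) instead of sending a full pathname. */
struct dir_cache {
    const char *dirs[MAX_DIRS];
};

/* Rebuilds the full path for a name sent relative to a cached directory.
 * Returns 0 on success, or -1 for an unknown index or a too-small buffer. */
int resolve_name(const struct dir_cache *cache, size_t dir_index,
                 const char *rel_name, char *out, size_t out_len)
{
    int n;

    assert(cache != NULL && out_len > 0);
    if (dir_index >= MAX_DIRS || cache->dirs[dir_index] == NULL)
        return -1;
    n = snprintf(out, out_len, "%s/%s", cache->dirs[dir_index], rel_name);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```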

One more area of complexity that you don't mention (and I don't either
in my new-protocol doc):  there are some operations where 2 names need
to be associated with one operation.  This happens when we have both a
destination file and a basis file.  My current cache implementation
allows both of these names to be associated with a single cache element
(though I need to improve this a bit in rzync) and lets the sig/patch
stage snag them both.

..wayne..


-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html



Re: Rsync --delete does not work

2002-07-23 Thread Wayne Davison

On Tue, 23 Jul 2002, g dm wrote:
> rsync -a --delete * /data/exp_dir
> So, what did I do wrong?

You're sending a list of files, not a directory (since '*' is expanded
by the shell into a list of files).  The --delete option only works on
a directory-to-directory transfer, so try using this instead:

rsync -a --delete ./ /data/exp_dir

..wayne..





Re: rsync: --delete fails with multiple source directories

2002-07-27 Thread Wayne Davison

On Mon, 22 Jul 2002, Edward Farrar wrote:
> Rsync 2.5.5 is producing this error message and a core file when executing the
> command "/usr/local/bin/rsync -av --delete --force /net/OSCM/OS_ATLAS2/CONFIG/.
> /net/OSCM/OS_TITAN1/2.6/CONFIG/. /OS/2.6/CONFIG"
> 
> building file list ... done
> rsync: connection unexpectedly closed (8 bytes read so far)
> rsync error: error in rsync protocol data stream (code 12) at io.c(150)

I looked at the code in flist_find(), and I had the theory that the code
would fail if it found a duplicate name as the last item in the flist.
Sure enough, creating two directories with one duplicate between them
would crash in the same way if that duplicated item is the last one
alphabetically but would succeed otherwise.  The problem stems from the
flist_up() function marching right off the top of the list if the last
item has its basename zeroed out (indicating it is a duplicate).

The easiest fix appears to be to simply trim the high value to ignore
removed items.  Like so:

Index: flist.c
--- flist.c 11 Apr 2002 02:21:41 -  1.124
+++ flist.c 27 Jul 2002 17:40:10 -
@@ -1151,7 +1151,9 @@
 {
int low = 0, high = flist->count - 1;
 
-   if (flist->count <= 0)
+   while (high >= 0 && !flist->files[high]->basename) high--;
+
+   if (high < 0)
return -1;
 
while (low != high) {

I've tested it and it fixes my crashing testcase, so I'll commit this
to CVS.
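The bug and the fix can be reproduced with a small, self-contained model. This is not rsync's flist_find()/flist_up() code: struct entry and find_name() are simplified stand-ins that assume live entries are sorted with strcmp(), with removed duplicates marked by a NULL basename. Trimming high first guarantees that the upward skip over removed entries always lands on a live entry instead of running off the end of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a file-list entry: live entries are sorted by
 * basename, and a removed duplicate has its basename set to NULL. */
struct entry {
    const char *basename;
};

/* Binary search over a sorted list that may contain removed entries.
 * Returns the index of `name`, or -1 if it is not present. */
int find_name(const struct entry *files, int count, const char *name)
{
    int low = 0, high = count - 1;

    assert(count >= 0);
    /* The fix: trim removed entries off the top so files[high] is live. */
    while (high >= 0 && files[high].basename == NULL)
        high--;

    /* Invariant while the loop runs: files[high] is a live entry. */
    while (low <= high) {
        int mid = low + (high - low) / 2;
        int live = mid;
        int cmp;

        /* Step up past removed entries; stops at high at the latest. */
        while (files[live].basename == NULL)
            live++;
        cmp = strcmp(name, files[live].basename);
        if (cmp == 0)
            return live;
        if (cmp < 0) {
            high = mid - 1;
            while (high >= low && files[high].basename == NULL)
                high--;
        } else {
            low = live + 1;
        }
    }
    return -1;
}
```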

..wayne..





Pipelined rsync proposal (was Re: superlifter design notes)

2002-07-27 Thread Wayne Davison

On Sun, 21 Jul 2002, jw schultz wrote:
> What i am seeing is a Multi-stage pipeline.

This is quite an interesting design idea.  Let me comment on a few
things that I've been mulling over since first reading it:

One thing you don't discuss in your data flow is auxiliary data flow.
For instance, error messages need to go somewhere (perhaps mixed into
the main data flow), and they need to get back to the side where the
user resides.  This can add an extra network transfer after the update
stage (6) to send errors back to the user (if the user is not on the
same side as stage 6).

Another open issue is what we do when a file changes while we're
transferring it.  Rsync sends a "redo" request to the generator process
and it reruns all changed files at the end of the run.  If such a thing
is desirable in this utility (instead of just warning the user that the
file was unable to be updated), then this "redo" data flow also needs to
be mapped out.  If this protocol remains more batch oriented, then it
probably won't need to redo files -- just warn the user.

One of the really nice features of your design is that it is easy to
interrupt the flow of data at any point and continue it later.  This is
a useful thing if the cached information remains valid and thus saves us
time/resources on either the next run or on multiple updates to
different destination systems.

One downside to your protocol is that it requires several socket
connections between systems.  This either mandates using multiple
rsh/ssh connections (possibly with multiple password prompts for a
single transfer) OR using some kind of socket-forwarding protocol (such
as the one provided by ssh).  When I proposed adding extra sockets to
the rsync protocol a while back, at least one fellow mentioned that a
requirement of using ssh would not be an acceptable solution to him, so
this area could be a little controversial (depending on what kind of a
solution we can come up with).

Another question is whether we need to support the bi-directional
transfer of files in a single connection.  My rZync test app supports
sending files in both directions just because it was so simple to add --
having a message-based protocol makes this a breeze.

Your first protocol (the one without any backchannels) looks like it
would be a snap to set up using separate processes.  It does, as you
note, add quite a bit of extra data transmission (such as an extra 2x
hit in filename transfer alone).  The backchannels add some complicating
factors to the file I/O that will need to be carefully designed to avoid
deadlocks.  Since the data is strictly ordered with one chunk for pipe-A
and one chunk for pipe-B (for each file), the code should be fairly
straightforward, though, so hopefully this won't be a big problem.

Caching off data from the backchannel utility might be pretty complex,
though -- think about interrupting the stream after step 3: you'd need
to buffer off the backchannel data from step 1 plus the main output and
backchannel data from step 3 and then restart things at steps 4 and 5
with the appropriate main-stream input and backchannel flows.  That
would be much harder than saving off the one single output flow from
step 3 and starting up step 4 later on using it, so either the
backchannel algorithm may not be very useful in a batch scenario, or
we'd need to have a helper script that can figure out how to interrupt
and restart the chain of processes at any point.

I find intriguing your idea of allowing the first 4 steps of the
scan/compare/checksum sequence to be reversed.  At first I thought that it would be
too fragile since the server's data tends to be updating constantly (and
this protocol needs to have the server data remain constant from the
moment the checksum blocks are created until the client(s) all fetch the
updated data).  However, I can see that this may well be a really nice
way to update an archive and let multiple (non-identical) clients
request updates.  This will require an extension to librsync that would
allow a reversed rolling-checksum diff option, and an option to separate
the diff and transmit stages (which are currently done at the same
time), so this idea has a bigger overhead than the rest of the tool as
far as the rsync protocol is concerned.

The most efficient multi-server duplication process would be to save off
the output of the transmit phase and send it to multiple systems for
just the final update phase.  This does require that the destination
machines all have identical file trees for the updating to work, though,
so this only works on tightly-controlled mirrors.  The advantage is that
the server expends no further resources than to just get the update
stream transmitted to the clients (who can duplicate the stream without
the server's help).

Since your proposed protocol seems to fit so well with batch-oriented
scenarios while potentially having problems in the more interactive
scenarios, I'm wondering if this should be a separate uti

Re: Patch to update the included popt to 1.6.4

2002-07-27 Thread Wayne Davison

On Thu, 11 Jul 2002, Jos Backus wrote:
> http://www.catnook.com/patches/rsync-popt-1.6.4.patch

I went ahead and tested this and then checked it in (since we might as
well include the newest popt if we're going to include popt with rsync).

> The configure script had to be regenerated (with autoconf 2.53)
> because popt.c wants HAVE_FLOAT_H. As an aside, I have heard people
> complain about this version of autoconf generating scripts that break
> when run under bash (as /bin/sh).

If this is a concern, I could easily check in a configure/config.h.in
that was generated with autoconf 2.52d.  Let me know if there are
problems (I didn't have any on my Mandrake Linux 8.2 system).

..wayne..





Useless option combos (was Re: --password-file switch)

2002-07-30 Thread Wayne Davison

On Tue, 30 Jul 2002, Martin Pool wrote:
> The --password-file option only applies to rsync daemon connections,
> not ssh.

Perhaps we should make rsync complain about such options that don't make
sense (another example being trying to use -e with a "::" hostspec)?

..wayne..





Re: new rsync release needed soon?

2002-07-31 Thread Wayne Davison

> On Wed, Jul 31, 2002 at 10:21:49AM +1000, Martin Pool wrote:
> > There's just one more change I would like to put in, which is partially
> > rolling back the IPv6 patch so that it uses the old code, unmodified,
> > if --disable-ipv6 is specified.

There was another patch that I thought was needed with all the timeout
problems people have been seeing with large files -- the patch that
Stefan Nehlsen sent a few months back.  I modified it to work with the
latest CVS version, tested it, and checked it in.  From my reading of
the patch, I think it has a very low chance of screwing anything up.
If others disagree, I can back it out and we can put it in later.

On Wed, 31 Jul 2002, Dave Dykstra wrote:
> The patch that I'd most like to see get in JD Paul's patch for using SSH
> and daemon mode together.  We still don't have an agreement on what the
> syntax should be.  I think the combination of -e ssh and :: which he
> implemented is the most understandable syntax and we should just go with
> it.

I'd be glad to check that in and if there is still disagreement over the
syntax, we can change it in CVS.  I'll look at this next.

Talking syntax reminds me of another patch that I think should go in:
the one that makes rsync accept rsync:// syntax in the destination, not
just the source.  Anyone disagree with that?

..wayne..





Re: new rsync release needed soon?

2002-07-31 Thread Wayne Davison

On Wed, 31 Jul 2002, Robert Weber wrote:
> On the subject of needed patches, I just recently completed a patch for
> librsync that fixed the mdfour code to have uint_64 or 2 uint_32's for
> size.  Without this, the checksums on files >512Megs are incorrect.

In order to interoperate with older versions of rsync, wouldn't we need
to continue to generate the incorrect checksums on all but the newest
(freshly bumped up) protocol number?
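The 512-Meg threshold is consistent with a 32-bit bit counter. MD4 (RFC 1320) appends the message length in bits as a 64-bit value; 512 MiB is 2^29 bytes, or 2^32 bits, the first count that no longer fits in 32 bits. The sketch below assumes that is the failure mode; bits_32() and bits_64() are hypothetical helpers, not librsync functions.

```c
#include <assert.h>
#include <stdint.h>

/* MD4 pads each message with its length in bits, stored as a 64-bit
 * value.  bits_32() models an implementation that keeps that count in
 * one 32-bit variable; bits_64() keeps the full count. */
uint32_t bits_32(uint64_t bytes)
{
    assert(bytes <= UINT64_MAX / 8);
    return (uint32_t)(bytes * 8);   /* silently wraps at 2^32 bits */
}

uint64_t bits_64(uint64_t bytes)
{
    assert(bytes <= UINT64_MAX / 8);
    return bytes * 8;
}
```

Any checksum compatibility switch would then hinge on which of these two counts a given protocol version feeds into the final MD4 block.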

..wayne..





daemon-server via SSH (was Re: new rsync release needed soon?)

2002-07-31 Thread Wayne Davison

On Wed, 31 Jul 2002, Dave Dykstra wrote:
> The patch that I'd most like to see get in JD Paul's patch for using
> SSH and daemon mode together.

I've completed my mods to get this updated to the latest CVS version and
then checked it all in.  Since things had changed quite a bit, I applied
the patch by hand and then compared my changes to the original patch to
ensure that I did a good job.

I did leave out one thing that I had a question about in main.c:  the
code that was looking for a -l option in the remote-shell command.  If
the user specifies a username in both the host-spec and in their ssh
command, do we really want to silently eliminate one of them?  Or should
we maybe complain and fail?  I think I might prefer to let the remote-
shell command run and let it complain about the two -l options (if
that's what it wants to do), but I could be convinced otherwise.

I've tested normal rsync operations to ensure that it is still working
right, but not daemon mode (which I don't normally use).  If someone
could help out with the testing, I'd appreciate it.

..wayne..





Re: daemon-server via SSH (was Re: new rsync release needed soon?)

2002-08-01 Thread Wayne Davison

On Thu, 1 Aug 2002, Dave Dykstra wrote:
> I think the way JD did it was the compromise we agreed on: if a userid
> is specified only with userid@hostname, it should be used for both
> purposes, but if the -e command includes -l it should override the
> login userid only.

OK, that makes sense.  I'm sorry I missed that.  I've committed the code
I had omitted that implements this.
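The agreed rule is small enough to state as code. choose_users() and struct users are hypothetical names used only to illustrate the precedence; they are not the functions in main.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the agreed rule: a user from "user@host" serves
 * both purposes, but "-l user" inside the -e command overrides only the
 * remote-shell login.  NULL means "not specified". */
struct users {
    const char *login_user;   /* account the remote shell logs in as */
    const char *daemon_user;  /* user presented to the rsync daemon */
};

struct users choose_users(const char *hostspec_user, const char *rsh_l_user)
{
    struct users u;

    u.daemon_user = hostspec_user;
    u.login_user = rsh_l_user != NULL ? rsh_l_user : hostspec_user;
    return u;
}
```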

As for your SSH_CLIENT change, it doesn't compile on my Linux system
with INET6 defined (due to the IPv6 structures having different names).
I needed to make this patch to get it to compile:

Index: clientname.c
--- clientname.c	2002/08/01 19:17:00	1.9
+++ clientname.c	2002/08/01 21:05:53
@@ -112,8 +111,13 @@
socklen_t sin_len = sizeof sin;
 
memset(&sin, 0, sin_len);
+#ifdef INET6
+   sin.sin6_family = af;
+   inet_pton(af, client_addr(fd), &sin.sin6_addr.s6_addr);
+#else
sin.sin_family = af;
inet_pton(af, client_addr(fd), &sin.sin_addr.s_addr);
+#endif
 
if (!lookup_name(fd, (struct sockaddr_storage *)&sin, sin_len, 
name_buf, sizeof name_buf, port_buf, sizeof port_buf))

As for your question of how to know when to look at the SSH_CLIENT
environment variable, I wonder if the is_a_socket() call that was in
the original patch would be enough of a distinguishing factor.  Like
this:

Index: clientname.c
--- clientname.c	2002/08/01 19:17:00	1.9
+++ clientname.c	2002/08/01 21:05:53
@@ -51,8 +51,7 @@
 
initialised = 1;
 
-   ssh_client = getenv("SSH_CLIENT");
-   if (ssh_client != NULL) {
+   if (!is_a_socket(fd) && (ssh_client = getenv("SSH_CLIENT")) != NULL) {
strlcpy(addr_buf, ssh_client, sizeof(addr_buf));
/* truncate SSH_CLIENT to just IP address */
p = strchr(addr_buf, ' ');
@@ -100,7 +99,7 @@
strcpy(name_buf, default_name);
initialised = 1;
 
-   if (getenv("SSH_CLIENT") != NULL) {
+   if (!is_a_socket(fd) && getenv("SSH_CLIENT") != NULL) {
/* Look up name of IP address given in $SSH_CLIENT */
 #ifdef INET6
int af = AF_INET6;

I'll have to look at the code in more detail to know if this works or
not.
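Here is a sketch of the proposed SSH_CLIENT check. fd_is_socket() is one plausible way to get an is_a_socket()-style answer, using fstat() and S_ISSOCK(); rsync's own is_a_socket() may be implemented differently. ssh_client_ip() and client_ip_from_env() are hypothetical helpers that mirror the truncate-at-first-space logic in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* One plausible is_a_socket()-style test: ask fstat() whether the
 * descriptor refers to a socket.  An invalid fd reports "not a socket". */
int fd_is_socket(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 && S_ISSOCK(st.st_mode);
}

/* Copies an SSH_CLIENT-style value ("client-ip client-port server-port")
 * into buf and truncates it to just the IP address. */
void ssh_client_ip(const char *ssh_client, char *buf, size_t len)
{
    char *p;

    assert(len > 0);
    snprintf(buf, len, "%s", ssh_client);
    if ((p = strchr(buf, ' ')) != NULL)
        *p = '\0';
}

/* Mirrors the proposed condition: consult $SSH_CLIENT only when fd is
 * not a socket.  Returns 1 and fills buf if an address was found. */
int client_ip_from_env(int fd, char *buf, size_t len)
{
    const char *ssh_client;

    if (fd_is_socket(fd) || (ssh_client = getenv("SSH_CLIENT")) == NULL)
        return 0;
    ssh_client_ip(ssh_client, buf, len);
    return 1;
}
```

The design point is that a daemon accepting a real TCP connection can always ask the socket for its peer, so the environment variable is only trusted when stdin/stdout are pipes from ssh.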

..wayne..





Re: daemon-server via SSH (was Re: new rsync release needed soon?)

2002-08-02 Thread Wayne Davison

I just looked over your latest changes and checked in a few minor fixes
that I saw:

- In client_addr() we now avoid calling getnameinfo() if we've already
  set up the addr_buf (in the am_server side).

- I moved some structures in client_name() so that they remain in scope
  the entire time that we have pointers that reference them.  With most
  (all?) C compilers this may not have been necessary in this particular
  case, but I figure it's safer this way.

- The dot-counting loop exited before it could count a 4th dot.

..wayne..





Re: superlifter design notes and a new proposal

2002-08-04 Thread Wayne Davison

On Sun, 4 Aug 2002, Martin Pool wrote:
> My first draft was proposing what you might call a "fine-grained" rpc
> system, with operations like "list this directory", "delete this
> file", "calculate the checksum of this file."  I think Wayne's rzync
> system was kind of like that too.

Your previous proposal sounded quite a bit more fine-grained than what
rZync is doing.  For instance, it sounded like you would have much more
primitive building-block messages and move much of the controlling
smarts into something like a python-language scripting layer.  While
rZync allows ftp-level control (such as "send this file", "send this
directory tree", "delete this file", "create this directory") it does
this with a small number of higher-level command messages.

Rsync, as you know, is a much more modal protocol.  It has a strict set
of steps that must be specified in order and nothing else.  This saves
bytes because so much of the protocol is determined by context, but is
very limiting.

My rZync protocol opens this up by using message numbers for everything
that gets sent, but it still keeps some context-oriented "smarts" when
transferring files.  There is no micro-management of a file transfer
from start to finish.  The messages cascade from side to side as the
sig, delta, patch sequence of events unfolds.  The most CISC-like message
in rZync is the recursive-directory-send message.  Using this is very
much like starting an entire "rsync -r src/ dest" transfer sequence via
a single message.

> So the client will send something more or less equivalent to its whole
> command line.

I think that's a good idea.  My rZync app currently operates on each arg
independently, but I recently discovered that this makes it incompatible
with rsync when merging directories and such.  For instance, the command
"rsync -r dir1/ dir2/ dir3" merges the file list and removes duplicates
before starting the transfer to dir3.  rZync currently just transfers
the contents of dir1 to dir3 and then transfers the contents of dir2 to
dir3.  Fortunately, this is not going to be hard to fix.

> While staying with that overall approach, we may still be able to make
> some improvements in
> 
>  - documenting the protocol
> 
>  - doing one directory at a time
> 
>  - possibly, doing librsync deltas of directories
> 
>  - just one process on either end
> 
>  - getting rid of interleaved streams on top of TCP
> 
>  - sending errors as distinct packets, including a reference to the 
>file that caused them (if any)
> 
>  - handling ACLs, EAs, and other "incidental" things
> 
>  - holding the connection open and doing more operations afterwards

This is very much in keeping with what I've been fiddling with in rZync
(which nearly implements this whole list).  I like the simplicity of one
process per side, which makes it easy to cache data that will be used
later and discard it when it is no longer needed.  I got rid of the
"multi-IO" idiom of rsync in favor of sending all data via messages and
limiting each chunk to 32K to allow other messages to be mixed into the
middle of a large file's data-stream (such as verbose output).
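A minimal sketch of the 32K chunking idea: a large file's data is cut into bounded messages so that other messages can be queued between them. The 5-byte header layout (type byte plus big-endian length), HDR_LEN, build_data_msg() and chunks_needed() are assumptions for illustration, not rZync's actual format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHUNK (32 * 1024)  /* per-message data limit from the text */
#define HDR_LEN 5              /* assumed header: 1 type byte + 4-byte length */

/* Builds one data message holding at most MAX_CHUNK bytes of the file.
 * `out` must have room for HDR_LEN + MAX_CHUNK bytes.  Returns the number
 * of payload bytes consumed; the caller loops, and can queue other
 * messages (verbose output, errors) between chunks. */
size_t build_data_msg(unsigned char type, const unsigned char *data,
                      size_t remaining, unsigned char *out, size_t *out_len)
{
    size_t n = remaining < MAX_CHUNK ? remaining : MAX_CHUNK;

    assert(n == 0 || data != NULL);
    out[0] = type;
    out[1] = (unsigned char)(n >> 24);   /* big-endian payload length */
    out[2] = (unsigned char)(n >> 16);
    out[3] = (unsigned char)(n >> 8);
    out[4] = (unsigned char)n;
    if (n > 0)
        memcpy(out + HDR_LEN, data, n);
    *out_len = HDR_LEN + n;
    return n;
}

/* Number of data messages needed for a file of `size` bytes. */
size_t chunks_needed(size_t size)
{
    return (size + MAX_CHUNK - 1) / MAX_CHUNK;
}
```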

I think the basic idea of how rZync envisions a new protocol working is
a good one -- not so much the specifics of the bytes sent in the
message-header format, but how the messages flow, how each side handles
the messages in a single process, how all I/O is handled by a single
function, etc.  There's certainly lots of room for improvement, though.

This also reminds me that I hadn't responded to jw's question about why
I thought his pipelined approach was more conducive to a batch protocol
than an interactive protocol.  To make the pipelined protocol as
efficient as rsync will require the complexity of his backchannel
implementation, which I think will be harder to get right than a
single-process message-oriented protocol.  If every stage is a separate
process, it seems less clear how to implement something like an
interactive "mkdir" or a "delete" command. (What process handles this?
How do we signal that process?  Do we need yet another socket path for a
control stream in some circumstances?)  It also seems to me that the
extra processes/threads and socket-channels will make a less portable
interactive app than a single select-using interactive app.

..wayne..





Re: --include option

2002-08-12 Thread Wayne Davison

On Mon, 12 Aug 2002, Leaw, Chern Jian wrote:
>   # rsync -avz --include-from=files_included  /stor/circuit_design/
> mickey.willowglen.com:/stor/circuit_design/

The problem with your command is that it contains include directives but
no exclusions, so nothing limits the default operation of sending the
entire subdirectory contents.  An easier way to go for this specific
problem is to ignore includes and specify two source dirs, like this:

rsync -avz /stor/circuit_design/{clock_speed,fub_layout} \
mickey.willowglen.com:/stor/circuit_design/

The above assumes your shell has {} expansion, like bash and zsh.  If it
does not, just mention both directories separately (without any trailing
slash).  The trailing slash on the destination isn't required, but it 
doesn't hurt either, so I left it in.

To make things work with your include-using command, you'd need to use
something like this in your include file:

+ /clock_speed
+ /fub_layout
- /*

This allows the two directories you want, and excludes everything else
in the base directory of the transfer.  Since none of the rules apply to
files deeper than the base dir, none of them will be excluded.
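The first-match behaviour of these three rules can be modelled with POSIX fnmatch(). This is a simplified sketch, not rsync's matcher: struct rule and is_included() are hypothetical, and only anchored patterns are handled. FNM_PATHNAME keeps "/*" from matching anything below the base dir, which is why deeper files fall through to the default of being included.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Simplified include/exclude list: '+' or '-' plus an anchored pattern.
 * rsync's real matcher handles more cases (unanchored patterns,
 * trailing-slash directory rules, "**"); this models only anchored ones. */
struct rule {
    char action;           /* '+' include, '-' exclude */
    const char *pattern;   /* e.g. "/clock_speed" */
};

/* Returns 1 if `path` (relative to the transfer root, with a leading '/')
 * is included.  The first matching rule wins; no match means include. */
int is_included(const struct rule *rules, size_t n, const char *path)
{
    assert(path != NULL);
    for (size_t i = 0; i < n; i++) {
        /* FNM_PATHNAME stops '*' from matching a '/' */
        if (fnmatch(rules[i].pattern, path, FNM_PATHNAME) == 0)
            return rules[i].action == '+';
    }
    return 1;
}
```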

..wayne..




Re: How to rsync selective subdirectories

2002-08-12 Thread Wayne Davison

On Mon, 12 Aug 2002, Nitin Agarwal wrote:
> I want to rsync all the dates directories but only the "toid"
> subdirectory.

The easiest thing to do might be to use the -R (--relative) option, like
this:

rsync -avR /abc/dir/*/toid host:/dest/

This will create the /abc/dir/DATE/toid dirs on the destination side.
If the "/dest/" dir begins with "/abc/dir", that part will be skipped.

If you don't like the extra subdirs, use an include file like this:

+ /*/toid
- /*/*

and a command like this:

rsync -av --include-from=above-file /abc/dir/ host:/dest/

This includes everything in the base dir (by default), and only the toid
dirs in the one-level-deep subdirs.  All other files are unaffected.
So, if the date dirs aren't the only thing in the base dir, you'll need
a more complicated include file, like this:

+ /[1-2][0-9][0-9][0-9][0-1][0-9][0-3][0-9]
- /*
+ /*/toid
- /*/*

..wayne..




RE: --include option

2002-08-14 Thread Wayne Davison

On Tue, 13 Aug 2002, Leaw, Chern Jian wrote:
> I tried your suggestion, but did not work. It still copied the entire
> filesystem across to the destination machine.

Since you failed to provide the command-line you're using, I can't tell
you exactly why your command failed.  For instance, if you use a
trailing slash on the sending-side directory you'd specify the
exclusions differently than if the slash was not there.

It's fairly easy to figure out for yourself what your inclusion &
exclusion lists should look like by first running the command with the
-n option (which tells rsync not to actually copy any files).  The names
that rsync outputs are the names you need to match (just add a slash to
the start of the name).  Once you get familiar with rsync you'll be able
to predict what these names will be, but until then, using -n lets you
ask rsync for the answer.  As a rule, every directory name up to the
last slash in the sending-side path is stripped from the name before it
is matched against the include/exclude names.
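The naming rule can be sketched as a small C function. match_name() is hypothetical and deliberately simplified (it ignores -R and other options); it only shows how the trailing slash on the sending-side argument changes the name that the patterns see.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Drops everything up to the last '/' in the sending-side argument, then
 * adds a leading '/' for anchored matching.  `arg` is the source argument
 * as typed; `rel` is a file's path below it ("" for the argument itself).
 * Returns 0 on success, -1 if `out` is too small. */
int match_name(const char *arg, const char *rel, char *out, size_t len)
{
    const char *slash = strrchr(arg, '/');
    const char *kept = slash != NULL ? slash + 1 : arg; /* "" if arg ends in '/' */
    int n;

    assert(len > 0);
    if (*kept == '\0')
        n = snprintf(out, len, "/%s", rel);
    else if (*rel == '\0')
        n = snprintf(out, len, "/%s", kept);
    else
        n = snprintf(out, len, "/%s/%s", kept, rel);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

So "/stor/circuit_design/" yields names like "/clock_speed/a.v", while "/stor/circuit_design" (no trailing slash) yields "/circuit_design/clock_speed/a.v", and the rules must be written to match whichever form applies.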

It is also sometimes useful to add an extra -v option to the command to
see what is getting included or excluded.

Another thing I recommend is that you use a "root slash" with names that
don't need to float to any level.  For instance, if you just specify
"foo" as an exclusion, it will exclude that directory OR file at any
point in the tree.  Specifying "/foo" (or "/sub/foo") is thus safer
since it protects against unintended matching.

I also prefer a single combined include/exclude file since it is easier
to edit and lets you order the inclusions and exclusions (remember that
the first matching pattern is the one that is acted upon, so sometimes
order does matter).  In a combined file, items that begin with "+ " are
always taken to be inclusions, and items that begin with "- " are always
taken to be exclusions.  You can leave off the "+ " in an include file
(and the "- " in an exclude file), but I included both for completeness.

So, if a file named "myinc" that has these 3 lines in it:

+ /clock_speed
+ /fub_layout
- /*

using this command:

rsync -avz --include-from=myinc /stor/circuit_design/ \
mickey.willowglen.com:/stor/circuit_design

does not work for you, then I am misunderstanding something about your
setup.

..wayne..




Re: Selective sync

2002-08-14 Thread Wayne Davison

On Wed, 14 Aug 2002, Ivan Kovalev wrote:
> rootdir/*/2002-08-01
> rootdir/*/*/1-Aug-02
> rootdir/*/2002-08/01

As the documentation states, if you use --exclude=*, you need to include
every parent directory on the way down to the directories in question.
So, it's easy to see that the rules you gave will never allow the descent
into the subdirs needed to find the 2002-08-01 dir because these subdirs
get excluded by the "*" before they are ever read.
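Why an unqualified exclude of "*" blocks the descent can be shown with a small model of directory pruning. The rule representation, the two pattern forms handled (leading-slash anchored, and slash-free matched against the last component) and reachable() are simplifying assumptions; rsync's real matcher supports more forms, including trailing-slash directory-only rules.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

struct rule {
    char action;           /* '+' include, '-' exclude */
    const char *pattern;
};

/* A pattern containing '/' is matched against the whole path; a pattern
 * with no '/' is matched against the last path component only. */
static int rule_matches(const char *pattern, const char *path)
{
    const char *target = path;

    if (strchr(pattern, '/') == NULL) {
        const char *last = strrchr(path, '/');
        target = last != NULL ? last + 1 : path;
    }
    return fnmatch(pattern, target, FNM_PATHNAME) == 0;
}

/* First matching rule wins; unmatched names are included. */
static int included(const struct rule *rules, size_t n, const char *path)
{
    for (size_t i = 0; i < n; i++)
        if (rule_matches(rules[i].pattern, path))
            return rules[i].action == '+';
    return 1;
}

/* An excluded directory is never scanned, so `path` is only reached if
 * every parent directory on the way down is included too. */
int reachable(const struct rule *rules, size_t n, const char *path)
{
    char prefix[4096];
    size_t len = strlen(path);

    assert(path[0] == '/' && len < sizeof prefix);
    for (size_t i = 1; i < len; i++) {
        if (path[i] != '/')
            continue;
        memcpy(prefix, path, i);     /* parent directory, e.g. "/a" */
        prefix[i] = '\0';
        if (!included(rules, n, prefix))
            return 0;
    }
    return included(rules, n, path);
}
```

With only "+ /*/2002-08-01" and "- *", the parent "/a" hits the "*" rule first, so the 2002-08-01 dir is never seen; putting "+ /*" ahead of them lets the descent through.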

Since the directories you require are not at the same level from the
root, you're probably going to need to be pretty specific about what
directories to allow leading up to this deeper dir.  If we assume that
this directory is either in the subdir "foo" or "bar", the following
include file would work (with no trailing exclude of "*"):

+ /*/
- /*
+ /*/2002-08-01/
+ /*/2002-08/
+ /*/foo/
+ /*/bar/
- /*/*
+ /*/*/1-Aug-02/
+ /*/2002-08/01/
- /*/2002-08/*

This may transfer a few extra (empty) subdirs on the way down to the
2002* dirs, but that can only be avoided by getting more specific with
the first-level include/exclude directives (like we did with the second
level directives).  On the flip side, you could replace the two lines
that specify second-level dirs (the "foo" and "bar" lines) with a single
line that specified "+ /*/*/" if you don't mind having empty 2nd-level
dirs that didn't have a 1-Aug-02 dir in them.

Note that I prefer using limited exclusions like those above instead of
a catch-all --exclude='*' because it makes it easier to include the
contents of directories (since the default is to include everything that
does not match one of the include/exclude rules).  It also avoids
improper parsing of a rule like this:

+ /*/2002-08-01/**

This is trying to allow an entire tree of files in a directory one level
deep, but it actually gets parsed like this:

+ /**/2002-08-01/**

which can sometimes cause problems.

..wayne..




RE: --include option

2002-08-14 Thread Wayne Davison

On Wed, 14 Aug 2002, Wayne Davison wrote:
> In a combined file, items that begin with "+ " are always taken to be
> exclusions

Of course, that should have been "inclusions", not "exclusions".

..wayne..




Re: rsync --partial produces corrupt data on ctrl-c

2002-08-28 Thread Wayne Davison

On Wed, 28 Aug 2002, Ralf Schreiber wrote:
> The partially transfered data (with a dot on first position of the filename)
> will be renamed after a ctrl-c occurs (on both
> OS) or a window-close (cygwin) to the filename of a fully transfered file
> (without the dot), which aren't complete !

Yes, that is the definition of what partial mode does with a partially
transferred file.  The manual recommends the use of the --compare-dest
option to work around this, but that doesn't appear to do what the manual
says at all, so it looks like there is either a bug in the handling of
--compare-dest when --partial is enabled, or a bug in the manual.

One thing that the manual doesn't say is what file should be preferred
if there is a matching file in both the compare-dest dir and the real-
dest dir.  This becomes important if we want to use --compare-dest as a
holding zone for partial files since we would need to have the code
prefer the compare-dest file over the real-dest file if we make it put
partially transferred files into the compare-dest dir.  The current code
prefers the real-dest file over the compare-dest file and puts partially
transferred files into the real-dest dir.

Opinions?

..wayne..




Re: The file name end with . (dot) will be renamed at destination folder

2002-09-05 Thread Wayne Davison

On Thu, 5 Sep 2002, Quang Tran Hong wrote:
> NormalFile.
> 23 (100%)
> rename .NormalFile..idNZdb -> NormalFile. : File exists

You'll note that in these messages the dot has not been lost, so
it's not rsync's doing that is causing this problem.  It looks to be a
deficiency in your OS.  Are you trying to send this file to a Microsoft
OS?

..wayne..




Re: exclude option?

2002-09-17 Thread Wayne Davison

On Tue, 17 Sep 2002, Bjorn Graabek wrote:
> and here are (currently) the contents of my exclude.txt file:
> 
> ---
> + /my documents*
> + /favorites*
> + /cookies*
> + /local settings/application data/microsoft/outlook/outlook.pst
> - /*
> ---

I think the first problem is that you aren't using the right
capitalization.  Rsync does not ignore case, so it will not match
the named directories unless you specify them using the same
mixed-case that is returned by a directory list.

Another problem is that you don't specify a way to get into the
"Local Settings" dir to get down to the outlook.pst file.

Finally, the trailing '*' is not needed if you are matching the
directory name exactly, only if you are matching more than one
directory with each line.  I'm assuming you aren't, so here's my
suggested solution (you'll have to check if the mixed case is OK):

---
+ /My Documents
+ /Favorites
+ /Cookies
+ /Local Settings
- /*
+ /Local Settings/Application Data
- /Local Settings/*
+ /Local Settings/Application Data/Microsoft
- /Local Settings/Application Data/*
+ /Local Settings/Application Data/Microsoft/Outlook
- /Local Settings/Application Data/Microsoft/*
+ /Local Settings/Application Data/Microsoft/Outlook/outlook.pst
- /Local Settings/Application Data/Microsoft/Outlook/*
---

All the extra rules for the "/Local Settings" dir are there because I
assume there are other files in this hierarchy that you don't want
to copy, or you would just have said "+ /Local Settings" and left it
at that.  These lines specify a path that the hierarchical descent
through the directories can follow to reach the lone file that you
want to send, while excluding all other files and directories.
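To check a rule list like this before running rsync, here is a simplified Python model of the behavior described above: rules are tried in order and the first match wins, a path that matches no rule is included, '*' does not cross a '/', a leading '/' anchors the pattern to the base of the transfer, matching is case-sensitive, and a file is only reached if every directory above it is included.  It is a sketch that ignores many details of rsync's real matcher, and all function names are hypothetical.

```python
import re

def glob_to_regex(pattern):
    """Translate a simplified rsync-style wildcard into a regex body.
    '**' may cross '/', while '*' and '?' stop at a '/'."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)

def rule_matches(pattern, path, is_dir):
    """Case-sensitive match of one pattern against a root-relative path."""
    if pattern.endswith("/"):       # trailing slash: directories only
        if not is_dir:
            return False
        pattern = pattern.rstrip("/")
    if pattern.startswith("/"):     # anchored to the base of the transfer
        regex = glob_to_regex(pattern[1:])
    else:                           # may match the tail of any path
        regex = "(?:.*/)?" + glob_to_regex(pattern)
    return re.fullmatch(regex, path) is not None

def is_included(rules, path, is_dir):
    """First matching rule wins; a path no rule matches is included."""
    for rule in rules:
        sign, pattern = rule.split(" ", 1)
        if rule_matches(pattern, path, is_dir):
            return sign == "+"
    return True

def is_transferred(rules, path, is_dir):
    """A path is reached only if every directory above it is included too,
    because rsync never descends into an excluded directory."""
    parts = path.split("/")
    for depth in range(1, len(parts)):
        if not is_included(rules, "/".join(parts[:depth]), True):
            return False
    return is_included(rules, path, is_dir)
```

Feeding it the suggested rules confirms that only the .pst file survives under "Local Settings", and that the original lower-case rules never match "My Documents".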

..wayne..




Re: --delete-after subtleties

2002-10-01 Thread Wayne Davison

On Tue, 1 Oct 2002, Nick Papadonis wrote:
> [In 2.5.5] --delete-after [...] must be used with --delete to work.

Unfortunately.  In the current CVS version, however, --delete-after now
implies --delete and the man page mentions this fact.  So, this will
work more logically whenever 2.5.6 gets released.

..wayne..




Re: Exclude symbolic link to a directory?

2002-10-10 Thread Wayne Davison

On Thu, Oct 10, 2002 at 10:49:33AM -0400, Bryan K. Wright wrote:
>   The master copy of /local contains the directory "stuff", not
> a symbolic link.  The problem is, when I rsync /local on the few
> machines that have a symbolic link, the link gets nuked and replaced
> with a real directory (just like in the master copy).

Correct.  That's the only thing that rsync currently knows how to do
with symlinks to directories -- make them identical with what's on the
server.  The easiest way around this at the moment is to break up the
rsync command into multiple runs, one which excludes all the potential
symlink differences, and one for each symlink dir that you want to
transfer.

>   What I've tried is excluding "/local/stuff" and including 
> "/local/stuff/*", but the stuff symlink still gets nuked.

I don't think you were successful at getting the dir excluded, then, or
else it would have been untouched, not nuked.  Rsync would not have done
what you wanted, though, since it has to send the /local/stuff dir to
try to send what's inside it (when working recursively).  You probably
did an exclude of "/local/stuff" rather than "/stuff" (since I assume
that the base dir is probably "/local").

So, a solution like this should work:

rsync -av --exclude="/stuff" /local/ remote:/local
rsync -av /local/stuff/ remote:/local/stuff

A better, long-term fix would be to add an option that would allow
certain symlinks to be treated as a directory.  To do this, we need to
work out a good heuristic on how to differentiate which is which.  I
imagine using the following rules (when the new option is enabled):

 - If a symlink points inside the hierarchy being transferred, treat it
   as a normal symlink to duplicate (rsync already has code to determine
   this for its "safe symlink" handling).

 - If a symlink points outside the transfer AND it points to a
   directory, treat it as if it were the actual directory for the
   transfer (I think that only the delete code would need to know that
   it wasn't a real directory).
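A rough Python sketch of the proposed heuristic.  The option does not exist yet, so `classify_symlink()` is purely hypothetical; it also resolves links with `os.path.realpath()` rather than reusing rsync's own safe-symlink code, which is a simplification.

```python
import os

def classify_symlink(link_path, transfer_root):
    """Decide how the proposed option would treat one symlink.

    Returns "symlink" to duplicate the link as-is, or "directory" to
    transfer the linked-to directory as though it were a real one.
    """
    root = os.path.realpath(transfer_root)
    target = os.path.realpath(link_path)  # follow the link to its final target
    if os.path.commonpath([root, target]) == root:
        return "symlink"    # points inside the transfer: copy the link itself
    if os.path.isdir(target):
        return "directory"  # points outside at a directory: treat it as the dir
    return "symlink"        # e.g. a file outside the transfer: unchanged behavior
```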

How does that sound?  It should be fairly easy to implement.

..wayne..



Re: multiple sessions to same destination

2002-10-10 Thread Wayne Davison

On Thu, Oct 10, 2002 at 04:50:33PM -0400, Bennett Todd wrote:
> The rsync opens the target file to read; if some other rsync moves a
> new file into place before that, there's no concurrency, this is
> pure sequential rsyncs; if it moves the target file into place after
> it's been opened, the older copy of the target will still be used by
> the process we're looking at, through the open file handle it holds;
> the intruding copy won't have any effect.

Unfortunately it's a little more complicated than that.  There are two
processes opening the file: first the generator (which sends the
checksums over to the sender) and then the receiver (which opens the file
to read matching checksum blocks from the local file).  It is possible for
the file to change between these two separate file opens, resulting in
the creation of a corrupt *temporary* file.  Fortunately for us, the
whole-file checksum won't match, so rsync won't move the resulting
corrupted file into place.  It will instead reset its checksum size and
try sending the file again.  If it fails again, it prints an error and
does not update the file.
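The safety net described above (build into a temporary file, verify the whole-file checksum, and only then rename it into place) can be sketched in Python as follows.  This is an illustration under assumptions, not rsync's code: the delta `instructions` format and both function names are hypothetical, and MD5 stands in for the checksum rsync actually uses.

```python
import hashlib
import os
import tempfile

def rebuild(basis_path, instructions, block_size):
    """Receiver side: rebuild the new file from literal data plus blocks
    copied out of the local basis file (opened separately from the generator)."""
    out = bytearray()
    with open(basis_path, "rb") as basis:
        for kind, value in instructions:
            if kind == "literal":
                out += value
            else:  # ("block", index): copy a matching block from the basis
                basis.seek(value * block_size)
                out += basis.read(block_size)
    return bytes(out)

def commit_if_verified(dest_path, data, expected_digest):
    """Write data to a temp file next to dest_path and rename it into place
    only if its whole-file digest matches the one the sender computed."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dest_path) or ".", prefix=".")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    if hashlib.md5(data).hexdigest() != expected_digest:
        os.unlink(tmp_path)  # corrupt temp file: discard it, leave dest alone
        return False
    os.replace(tmp_path, dest_path)  # atomic rename over the old version
    return True
```

A `False` result corresponds to the case where rsync would retry the file as described above instead of installing it.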

Derek:  I'd recommend checking out "unison":

http://www.cis.upenn.edu/~bcpierce/unison/

I use this software to keep my rc files in sync between several
machines, and it does a wonderful job of merging file changes in both
directions.

..wayne..



Re: HELP !!! Problem with file timestamps updating "weird" during rsync data pull

2002-10-16 Thread Wayne Davison

On Wed, Oct 16, 2002 at 01:36:10PM -0500, Sean O'Neill wrote:
> The timestamp should match that of the system the data is pulled from right 
> ?  Well, it doesn't from time to time.  The time stamp sometimes gets 
> updated as just "Oct 16 2002"

This is what most unix systems display for a future date.  I'm guessing
that the clocks on your systems are not in sync -- that the clock on the
receiving end is behind the sending end, which causes files that have
been recently modified on the sender to show up as having future dates
on the receiving system.
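A Python sketch of the listing rule at work, modeled on GNU `ls -l` (an assumption about the reporter's system): a timestamp is shown with a clock time only when it is in the past and less than about six months old; otherwise the year is shown in its place, which is how a file stamped a few minutes in the future ends up displayed with "2002" instead of a time.  `ls_style_date()` is a hypothetical helper.

```python
from datetime import datetime, timedelta

SIX_MONTHS = timedelta(days=365.2425 / 2)  # half an average Gregorian year

def ls_style_date(mtime, now):
    """Format mtime roughly the way `ls -l` does in the C locale."""
    recent = now - SIX_MONTHS < mtime <= now
    month = mtime.strftime("%b")
    if recent:
        return f"{month} {mtime.day:2d} {mtime:%H:%M}"
    return f"{month} {mtime.day:2d}  {mtime.year}"
```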

..wayne..



Re: Size Discrepancy between source and destination

2002-10-24 Thread Wayne Davison
On Thu, Oct 24, 2002 at 12:37:34PM -0400, Shelley Waltz wrote:
> Why is there a difference in the size of the directories for marshall(and 
> many others) which makes the distination larger than the source?

The directory listings you provided show that there are hard-linked
files on the source filesystem that are not hard-linked on the
destination.  Try running rsync with the -H (--hard-link) option.
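To see which source files are hard-linked to each other, group paths by device and inode number.  Every group with more than one member is a set of links that rsync copies as independent files unless -H is given, while du on the source counts that data only once.  A Python sketch; `hardlink_groups()` is a hypothetical helper.

```python
import os
from collections import defaultdict

def hardlink_groups(root):
    """Map (device, inode) -> paths for files under root that share an inode."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # A file whose other links lie outside root shows up alone; drop those.
    return {key: paths for key, paths in groups.items() if len(paths) > 1}
```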

..wayne..



Re: Fwd: rsync and unlink permission

2002-10-28 Thread Wayne Davison
On Mon, Oct 28, 2002 at 03:12:53PM +0800, Patrick Hsieh wrote:
> Since "foo" has no write permission under /var/www, he cannot rsync
> from remote server to the local filesystem because rsync will try to
> make temp file and unlink the original file before writing over it. Is
> there any solution to this problem?

See the -T (--temp-dir) option for how to tell rsync to put its temp
file in some other directory.  If the temp dir is on the same file
system as /var/www, rsync will still rename the new file over the top of
the old one (which ensures that no one can request a partially-written
file).  If it is on a different file system, rsync will use its
copy_file() routine to copy the tmp file over the destination file.
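The finishing step described above amounts to "rename if possible, otherwise copy".  A Python sketch of that logic, which detects the cross-file-system case by the rename failing with `EXDEV`; `finish_file()` is a hypothetical name, and `shutil.copyfile()` stands in for rsync's copy_file().

```python
import errno
import os
import shutil

def finish_file(tmp_path, dest_path):
    """Move a completed temp file over the destination.

    Same file system: an atomic rename, so readers never see a partial file.
    Different file system: the rename fails with EXDEV, so copy the data over
    the existing destination instead (which needs write access to the file,
    not to its directory) and remove the temp file.
    """
    try:
        os.replace(tmp_path, dest_path)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    shutil.copyfile(tmp_path, dest_path)
    os.unlink(tmp_path)
    return "copied"
```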

..wayne..



Re: many files filelist problem

2002-11-12 Thread Wayne Davison
On Tue, Nov 12, 2002 at 03:27:03PM +0200, Mozzi wrote:
> [root@ais-mail01 root]# time rsync -pogrve ssh /var/spool/mail 
> [EMAIL PROTECTED]:/var/spool/mail/

FYI, this command puts the "mail" dir inside /var/spool/mail on the
destination.  You should add a trailing slash to the source path to
avoid this (or remove the "mail/" from the destination path).
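The trailing-slash rule fits in a few lines of Python.  `destination_for()` is a hypothetical helper that computes where a file found at `rel_path` under the source directory lands on the receiving side.

```python
import posixpath

def destination_for(src_arg, dest_dir, rel_path):
    """Return where `rel_path` (relative to the source directory) ends up.

    With a trailing slash, rsync copies the *contents* of the source dir;
    without one, it recreates the source dir itself inside dest_dir.
    """
    if src_arg.endswith("/"):
        return posixpath.join(dest_dir, rel_path)
    return posixpath.join(dest_dir, posixpath.basename(src_arg), rel_path)
```

With the command above, `destination_for("/var/spool/mail", "/var/spool/mail/", "alice")` gives "/var/spool/mail/mail/alice"; adding the trailing slash to the source gives "/var/spool/mail/alice".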

..wayne..



Re: Speed problem

2002-11-12 Thread Wayne Davison
On Tue, Nov 12, 2002 at 04:32:31PM +0100, [EMAIL PROTECTED] wrote:
> I'd call it a bug.

No, it's not a bug.  It's the heart of the rsync algorithm at work.
Rsync trades CPU and local file I/O for network I/O in order to reduce
the amount of data that is transferred over the network.  Your diagnosis
has just shown that when the network I/O dips, rsync has traded it for
local I/O (grabbing matching blocks from the current file instead of
asking for them to be sent over the network).  For really large files
that mostly match the existing version, most of the I/O in building the
new file will be local rather than network I/O, so it is to be expected
that the data rate over the net will drop when that occurs.
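To see why the network rate drops when most of a file matches, here is a toy Python version of the block-matching idea.  The old file is cut into fixed-size blocks indexed by a cheap rolling checksum plus a strong hash; the new file is then scanned byte by byte, so matched regions cost only a block reference on the wire while unmatched bytes are sent literally.  The scan itself is where the CPU and local I/O go.  It is a sketch, not rsync's implementation: it uses MD5 as the strong hash, ignores the short final block of the old file, and only counts bytes instead of producing a delta stream.  `delta_stats()` is a hypothetical name.

```python
import hashlib

MOD = 1 << 16

def weak_sum(block):
    """Two 16-bit running sums in the style of rsync's rolling checksum."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b

def delta_stats(basis, new, block_size):
    """Return (literal_bytes, matched_bytes) needed to rebuild `new` from `basis`."""
    table = {}
    for start in range(0, len(basis) - block_size + 1, block_size):
        block = basis[start:start + block_size]
        table.setdefault(weak_sum(block), set()).add(hashlib.md5(block).digest())

    literal = matched = 0
    i, n = 0, len(new)
    a, b = weak_sum(new[:block_size]) if n >= block_size else (0, 0)
    while i + block_size <= n:
        candidates = table.get((a, b))
        if candidates and hashlib.md5(new[i:i + block_size]).digest() in candidates:
            matched += block_size  # only a block reference crosses the network
            i += block_size
            if i + block_size <= n:
                a, b = weak_sum(new[i:i + block_size])
            continue
        literal += 1  # this byte must be sent as literal data
        if i + block_size < n:
            # Roll the window one byte forward in O(1).
            out_byte, in_byte = new[i], new[i + block_size]
            a = (a - out_byte + in_byte) % MOD
            b = (b - block_size * out_byte + a) % MOD
        i += 1
    literal += n - i  # tail too short to form a whole block
    return literal, matched
```

Inserting 10 bytes into the middle of a 1024-byte file costs only those 10 literal bytes; everything else is rebuilt from local blocks.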

Note also that the --partial flag is only incidentally related to what
you were seeing since it ensured that the destination file had lots of
matching data whenever you interrupted the transfer.

The only alternative is to use the --whole-file option -- this option
turns off the rsync algorithm and just sends all the changed files over
the net completely (like an scp copy, but for changed files).  This
should only be used if you have a really fast network connection OR if
you don't want to trade the CPU and local I/O for network I/O.

..wayne..



Re: Speed problem

2002-11-12 Thread Wayne Davison
On Tue, Nov 12, 2002 at 11:30:28PM +0100, [EMAIL PROTECTED] wrote:
> And why it tries to get 100% CPU even though there's nothing to do ?

What do you mean "nothing to do"?  Rsync is creating the new version of
a changed file which is done both by transferring data over the network
and by copying matching data from the existing version of the file.
Just because nothing is being transferred over the link doesn't mean
nothing is going on.  Or is there some other problem that I missed in
this discussion?

> Ok, that I never tried because I thought the --partial option should
> have been the fastest method because lots of data is still on the other
> side if an error has occured before.

The --partial option ensures that if we transferred a lot of data to
build a file but didn't finish it, this data is not just thrown
away.  However, if we started with an already-existing version of a file
that was mostly the same as the new version, it is possible that when
rsync is interrupted the current partial file actually contains less
matching data in it than the already-existing version, and thus
retaining this partial file actually makes the next transfer less
efficient.  Because of this I only use the --partial option if I'm
sending really big *new* files, not updating really big existing files.

..wayne..



Re: The exclude option of Rsync not work right.

2002-11-19 Thread Wayne Davison
On Tue, Nov 19, 2002 at 11:32:06AM -0600, Lori Anderson wrote:
> rsync -av /software/testdir/ --exclude='/software/testdir/test.sql'
> landser@serv602:/software/testdir/

Inclusions and exclusions are relative to the base of the transfer.  Use
a leading '/' if you want to indicate that the inclusion/exclusion is
anchored to this base.  Like this:

rsync -av /software/testdir/ --exclude=/test.sql landser@serv602:/software/testdir/

That will exclude /software/testdir/test.sql.
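Converting an absolute path into the anchored form is just a matter of stripping the source directory off the front.  A small Python sketch; `anchored_pattern()` is a hypothetical helper, not an rsync feature.

```python
import posixpath

def anchored_pattern(transfer_root, wanted):
    """Turn an absolute path into an exclude pattern anchored at the
    base of the transfer, e.g. /software/testdir/test.sql -> /test.sql."""
    root = posixpath.normpath(transfer_root)
    rel = posixpath.relpath(posixpath.normpath(wanted), root)
    if rel in (".", "..") or rel.startswith("../"):
        raise ValueError(f"{wanted} is not inside the transfer base {transfer_root}")
    return "/" + rel
```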

..wayne..



Re: rsync and the file's mtime

2002-11-25 Thread Wayne Davison
On Mon, Nov 25, 2002 at 09:30:03AM -0500, Jeff Bearer wrote:
> But if the file isn't modified, the modified time shouldn't be updated,

By default, rsync uses the time & size of the file to determine if it
was updated.  Since the source and destination files don't match, rsync
transfers the file, and that updates the mtime.  There is no special
check to see if the newly created temp-file is identical to the existing
file -- the file is just updated.

If you use the -c (--checksum) option, rsync will switch to testing the
checksums of the files to determine if the file needs to be transferred.
This will cause the file not even to be sent unless it's changed, and
thus to preserve the destination file's current mtime when it is up-to-
date.
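A Python sketch of the two decision rules: the default size-and-mtime "quick check" versus the -c comparison of file contents.  The function names are hypothetical and MD5 stands in for the checksum rsync uses; only the decision logic is meant to follow the description above.

```python
import hashlib
import os

def file_digest(path):
    """Checksum of a file's full contents, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.digest()

def needs_transfer(src, dst, use_checksum=False):
    """Return True if rsync-style logic would send src over dst."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True                          # different sizes always differ
    if use_checksum:
        return file_digest(src) != file_digest(dst)  # -c: compare contents
    return int(s.st_mtime) != int(d.st_mtime)        # default: compare mtimes
```

With identical contents but different mtimes, the quick check says "transfer" while the checksum mode leaves the destination (and its mtime) alone.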

..wayne..



Re: cp(1) -n option for rsync?

2002-12-06 Thread Wayne Davison
On Fri, Dec 06, 2002 at 11:53:08AM -0800, Sander van Zoest wrote:
> I would like to be able to use rsync to mirror some directories, but
> to explicitly *not* override any files that already exist on the other
> side.

I believe you're looking for the --ignore-existing option.  I'm not sure
when it got added, but it's in 2.5.5 at least (and not in 2.4.6).

..wayne..



Re: rsync 2.4.6 hanging on HPUX11 only over firewall.

2002-12-09 Thread Wayne Davison
On Mon, Dec 09, 2002 at 01:49:40PM +, rsyncuser wrote:
> We are interested in finding out whether the wayne-nohang patches can
> be applied to 2.4.6. 

My older patches for 2.4.6 had been moved aside after they were
incorporated into the main distribution.  However, I just put them back
in their original spot so they can be accessed again.

The most important patch was the simplest:

http://www.clari.net/~wayne/rsync-nohang1.patch

This patch ensures that data coming from the generator to the sender
does not overflow and block during the final phase of the transfer on
the sending side (but not necessarily at the final file, due to the
buffering on the outgoing connection).  The unpatched code waited around
for the remote process to end without reading the incoming data stream,
which was a very bad idea if the -v option was turned on.

The second patch fixed a much rarer bug -- one that should only get
tickled if a good number of the files fail to transfer correctly on the
first try and need to be resent:

http://www.clari.net/~wayne/rsync-nohang2.patch

An older version of this patch was included in the Red Hat sources for a
while, so it was pretty widely tested:

http://www.clari.net/~wayne/old/rsync-nohang.patch

(Note that this patch contains the "nohang1" patch as well.)

The reasoning behind this patch is that there is a data channel from the
receiver to the generator that tells it what files to retry.  This data
channel is left totally unread until all files are handled in pass 1.
This means that it can block if enough files need to be resent.  My
patch keeps this data channel clear by reading it whenever data appears
and setting flags on what files to resend during the retry phase.
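The idea behind the fix, draining the retry channel whenever data shows up instead of leaving it unread until pass 1 ends, can be modeled in Python with a pipe and a non-blocking `select()` poll.  This is a hypothetical sketch: the 4-byte file-index message format and the `RetryChannel` class are assumptions, not rsync's wire format, and it needs a POSIX system because it calls `select()` on a pipe.

```python
import os
import select
import struct

class RetryChannel:
    """Generator-side reader for a receiver -> generator 'resend this file' pipe.

    Each message is assumed to be one 4-byte big-endian file index.
    """

    def __init__(self, fd):
        self.fd = fd
        self.pending = b""    # bytes of a message that has not fully arrived
        self.retry = set()    # file indices to resend in the retry phase

    def drain(self):
        """Read whatever is waiting on the pipe without ever blocking."""
        while select.select([self.fd], [], [], 0)[0]:
            chunk = os.read(self.fd, 4096)
            if not chunk:     # the writer closed its end
                break
            self.pending += chunk
        usable = len(self.pending) - len(self.pending) % 4
        if usable:
            for (index,) in struct.iter_unpack("!i", self.pending[:usable]):
                self.retry.add(index)
            self.pending = self.pending[usable:]
```

Calling `drain()` between units of generator work keeps the pipe from filling up, so the receiver's writes never block no matter how many files need a retry.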

I'm thinking about writing a new patch for the latest rsync that causes
these need-to-retry files to be immediately resent by the generator to
the sender instead of buffering them (with proper signaling to ensure
that retry files get their alternate block-sizes set).  Perhaps this
solution would finally allow this bug to be put to rest (since it's not
yet fixed in the main code).

..wayne..



Re: rsync stopped syncing

2002-12-09 Thread Wayne Davison
On Mon, Dec 09, 2002 at 04:36:41PM +0100, Markus Lamers wrote:
> rsync -auvxz --delete --exclude-from /root/.rsync/home-daily.exc /home
> slave:/

I suspect the home-daily.exc file is at fault.  What does it contain?

..wayne..



Re: include-exclude patterns

2002-12-10 Thread Wayne Davison
On Tue, Dec 10, 2002 at 09:18:06AM -0500, marco wrote:
> I even tried this but it include the whole /var/ folder !
> I just want /var/lib/zope.

The solution is that after you include something that is too general,
you need to exclude what you don't want.  Like this:


+ /etc
+ /var
- /*
+ /var/lib
- /var/*
+ /var/lib/zope
- /var/lib/*

Explanation:  The inclusion of /var is needed just to get rsync to
descend into that directory.  At that point, you need to add rules for
what to do inside of /var, which is to just descend into lib and exclude
everything else.  The final two rules tell rsync what to do once inside
of /var/lib (include zope, exclude everything else).  At that point
everything in the zope hierarchy will be included.
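Because the pattern is mechanical (at each level, include the next directory on the way down, then exclude everything else at that level), the rules can be generated.  A Python sketch; `ladder_rules()` is hypothetical, and it assumes no target is itself an ancestor of another target.

```python
def ladder_rules(targets):
    """Build '+'/'-' rules that let rsync descend to each target path and
    exclude every sibling along the way."""
    split = [t.strip("/").split("/") for t in targets]
    rules = []
    for depth in range(1, max(len(p) for p in split) + 1):
        includes, excludes = [], []
        for parts in split:
            if len(parts) < depth:
                continue
            inc = "+ /" + "/".join(parts[:depth])
            parent = "/".join(parts[:depth - 1])
            exc = "- /" + (parent + "/*" if parent else "*")
            if inc not in includes:
                includes.append(inc)
            if exc not in excludes:
                excludes.append(exc)
        # All includes for this level must come before its catch-all exclude.
        rules += includes + excludes
    return rules
```

`ladder_rules(["/etc", "/var/lib/zope"])` reproduces the seven rules listed above, in the same order.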

..wayne..



Re: filelist calculation algorithm

2003-01-04 Thread Wayne Davison
On Sat, Jan 04, 2003 at 12:40:05PM -0800, jw schultz wrote:
> One specifying subpaths and the other for those having a shared
> prefix.

I don't see why this is needed.  For instance, your example of a shared
prefix:

>   find srcdir | myfilter | rsync --file-list - srcdir destloc

would be easily written without any sharing:

find srcdir | myfilter | rsync --file-list - . destloc

or:

find /foo/bar | myfilter | rsync --file-list - / destloc

Am I missing something?

> doing
>   rsync --file-list-relative - src dest <<EOL
>   file1
>   file2
>   dir1/file3
>   EOL
> would actually sync
>   src/file1
>   src/file2
>   src/dir1
>   src/dir1/file3
> to
>   dest/file1
>   dest/file2
>   dest/dir1
>   dest/dir1/file3

I think that should only happen if the --relative option is set.
Otherwise all 3 files should go directly into "dest".

..wayne..



Re: filelist calculation algorithm

2003-01-05 Thread Wayne Davison
On Sat, Jan 04, 2003 at 05:03:02PM -0800, jw schultz wrote:
> that would produce destloc/srcdir/
> when you might want a copy of srcdir at destloc instead of
> in destloc.

Ah yes, I _was_ missing something.  However, I still don't think we need
to clutter rsync with two types of --file-list options.  This is already
something that people have to deal with when using the --relative option:
how to generate a file list that contains just the path information that
we need to be significant.  I think that the removal of the undesired
prefixes should happen before the list gets to rsync rather than having
rsync do it (in your example the user would just chdir into "srcdir" and
do the "find" relative to '.').

Here's an alternative to the syntax you suggested.  I was thinking that
it would be nice to just read filenames from stdin and have them be
treated the same way as command-line args.  One way to indicate this
would be to specify '-' as a name to transfer, which would tell rsync to
read filenames from stdin.  Like this:

rsync -av --relative - destloc



Re: filelist calculation algorithm

2003-01-05 Thread Wayne Davison
On Sun, Jan 05, 2003 at 11:55:22AM -0800, jw schultz wrote:
> The first problem is this would flatten things unless you used
> relative and forced the user's CWD.  That would cause considerable
> confusion.

Really?  This is exactly how rsync works now with multiple file names on
the command-line, so I don't see this as being any more confusing than
what we already have.  The rule would be you can specify the files on
the command-line or on stdin (if you use '-' as the only source file).
Since all names are treated in the same way regardless of where they
were specified, everything works the same as it did before, only more
names are now supported per invocation.  I'm thinking that this way is
more flexible since it allows someone to flatten things if that's what
they really want to do.

> Secondly, how would you do it when the source location is remote?
> Many of the users asking for this are doing pulls.

I mentioned a protocol change that would send the extra file names to
the other side after rsync starts up.  Currently the send_file_list()
routine always sends names from the sending side to the receiving side.
The new protocol would change that to always send names from the user
side to the server side when this option was specified.  The user's
command would look like this:

rsync -avR remote:- /foo/bar

The file list would be read from the local (user) side, of course.  The
remote command being run by rsync would look like this:

ssh remote rsync --server --sender -vlogDtprR . -

The presence of the '-' as the source would tell us to slurp names
instead of send them.

Since the file list is exchanged in total before we do any real work, I
think this change would actually be really easy to implement.

..wayne..



Re: [PATCH] Add .svn to the exclude list for --cvs-exclude

2003-01-08 Thread Wayne Davison
On Wed, Jan 08, 2003 at 04:42:58PM -0800, jw schultz wrote:
> -  "RCS","SCCS","CVS","CVS.adm","RCSLOG","cvslog.*",
> +  "RCS/", "SCCS/", "CVS/", "CVS.adm", "RCSLOG", "cvslog.*",
> Might be worth doing to tighten the patterns.

Yes, I'd agree with that.  I looked at the code to confirm that the
trailing slashes would be interpreted correctly, and then tested a
modified version to ensure proper functioning.

This is a simple enough change that I went ahead and checked it into
CVS.  In my version I added the ".svn/" pattern near the other dirs
instead of at the end of the list.

Thanks, Jon, for the patch.

..wayne..



Re: Stats

2003-01-09 Thread Wayne Davison
On Thu, Jan 09, 2003 at 07:48:50AM -0600, Max Kipness II wrote:
> Total file size: 383219712 bytes
> Total transferred file size: 383219712 bytes
> Literal data: 3143680 bytes
> Matched data: 380076032 bytes
>  
> The total file size is definitely correct, but what I don't understand
> is the transfered size. Is rsync reporting that roughly 380mb matches?
> It would seem like it to me. But is so, why did it transfer the entire
> file?

You're thinking of the word transfer in the wrong sense here.  Rsync's
transferred file size is the total of all file sizes that needed to be
updated.  However, it doesn't mean that those bytes were sent literally
over the wire.  That stat is taken care of in the next two lines, which
tell you how much data was actually communicated literally (3143680
bytes) and how much data was communicated via matched blocks (380076032
bytes).  Also, in a set of files where some matched and some didn't, the
total file size would have included the entire set of files (including
those that were up-to-date).  In a set of one file that needs to be
updated, the total size will always match the transferred size.
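The relationship between these numbers can be written down directly.  In this Python sketch (the `FileResult` record is a hypothetical stand-in for rsync's per-file bookkeeping), the total size counts every file, the transferred size counts only files that needed updating, and for each updated file the literal plus matched bytes add up to its size, as they do in the stats above (3143680 + 380076032 = 383219712).

```python
from dataclasses import dataclass

@dataclass
class FileResult:
    size: int
    updated: bool
    literal: int = 0   # bytes sent over the wire as-is
    matched: int = 0   # bytes rebuilt from blocks of the old file

def summarize(results):
    """Compute the four stats lines from per-file results."""
    updated = [r for r in results if r.updated]
    for r in updated:
        assert r.literal + r.matched == r.size, "each rebuilt file is literal + matched data"
    return {
        "total file size": sum(r.size for r in results),
        "total transferred file size": sum(r.size for r in updated),
        "literal data": sum(r.literal for r in updated),
        "matched data": sum(r.matched for r in updated),
    }
```

Adding an up-to-date file to the set raises the total file size but leaves the transferred size unchanged.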

..wayne..


