Re: To inc_recurse or not to inc_recurse? [Re: 3.0.0pre2: bookend breakage (2 different errors)]

2007-10-17 Thread Erik Jan Tromp
On Mon, 15 Oct 2007 23:15:34 -0400
Matt McCutchen [EMAIL PROTECTED] wrote:

 On 10/15/07, Erik Jan Tromp [EMAIL PROTECTED] wrote:
  # The second error
  Invalid file index: -101 (-1 - 0) with iflags 0 [receiver]
  rsync error: protocol incompatibility (code 2) at rsync.c(273) 
  [receiver=3.0.0pre2]
  rsync: connection unexpectedly closed (21 bytes received so far) [generator]
  rsync error: error in rsync protocol data stream (code 12) at io.c(596) 
  [generator=3.0.0pre2]
 
  # Sample commands, obfuscated to protect the guilty
  cp -alf $BACKUPPATH/fluorine/sunday/. $BACKUPPATH/fluorine/monday/
  rsync [...] --delete-after [...]
rsync://[EMAIL PROTECTED]:$BACKUPPORT/$BACKUPMODULE/ \
$BACKUPPATH/fluorine/monday/ [...]

For the record, I no longer get this error after updating from cvs (Wed Oct 17 
10:28:30 UTC 2007).

Erik

-- 
Failure is not an option. (It comes bundled with Windows.)
If at first you don't succeed, redefine success.

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: To inc_recurse or not to inc_recurse? [Re: 3.0.0pre2: bookend breakage (2 different errors)]

2007-10-16 Thread Wayne Davison
On Tue, Oct 16, 2007 at 12:30:34AM -0400, Matt McCutchen wrote:
 In this situation, automatically disabling incremental recursion would
 be better than exiting with an error that really isn't the user's
 fault.

Yeah.  I hadn't wanted to add an extra round-trip delay that would be
required by a direct exchange of bytes.  After some more contemplation
I've come up with a good way to convey the inc_recurse decision back and
forth between the two sides without slowing things down.  And the latest
code does let the affected side make the determination of which options
are not support in incremental recursive mode.

..wayne..
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


To inc_recurse or not to inc_recurse? [Re: 3.0.0pre2: bookend breakage (2 different errors)]

2007-10-15 Thread Matt McCutchen
On 10/15/07, Erik Jan Tromp [EMAIL PROTECTED] wrote:
 # The second error
 Invalid file index: -101 (-1 - 0) with iflags 0 [receiver]
 rsync error: protocol incompatibility (code 2) at rsync.c(273) 
 [receiver=3.0.0pre2]
 rsync: connection unexpectedly closed (21 bytes received so far) [generator]
 rsync error: error in rsync protocol data stream (code 12) at io.c(596) 
 [generator=3.0.0pre2]

 # Sample commands, obfuscated to protect the guilty
 cp -alf $BACKUPPATH/fluorine/sunday/. $BACKUPPATH/fluorine/monday/
 rsync [...] --delete-after [...]
   rsync://[EMAIL PROTECTED]:$BACKUPPORT/$BACKUPMODULE/ \
   $BACKUPPATH/fluorine/monday/ [...]

This is the same problem I ran into with --detect-renamed: the client
disables incremental recursion because of --delete-after but doesn't
send the option to the server, so the server has no idea that it is
supposed to disable incremental recursion.  The other options that can
disable incremental recursion (see compat.c) may cause the same crash.

I suppose server_options could be changed to send all of these options
unconditionally.  However, a more discreet and much more robust
solution would be for the client and server to negotiate whether to
incrementally recurse just after negotiating the protocol version.

Matt
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: To inc_recurse or not to inc_recurse? [Re: 3.0.0pre2: bookend breakage (2 different errors)]

2007-10-15 Thread Wayne Davison
On Mon, Oct 15, 2007 at 11:15:34PM -0400, Matt McCutchen wrote:
 However, a more discreet and much more robust solution would be for
 the client and server to negotiate whether to incrementally recurse
 just after negotiating the protocol version.

Yeah, I agree that it is better for the client to explicitly tell the
server what is going on (and allows a batch file to indicate what is
happening too).  That way there is no confusion.  It also allows for
future expansion in certain situations -- e.g. I can imagine making a
future version of --prune-empty-dirs and/or --delay-updates compatible
with inc-recursion, and this will allow a more modern rsync to try to
tell a remote receiver to use that option and stay in inc_recurse mode,
and the receiver can say no, I don't support that. 

The CVS version (and latest nightly) have this fixed.

..wayne..
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: To inc_recurse or not to inc_recurse? [Re: 3.0.0pre2: bookend breakage (2 different errors)]

2007-10-15 Thread Matt McCutchen
On 10/16/07, Wayne Davison [EMAIL PROTECTED] wrote:
 It also allows for
 future expansion in certain situations -- e.g. I can imagine making a
 future version of --prune-empty-dirs and/or --delay-updates compatible
 with inc-recursion, and this will allow a more modern rsync to try to
 tell a remote receiver to use that option and stay in inc_recurse mode,
 and the receiver can say no, I don't support that.

In this situation, automatically disabling incremental recursion would
be better than exiting with an error that really isn't the user's
fault.  (That's why I said negotiate.)

Matt
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: To inc_recurse or not to inc_recurse? [Re: 3.0.0pre2: bookend breakage (2 different errors)]

2007-10-15 Thread Matt McCutchen
On 10/16/07, Wayne Davison [EMAIL PROTECTED] wrote:
 Yeah, I agree that it is better for the client to explicitly tell the
 server what is going on (and allows a batch file to indicate what is
 happening too).

Now you could revert the unconditional sending of --detect-renamed in
detect-renamed.diff; the new detect-renamed.diff is attached.

Matt
This patch adds the --detect-renamed option which makes rsync notice files
that either (1) match in size  modify-time (plus the basename, if possible)
or (2) match in size  checksum (when --checksum was also specified) and use
each match as an alternate basis file to speed up the transfer.

The algorithm attempts to scan the receiving-side's files in an efficient
manner.  If --delete[-before] is enabled, we'll take advantage of the
pre-transfer delete pass to prepare any alternate-basis-file matches we
might find.  If --delete-before is not enabled, rsync does the rename scan
during the regular file-sending scan (scanning each directory right before
the generator starts updating files from that dir).  In this latter mode,
rsync might delay the updating of a file (if no alternate-basis match was
yet found) until the full scan of the receiving side is complete, at which
point any delayed files are processed.

I chose to hard-link the alternate-basis files into a .~tmp~ subdir that
takes advantage of rsync's pre-existing partial-dir logic.  This uses less
memory than trying to keep track of the matches internally, and also allows
any deletions or file-updates to occur normally without interfering with
these alternate-basis discoveries.

To use this patch, run these commands for a successful build:

patch -p1 patches/detect-renamed.diff
./configure (optional if already run)
make

TODO:

  We need to never return a match from fattr_find() that has a basis
  file.  This will ensure that we don't try to give a renamed file to
  a file that can't use it, while missing out on giving it to a file
  that could use it.

--- old/compat.c
+++ new/compat.c
@@ -41,6 +41,7 @@ extern int checksum_seed;
 extern int basis_dir_cnt;
 extern int prune_empty_dirs;
 extern int protocol_version;
+extern int detect_renamed;
 extern int protect_args;
 extern int preserve_uid;
 extern int preserve_gid;
@@ -218,7 +219,7 @@ void setup_protocol(int f_out,int f_in)
 	} else if (protocol_version = 30) {
 		if (recurse  allow_inc_recurse
 		  !delete_before  !delete_after  !delay_updates
-		  !use_qsort  !prune_empty_dirs)
+		  !use_qsort  !prune_empty_dirs  !detect_renamed)
 			inc_recurse = 1;
 		if (am_server || read_batch) {
 			int i_r = read_byte(f_in);
--- old/flist.c
+++ new/flist.c
@@ -61,6 +61,7 @@ extern int non_perishable_cnt;
 extern int prune_empty_dirs;
 extern int copy_links;
 extern int copy_unsafe_links;
+extern int detect_renamed;
 extern int protocol_version;
 extern int sanitize_paths;
 extern struct stats stats;
@@ -113,6 +114,8 @@ static int64 tmp_dev, tmp_ino;
 #endif
 static char tmp_sum[MAX_DIGEST_LEN];
 
+struct file_list the_fattr_list;
+
 static char empty_sum[MAX_DIGEST_LEN];
 static int flist_count_offset; /* for --delete --progress */
 static int dir_count = 0;
@@ -252,6 +255,45 @@ static int is_excluded(char *fname, int 
 	return 0;
 }
 
+static int fattr_compare(struct file_struct **file1, struct file_struct **file2)
+{
+	struct file_struct *f1 = *file1;
+	struct file_struct *f2 = *file2;
+	int64 len1 = F_LENGTH(f1), len2 = F_LENGTH(f2);
+	int diff;
+
+	if (!f1-basename || !S_ISREG(f1-mode) || !len1) {
+		if (!f2-basename || !S_ISREG(f2-mode) || !len2)
+			return 0;
+		return 1;
+	}
+	if (!f2-basename || !S_ISREG(f2-mode) || !len2)
+		return -1;
+
+	/* Don't use diff for values that are longer than an int. */
+	if (len1 != len2)
+		return len1  len2 ? -1 : 1;
+
+	if (always_checksum) {
+		diff = u_memcmp(F_SUM(f1), F_SUM(f2), checksum_len);
+		if (diff)
+			return diff;
+	} else if (f1-modtime != f2-modtime)
+		return f1-modtime  f2-modtime ? -1 : 1;
+
+	diff = u_strcmp(f1-basename, f2-basename);
+	if (diff)
+		return diff;
+
+	if (f1-dirname == f2-dirname)
+		return 0;
+	if (!f1-dirname)
+		return -1;
+	if (!f2-dirname)
+		return 1;
+	return u_strcmp(f1-dirname, f2-dirname);
+}
+
 static void send_directory(int f, struct file_list *flist,
 			   char *fbuf, int len, int flags);
 
@@ -2154,6 +2196,25 @@ struct file_list *recv_file_list(int f)
 
 	clean_flist(flist, relative_paths);
 
+	if (detect_renamed) {
+		int j = flist-used;
+		the_fattr_list.used = j;
+		the_fattr_list.files = new_array(struct file_struct *, j);
+		if (!the_fattr_list.files)
+			out_of_memory(recv_file_list);
+		memcpy(the_fattr_list.files, flist-files,
+		   j * sizeof (struct file_struct *));
+		qsort(the_fattr_list.files, j,
+		  sizeof the_fattr_list.files[0], (int (*)())fattr_compare);
+		the_fattr_list.low = 0;
+		while (j--  0) {
+			struct file_struct *fp = the_fattr_list.files[j];
+			if (fp-basename  S_ISREG(fp-mode)  

Re: To inc_recurse or not to inc_recurse? [Re: 3.0.0pre2: bookend breakage (2 different errors)]

2007-10-15 Thread Matt McCutchen
On 10/16/07, Matt McCutchen [EMAIL PROTECTED] wrote:
 (That's why I said negotiate.)

One more thing I want to point out in case you haven't already thought
of it.  Once two-way negotiation is in place, each side should refuse
incremental recursion only if it can't fulfill its *own* duties under
incremental recursion; the sides should not second-guess each other.
This way, e.g., incremental recursion can be used on a push with
--delay-updates if the server supports the combination, regardless of
whether the client would if it were the receiver.

Matt
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html