Re: --detect-renamed question

2007-10-13 Thread Greg Siekas

Matt,

That was too quick!  I think --trust-move is a really good thing and  
I'll test it out soon.


Some thoughts on all this now that I've had my caffeine this morning.

It would take one very crafty user to delete a file and create one
with the same name, mtime, and size.  The only issue I can see is if
there were two files with the same name, mtime, and size but different
data.  Highly unlikely, but still possible, right?  Let's call these
files fileA and fileB.  If fileA is deleted and fileB is copied to
another directory, what happens?  With --trust-move, rsync would hard
link fileA in as the new copy of fileB.  We end up with fileA and
fileB on the destination, but fileB and fileB on the source.


So, how do we fix this situation?  Is there a way to check for
duplicate entries?  If rsync checks whether the file it's about to hard
link is non-unique (same name, mtime, and size as another file),
then it should copy from the source fileB instead of hard linking
from the deleted fileA.  Does this make sense?  It would require
rsync to have a complete scan of the source prior to doing anything.
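
A minimal sketch of that uniqueness check, in C like the rest of the thread. It assumes the complete source file list has already been built; `struct entry`, `same_key()` and `key_is_unique()` are hypothetical names for illustration, and rsync's real file list (`struct file_struct`) is organized differently.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical source-list entry (assumption): rsync's real
 * struct file_struct is organized differently. */
struct entry {
    const char *basename;
    time_t mtime;
    long long size;
};

/* Two entries share a "move key" if basename, mtime and size all match. */
static int same_key(const struct entry *a, const struct entry *b)
{
    return a->mtime == b->mtime && a->size == b->size
        && strcmp(a->basename, b->basename) == 0;
}

/* Returns 1 if no other entry in the complete source list has the same
 * key as list[idx], so a detected move of it could be trusted; returns 0
 * if the key is shared, so the file should be sent from the source. */
static int key_is_unique(const struct entry *list, size_t n, size_t idx)
{
    for (size_t i = 0; i < n; i++) {
        if (i != idx && same_key(&list[i], &list[idx]))
            return 0;
    }
    return 1;
}
```

In the fileA/fileB scenario, the copied fileB shares its key with the original fileB, so `key_is_unique()` returns 0 and the file would be transferred normally instead of being adopted. The linear scan keeps the sketch short; sorting or hashing the list once would avoid quadratic work on large trees.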


This should help in situations where someone moves an upper-level
directory with lots of files and data underneath.  I recall
someone else was asking about this on the list.


Greg


On Oct 12, 2007, at 6:43 PM, Matt McCutchen wrote:


On 10/12/07, Greg Siekas [EMAIL PROTECTED] wrote:

The other option I thought of was to only do the move when the mtime,
size, and filename match.   Not really a 'detect-renamed' but a
'detected-moved' type operation.


That's a good idea, and easy to implement too!  I have improved the
patch (attached) to provide separate --trust-rename and --trust-move
options.  Wayne, please consider adding this to patches/ .

Matt
trust-rename.diff


--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: --detect-renamed question

2007-10-12 Thread Matt McCutchen
On 10/11/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 I've started testing the detect-renamed patch with 2.6.9 and soon 3.0.0pre1.
 I have a unique situation where I'm rsync'ing to an HSM-based filesystem.
 I've found that the detect-renamed patch works, but it appears to copy
 the file to the new destination.  This is particularly slow since the file in
 the HSM-based filesystem may only be a stub and all the data is only resident
 on tape.  The copy waits for the data to be recalled from tape, which,
 depending on the file size, can take a long time.  I've looked through the
 patch code and am wondering if there is an easy way to have rsync do a move
 from the ~.tmp. directory.

This is easy to do, and I have implemented a --trust-detect-renamed
option to do it in the attached patch to the current CVS rsync.
However, it is risky because a false rename detection could cause
rsync to substitute an unrelated but similar-looking destination file
for a new source file.  Don't use the option unless you are prepared
for the consequences.
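
The risk comes from what the match is based on. By default rsync's quick check looks only at size and modification time, and the patch adopts the staged file when `unchanged_file()` passes, so, assuming default options, no file data is compared at all. The sketch below contrasts a metadata-only check with a full data comparison; `struct meta`, `quick_check_matches()` and `data_matches()` are hypothetical and not part of rsync or the patch.

```c
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical metadata record (assumption): not rsync's STRUCT_STAT
 * or struct file_struct, just enough fields for the comparison. */
struct meta {
    off_t size;
    time_t mtime;
    const char *data;   /* file contents; read only by data_matches() */
};

/* Metadata-only check in the spirit of rsync's default quick check:
 * size and modification time, contents never touched. */
static int quick_check_matches(const struct meta *src, const struct meta *dst)
{
    return src->size == dst->size && src->mtime == dst->mtime;
}

/* What a verified transfer ultimately establishes: identical contents. */
static int data_matches(const struct meta *src, const struct meta *dst)
{
    return src->size == dst->size
        && memcmp(src->data, dst->data, (size_t)src->size) == 0;
}
```

Two files with the same size and mtime but different contents pass the first check and fail the second; with the trust option enabled, only the first check stands between them.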

Matt
In combination with detect-renamed.diff, this patch adds an option
--trust-detect-renamed that adopts the destination file that appears to be
the pre-rename version of a new source file, without verifying that its
data matches that of the source file.
This is risky but it is what Greg [EMAIL PROTECTED] wanted:

http://lists.samba.org/archive/rsync/2007-October/018827.html

This patch is EXPERIMENTAL, though it did work correctly in my single test.

-- Matt McCutchen [EMAIL PROTECTED]

--- old/generator.c
+++ new/generator.c
@@ -80,6 +80,7 @@ extern int compare_dest;
 extern int copy_dest;
 extern int link_dest;
 extern int detect_renamed;
+extern int trust_detect_renamed;
 extern int whole_file;
 extern int list_only;
 extern int read_batch;
@@ -1828,6 +1829,22 @@ static void recv_generator(char *fname, 
 		fnamecmp = partialptr;
 		fnamecmp_type = FNAMECMP_PARTIAL_DIR;
 		statret = 0;
+		if (detect_renamed && trust_detect_renamed
+		    && unchanged_file(fnamecmp, file, &sx.st)) {
+			/* Adopt the partial file. */
+			finish_transfer(fname, fnamecmp, NULL, NULL, file, 1, 1);
+			handle_partial_dir(partialptr, PDIR_DELETE);
+			if (itemizing)
+				itemize(fnamecmp, file, ndx, -1, &sx,
+					ITEM_LOCAL_CHANGE, fnamecmp_type, NULL);
+#ifdef SUPPORT_HARD_LINKS
+			if (preserve_hard_links && F_IS_HLINKED(file))
+				finish_hard_link(file, fname, ndx, &sx.st, itemizing, code, -1);
+#endif
+			if (remove_source_files == 1)
+				goto return_with_success;
+			goto cleanup;
+		}
 	}
 
 	if (!do_xfers)
--- old/options.c
+++ new/options.c
@@ -81,6 +81,7 @@ int am_starting_up = 1;
 int relative_paths = -1;
 int implied_dirs = 1;
 int detect_renamed = 0;
+int trust_detect_renamed = 0;
 int numeric_ids = 0;
 int allow_8bit_chars = 0;
 int force_delete = 0;
@@ -385,6 +386,7 @@ void usage(enum logcode F)
   rprintf(F," -T, --temp-dir=DIR          create temporary files in directory DIR\n");
   rprintf(F," -y, --fuzzy                 find similar file for basis if no dest file\n");
   rprintf(F,"     --detect-renamed        try to find renamed files to speed up the transfer\n");
+  rprintf(F,"     --trust-detect-renamed  ... and assume identical to source files (risky!)\n");
   rprintf(F,"     --compare-dest=DIR      also compare destination files relative to DIR\n");
   rprintf(F,"     --copy-dest=DIR         ... and include copies of unchanged files\n");
   rprintf(F,"     --link-dest=DIR         hardlink to files in DIR when unchanged\n");
@@ -564,6 +566,7 @@ static struct poptOption long_options[] 
   {"copy-dest",        0,  POPT_ARG_STRING, 0, OPT_COPY_DEST, 0, 0 },
   {"link-dest",        0,  POPT_ARG_STRING, 0, OPT_LINK_DEST, 0, 0 },
   {"detect-renamed",   0,  POPT_ARG_NONE,   &detect_renamed, 0, 0, 0 },
+  {"trust-detect-renamed",0,POPT_ARG_NONE,  &trust_detect_renamed, 0, 0, 0 },
   {"fuzzy",           'y', POPT_ARG_NONE,   &fuzzy_basis, 0, 0, 0 },
   {"compress",        'z', POPT_ARG_NONE,   0, 'z', 0, 0 },
   {"no-compress",      0,  POPT_ARG_VAL,    &do_compression, 0, 0, 0 },
@@ -1895,8 +1898,12 @@ void server_options(char **args, int *ar
 		}
 	}
 	/* Both sides need to know in case this disables incremental recursion. */
-	if (detect_renamed)
+	if (detect_renamed) {
 		args[ac++] = "--detect-renamed";
+		/* But the addition of --trust-detect-renamed is only the receiver's business. */
+		if (am_sender && trust_detect_renamed)
+			args[ac++] = "--trust-detect-renamed";
+	}
 
 	if (modify_window_set) {
 		if (asprintf(&arg, "--modify-window=%d", modify_window) < 0)
--- old/rsync.yo
+++ new/rsync.yo
@@ -385,6 +385,7 @@ to the detailed description below for a 
  -T, --temp-dir=DIR          create temporary files in directory DIR
  -y, --fuzzy                 find similar file for basis if no dest file
      --detect-renamed        try to find renamed files to speed the xfer
+     --trust-detect-renamed  ... assume identical to src files (risky!)
      --compare-dest=DIR      also compare received files relative to DIR
  --copy-dest=DIR ... 

Re: --detect-renamed question

2007-10-12 Thread Matt McCutchen
On 10/12/07, Greg Siekas [EMAIL PROTECTED] wrote:
 The other option I thought of was to only do the move when the mtime,
 size, and filename match.   Not really a 'detect-renamed' but a
 'detected-moved' type operation.

That's a good idea, and easy to implement too!  I have improved the
patch (attached) to provide separate --trust-rename and --trust-move
options.  Wayne, please consider adding this to patches/ .

Matt
In combination with detect-renamed.diff, this patch adds an option
--trust-rename that adopts the pre-rename destination file found for a
new source file without verifying that the data is actually the same.
It also adds a variant --trust-move that requires that the basenames
match.  These options are somewhat risky but were what Greg Siekas wanted:

http://lists.samba.org/archive/rsync/2007-October/018827.html

This patch is EXPERIMENTAL, though it did work correctly in my light
testing.

FIXME: If a run with --trust-rename stages a different-basename destination
file and then gets interrupted, a subsequent run with --trust-move trusts
the staged file.
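
Roughly how the two option values interact with the `fattr_find()` change in this patch: `--trust-move` sets `trust_rename` to 1 and `--trust-rename` sets it to 2. The sketch is a simplified linear scan rather than the binary search the patch's `fattr_find()` performs, and `enum trust_mode`, `struct cand` and `find_rename_candidate()` are hypothetical names; only the final decision mirrors the patch's `return` statement.

```c
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Values the patch's popt table stores in trust_rename. */
enum trust_mode { TRUST_NONE = 0, TRUST_MOVE = 1, TRUST_RENAME = 2 };

/* Hypothetical receiver-side deletion candidate (assumption). */
struct cand {
    const char *basename;
    off_t size;
    time_t mtime;
};

/* Simplified stand-in for fattr_find(): among candidates with the same
 * size and mtime, prefer one whose basename also matches (a "good" match);
 * otherwise fall back to the first size+mtime-only ("ok") match, except
 * under --trust-move, which rejects it. Returns an index or -1. */
static int find_rename_candidate(const struct cand *c, int n,
                                 const char *basename, off_t size,
                                 time_t mtime, enum trust_mode mode)
{
    int good_match = -1, ok_match = -1;

    for (int i = 0; i < n; i++) {
        if (c[i].size != size || c[i].mtime != mtime)
            continue;
        if (strcmp(c[i].basename, basename) == 0) {
            good_match = i;
            break;
        }
        if (ok_match < 0)
            ok_match = i;
    }
    return good_match >= 0 ? good_match
         : (mode == TRUST_MOVE) ? -1 : ok_match;
}
```

With `TRUST_NONE` an ok match is still returned, but in that mode it only serves as a basis file and the data is still verified by the normal transfer; the two trust modes differ only in whether an unverified ok match may be adopted.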

-- Matt McCutchen [EMAIL PROTECTED]

--- old/generator.c
+++ new/generator.c
@@ -80,6 +80,7 @@ extern int compare_dest;
 extern int copy_dest;
 extern int link_dest;
 extern int detect_renamed;
+extern int trust_rename;
 extern int whole_file;
 extern int list_only;
 extern int read_batch;
@@ -212,7 +213,9 @@ static int fattr_find(struct file_struct
 			high = mid - 1;
 	}
 
-	return good_match >= 0 ? good_match : ok_match;
+	return good_match >= 0 ? good_match :
+		/* --trust-move doesn't allow non-basename matches */
+		(trust_rename == 1) ? -1 : ok_match;
 }
 
 static void look_for_rename(struct file_struct *file, char *fname)
@@ -1826,6 +1829,22 @@ static void recv_generator(char *fname, 
 		fnamecmp = partialptr;
 		fnamecmp_type = FNAMECMP_PARTIAL_DIR;
 		statret = 0;
+		if (detect_renamed && trust_rename
+		    && unchanged_file(fnamecmp, file, &sx.st)) {
+			/* Adopt the partial file. */
+			finish_transfer(fname, fnamecmp, NULL, NULL, file, 1, 1);
+			handle_partial_dir(partialptr, PDIR_DELETE);
+			if (itemizing)
+				itemize(fnamecmp, file, ndx, -1, &sx,
+					ITEM_LOCAL_CHANGE, fnamecmp_type, NULL);
+#ifdef SUPPORT_HARD_LINKS
+			if (preserve_hard_links && F_IS_HLINKED(file))
+				finish_hard_link(file, fname, ndx, &sx.st, itemizing, code, -1);
+#endif
+			if (remove_source_files == 1)
+				goto return_with_success;
+			goto cleanup;
+		}
 	}
 
 	if (!do_xfers)
--- old/options.c
+++ new/options.c
@@ -81,6 +81,7 @@ int am_starting_up = 1;
 int relative_paths = -1;
 int implied_dirs = 1;
 int detect_renamed = 0;
+int trust_rename = 0;
 int numeric_ids = 0;
 int allow_8bit_chars = 0;
 int force_delete = 0;
@@ -385,6 +386,8 @@ void usage(enum logcode F)
   rprintf(F," -T, --temp-dir=DIR          create temporary files in directory DIR\n");
   rprintf(F," -y, --fuzzy                 find similar file for basis if no dest file\n");
   rprintf(F,"     --detect-renamed        try to find renamed files to speed up the transfer\n");
+  rprintf(F,"     --trust-rename          ... and assume identical to source files (risky!)\n");
+  rprintf(F,"     --trust-move            ... only if basenames match (less risky)\n");
   rprintf(F,"     --compare-dest=DIR      also compare destination files relative to DIR\n");
   rprintf(F,"     --copy-dest=DIR         ... and include copies of unchanged files\n");
   rprintf(F,"     --link-dest=DIR         hardlink to files in DIR when unchanged\n");
@@ -564,6 +567,8 @@ static struct poptOption long_options[] 
   {"copy-dest",        0,  POPT_ARG_STRING, 0, OPT_COPY_DEST, 0, 0 },
   {"link-dest",        0,  POPT_ARG_STRING, 0, OPT_LINK_DEST, 0, 0 },
   {"detect-renamed",   0,  POPT_ARG_NONE,   &detect_renamed, 0, 0, 0 },
+  {"trust-rename",     0,  POPT_ARG_VAL,    &trust_rename, 2, 0, 0 },
+  {"trust-move",       0,  POPT_ARG_VAL,    &trust_rename, 1, 0, 0 },
   {"fuzzy",           'y', POPT_ARG_NONE,   &fuzzy_basis, 0, 0, 0 },
   {"compress",        'z', POPT_ARG_NONE,   0, 'z', 0, 0 },
   {"no-compress",      0,  POPT_ARG_VAL,    &do_compression, 0, 0, 0 },
@@ -1895,8 +1900,13 @@ void server_options(char **args, int *ar
 		}
 	}
 	/* Both sides need to know in case this disables incremental recursion. */
-	if (detect_renamed)
+	if (detect_renamed) {
 		args[ac++] = "--detect-renamed";
+		/* But the addition of --trust-* is only the receiver's business. */
+		if (am_sender && trust_rename)
+			args[ac++] = (trust_rename == 2) ?
+				"--trust-rename" : "--trust-move";
+	}
 
 	if (modify_window_set) {
 		if (asprintf(&arg, "--modify-window=%d", modify_window) < 0)
--- old/rsync.yo
+++ new/rsync.yo
@@ -385,6 +385,8 @@ to the detailed description below for a 
  -T, --temp-dir=DIR          create temporary files in directory DIR
  -y, --fuzzy                 find similar file for basis if no dest file
      --detect-renamed        try to find renamed files to speed the xfer
+     --trust-rename          ... assume identical to src files (risky!)
+ --trust-move

--detect-renamed question

2007-10-11 Thread radius13a
I've started testing the detect-renamed patch with 2.6.9 and soon 3.0.0pre1.  I
have a unique situation where I'm rsync'ing to an HSM-based filesystem.  I've
found that the detect-renamed patch works, but it appears to copy the
file to the new destination.  This is particularly slow since the file in the HSM-
based filesystem may only be a stub and all the data is only resident on tape.
The copy waits for the data to be recalled from tape, which, depending on the
file size, can take a long time.  I've looked through the patch code and am
wondering if there is an easy way to have rsync do a move from the ~.tmp.
directory.
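
What is being asked for amounts to renaming the staged file into place instead of copying it. A minimal sketch using only the standard `rename()` call, assuming the staging directory and the destination are on the same filesystem; `adopt_by_rename()` is a hypothetical helper, not a function in rsync or the detect-renamed patch.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: adopt a staged file by renaming it into place.
 * On a single filesystem rename() only rewrites directory entries, so it
 * should not need to read the file's data (and so should not force an
 * HSM stub to be recalled from tape).
 * Returns 0 on success, 1 if the paths are on different filesystems
 * (EXDEV, a copy would be required), or -1 on any other error. */
static int adopt_by_rename(const char *staged, const char *dest)
{
    if (rename(staged, dest) == 0)
        return 0;
    if (errno == EXDEV)
        return 1;
    return -1;
}
```

A return value of 1 means the two paths are on different filesystems, where the data has to be copied and a recall from tape cannot be avoided.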

thanks,
Greg