bugfix: indeterministic file choice from multiple sources

2004-08-25 Thread Dirk Pape
Hello,
some time ago I reported a bug where we saw indeterministic behaviour of 
rsync (all versions since 2.5) when the same file appears in 
multiple sources. Sometimes the file from the first source was copied, other 
times the file was copied from one of the other sources.

The attached mstest.tgz contains a test to reproduce the behaviour under 
Darwin and Solaris.

The bug did *not* show up in GNU/Linux builds of rsync, for the reason 
explained below.

rsync uses the qsort library function to sort the entire file list built from 
all files of all sources. qsort is not guaranteed to be stable, meaning that 
it does not necessarily preserve the original order of items that compare as 
equal. Our test case triggers a situation where this instability shows up.

Why does it not happen on GNU/Linux?
Reading the man pages showed us that glibc has an optimization in qsort: if 
memory is not low, it uses mergesort instead, which is a stable sort 
algorithm.

Fix:
Since our use of rsync relies on deterministic behaviour, we 
patched rsync to always use mergesort for sorting the file list. For 
systems without a mergesort library function (most OSes except FreeBSD/Darwin) 
we use the FreeBSD implementation of mergesort and put it in the rsync 
source tree. Patches (relative to 2.6.2) and source are attached.
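
For readers without the attachments, here is a minimal sketch of why a merge sort keeps entries from earlier sources ahead of later ones. It is not the FreeBSD mergesort from the patches; `struct src_entry`, its `source` field, and `stable_sort_entries` are hypothetical names made up for this illustration, and the comparison is a plain strcmp on the name rather than rsync's own comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a file-list entry (not rsync's file_struct). */
struct src_entry {
    const char *name;   /* path relative to its source directory */
    int source;         /* index of the source argument it came from */
};

/* Merge the sorted halves a[lo..mid) and a[mid..hi) using tmp as scratch. */
static void merge_halves(struct src_entry *a, struct src_entry *tmp,
                         size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi) {
        /* "<= 0" takes the left element on ties; this is what keeps
         * equal names in their original (source) order. */
        if (strcmp(a[i].name, a[j].name) <= 0)
            tmp[k++] = a[i++];
        else
            tmp[k++] = a[j++];
    }
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof a[0]);
}

static void sort_range(struct src_entry *a, struct src_entry *tmp,
                       size_t lo, size_t hi)
{
    size_t mid;

    if (hi - lo < 2)
        return;
    mid = lo + (hi - lo) / 2;
    sort_range(a, tmp, lo, mid);
    sort_range(a, tmp, mid, hi);
    merge_halves(a, tmp, lo, mid, hi);
}

/* Stable sort by name. Returns 0 on success, -1 if the scratch buffer
 * (one extra copy of the array) cannot be allocated. */
int stable_sort_entries(struct src_entry *a, size_t n)
{
    struct src_entry *tmp;

    assert(a != NULL || n == 0);
    if (n < 2)
        return 0;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    sort_range(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

The scratch buffer is the extra memory cost mentioned further down: the sketch needs room for a second copy of the array while sorting.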

I want to share this with the public and propose changing rsync to use 
mergesort instead of qsort. If this is not acceptable for the mainstream 
because mergesort has worse memory complexity, I propose giving users a 
command-line switch to decide whether they want the feature (preferring 
reliability over performance in some scenarios) or not.

Hope this will be heard.
Thanks,
Dirk.

mstest.tgz
Description: GNU Zip compressed data


patches.tgz
Description: GNU Zip compressed data
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Using --keep-dirlinks : recursive symlinks problem

2004-08-25 Thread Wayne Davison
On Fri, Aug 20, 2004 at 04:50:45PM +0400, Ivan S. Manida wrote:
> Or please kick me in the right direction for a workaround which would 
> make --keep-dirlinks consider sane symlinks only.

Seems like the only good solution for this is to keep track of the
device and inode of all the dirs we visit so that we can eliminate all
duplicate directories.  Attached is a patch that does this using a
simple binary insertion sort.  Very minimally tested.  Thoughts?
Optimizations?

..wayne..
--- flist.c	12 Aug 2004 18:20:07 -	1.236
+++ flist.c	25 Aug 2004 07:27:12 -
@@ -724,6 +724,43 @@ void receive_file_entry(struct file_stru
 }
 
 
+static BOOL saw_dir(dev_t dev, ino_t ino)
+{
+	static struct dirinfo { dev_t dev; ino_t ino; } *dirarray;
+	static int dirarray_cnt, dirarray_size;
+	int low, high;
+
+	if (dirarray_cnt == dirarray_size) {
+		dirarray = realloc_array(dirarray, struct dirinfo,
+					 dirarray_size += 4096);
+	}
+
+	for (low = 0, high = dirarray_cnt - 1; low <= high; ) {
+		int j = (low + high) / 2;
+		if (ino == dirarray[j].ino) {
+			if (dev == dirarray[j].dev)
+				return True;
+			if (dev > dirarray[j].dev)
+				low = j + 1;
+			else
+				high = j - 1;
+		} else if (ino > dirarray[j].ino)
+			low = j + 1;
+		else
+			high = j - 1;
+	}
+
+	if (low < dirarray_cnt) {
+		memmove(dirarray + low + 1, dirarray + low,
+			(dirarray_cnt - low) * sizeof dirarray[0]);
+	}
+	dirarray[low].dev = dev;
+	dirarray[low].ino = ino;
+	dirarray_cnt++;
+
+	return False;
+}
+
 /**
  * Create a file_struct for a named file by reading its stat()
  * information and performing extensive checks against global
@@ -802,9 +839,14 @@ struct file_struct *make_file(char *fnam
 	if (exclude_level == NO_EXCLUDES)
 		goto skip_excludes;
 
-	if (S_ISDIR(st.st_mode) && !recurse && !files_from) {
-		rprintf(FINFO, "skipping directory %s\n", thisname);
-		return NULL;
+	if (S_ISDIR(st.st_mode)) {
+		if (!recurse && !files_from) {
+			rprintf(FINFO, "skipping directory %s\n", thisname);
+			return NULL;
+		}
+		if ((keep_dirlinks || copy_links)
+		 && saw_dir(st.st_dev, st.st_ino))
+			return NULL;
 	}
 
 	/* We only care about directories because we need to avoid recursing
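
To try the idea outside of rsync, the following is a standalone sketch of the same technique: a sorted array of (inode, device) pairs that is searched with a binary search and grown by insertion. It is an adaptation rather than the patch itself. It uses plain realloc() instead of rsync's realloc_array(), a half-open search interval with size_t indices, and an int return value with -1 for allocation failure; those choices are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct dirinfo { dev_t dev; ino_t ino; };

static struct dirinfo *dirarray;       /* kept sorted by (ino, dev) */
static size_t dirarray_cnt, dirarray_size;

/* Returns 1 if (dev, ino) was already recorded, 0 if it was newly
 * inserted, and -1 if the array could not be grown. */
int saw_dir(dev_t dev, ino_t ino)
{
    size_t low = 0, high = dirarray_cnt;   /* search in [low, high) */

    while (low < high) {
        size_t mid = low + (high - low) / 2;
        const struct dirinfo *d = &dirarray[mid];

        if (ino == d->ino && dev == d->dev)
            return 1;
        if (ino > d->ino || (ino == d->ino && dev > d->dev))
            low = mid + 1;
        else
            high = mid;
    }

    /* Not found: low is now the insertion point. Grow if full. */
    if (dirarray_cnt == dirarray_size) {
        size_t new_size = dirarray_size + 4096;
        struct dirinfo *p = realloc(dirarray, new_size * sizeof *p);

        if (p == NULL)
            return -1;
        dirarray = p;
        dirarray_size = new_size;
    }

    assert(low <= dirarray_cnt);
    if (low < dirarray_cnt) {
        /* Shift the tail up by one slot to make room. */
        memmove(dirarray + low + 1, dirarray + low,
                (dirarray_cnt - low) * sizeof dirarray[0]);
    }
    dirarray[low].dev = dev;
    dirarray[low].ino = ino;
    dirarray_cnt++;
    return 0;
}
```

Lookups are O(log n), while each insertion may move up to n entries, so the total cost depends on the order in which directories are visited.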

Re: bugfix: indeterministic file choice from multiple sources

2004-08-25 Thread Paul Slootman
On Wed 25 Aug 2004, Dirk Pape wrote:

> some time ago I reported a bug, where we saw indeterministic behaviour of 
> rsync (all versions since 2.5), when having the same file appearing in 
> multiple sources. Sometimes the file in the first source was copied, other 
> times the file was copied from one of the other sources.
> [...]
> Since in our scenario using rsync we rely on deterministic behaviour, we 

What I'm wondering is why it's a problem that the same file is
randomly copied from one of a number of sources, if indeed it is the
same file. The resulting destination will be the same, right?


Paul Slootman


Re: Using --keep-dirlinks : recursive symlinks problem

2004-08-25 Thread Ivan S. Manida
Seems fine; using inodes to sort out duplicates is a great idea. I 
suppose using hashes won't give much of a speedup here, since the number of 
directories is not known beforehand. I will apply and test the patch 
today, thanks!

Wayne Davison wrote:
> On Fri, Aug 20, 2004 at 04:50:45PM +0400, Ivan S. Manida wrote:
> > Or please kick me in the right direction for a workaround which would 
> > make --keep-dirlinks consider sane symlinks only.
> 
> Seems like the only good solution for this is to keep track of the
> device and inode of all the dirs we visit so that we can eliminate all
> duplicate directories.  Attached is a patch that does this using a
> simple binary insertion sort.  Very minimally tested.  Thoughts?
> Optimizations?
> ..wayne..


--
Ivan S. Manida, cdev/buildmaster
Sun SPb: 33033, Bld. 1, Room 217


Re: bugfix: indeterministic file choice from multiple sources

2004-08-25 Thread Dirk Pape
Hello Paul,

--On Wednesday, 25 August 2004 10:03 +0200, Paul Slootman 
[EMAIL PROTECTED] wrote:

> What I'm wondering is why it's a problem that the same file is
> randomly copied from one of a number of sources, if indeed it is the
> same file. The resulting destination will be the same, right?

No, it is not random (so I lied when I said indeterministic), but consider 
the following scenario for a command line

rsync -r src1/ src2/ src3/ target/

where the dirs src1, src2 and src3 contain files with the same relative 
path (e.g. src1/foo/bar and src2/foo/bar) but with *different content*.

It now depends on the names and number of other files in src1 .. src3 
whether src1/foo/bar or src2/foo/bar will be copied into target/foo/bar.

Dirk.


Re: bugfix: indeterministic file choice from multiple sources

2004-08-25 Thread Paul Slootman
On Wed 25 Aug 2004, Dirk Pape wrote:
> --On Wednesday, 25 August 2004 10:03 +0200, Paul Slootman 
> [EMAIL PROTECTED] wrote:
> 
> > What I'm wondering is why it's a problem that the same file is
> > randomly copied from one of a number of sources, if indeed it is the
> > same file. The resulting destination will be the same, right?
> 
> No, it is not random (so I lied when I said indeterministic), but consider 
> the following scenario for a command line
> 
> rsync -r src1/ src2/ src3/ target/
> 
> where the dirs src1, src2 and src3 contain files with the same relative 
> path (e.g. src1/foo/bar and src2/foo/bar) but with *different content*.

Ah, so you were also lying when you said that the same file existed at
more than one source; the fileNAME is the same, but the file itself is
different.

> It now depends on the names and number of other files in src1 .. src3 
> whether src1/foo/bar or src2/foo/bar will be copied into target/foo/bar.

I'm wondering whether this is a sane setup / config, and perhaps solved
by some other means...

In the meantime I'm curious about the relative memory usage of qsort
vs. mergesort.  I'd hate rsync's memory usage to go up again.


Paul Slootman


Re: bugfix: indeterministic file choice from multiple sources

2004-08-25 Thread Wayne Davison
On Wed, Aug 25, 2004 at 08:44:15AM +0200, Dirk Pape wrote:
> Since in our scenario using rsync we rely on deterministic behaviour

What would you think of a tie-break for identical names that depended on
some other attribute of the files?  Such as newest file wins?  That
would be quite easy to add and would be deterministic, but perhaps not
in the way you want.
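
As a rough sketch of such a tie-break (not rsync code): the comparator below orders entries by name, then newest modification time first, then by source index, so that no two distinct entries compare as equal and qsort's instability no longer matters. `struct timed_entry`, `sort_and_dedupe`, and the assumption that duplicate removal keeps the first entry for each name are all hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical file-list entry for this example only. */
struct timed_entry {
    const char *name;   /* path relative to its source directory */
    time_t mtime;       /* modification time */
    int source;         /* index of the source argument it came from */
};

/* Order by name, then newest mtime first, then lowest source index. */
static int cmp_name_then_newest(const void *pa, const void *pb)
{
    const struct timed_entry *a = pa, *b = pb;
    int c = strcmp(a->name, b->name);

    if (c != 0)
        return c;
    if (a->mtime != b->mtime)
        return a->mtime > b->mtime ? -1 : 1;
    return (a->source > b->source) - (a->source < b->source);
}

/* Sort the list and keep only the first entry for each name.
 * Returns the number of entries that remain. */
size_t sort_and_dedupe(struct timed_entry *list, size_t n)
{
    size_t i, out = 0;

    assert(list != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(list, n, sizeof list[0], cmp_name_then_newest);
    for (i = 0; i < n; i++) {
        if (out > 0 && strcmp(list[out - 1].name, list[i].name) == 0)
            continue;   /* same name as the winner already kept */
        list[out++] = list[i];
    }
    return out;
}
```

With this ordering the result is the same on every platform, but the winner is the newest copy rather than the copy from the first source, which may or may not be what is wanted.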

..wayne..


[Bug 1660] --exclude option causes rsync to fail

2004-08-25 Thread samba-bugs
https://bugzilla.samba.org/show_bug.cgi?id=1660

--- Additional Comments From [EMAIL PROTECTED]  2004-08-25 14:03 ---
Wayne,

This is the way BackupPC (using File::RsyncP) invokes the client side
of rsync.  This is perl code that emulates one side of the rsync
connection, and the real rsync is used as the server side with the
--server option.

I agree that using the --server option manually in this manner is subject
to change, so I can't reasonably expect backward compatibility for an
internal feature.

So it looks like I should update File::RsyncP to send the exclude arguments
through the socket instead of arglist.  Up until 2.6.2 the arglist method
worked fine.

Craig




Re: Problem related to time-stamp

2004-08-25 Thread shubhra dutt
Hi Wayne,

Sorry to disturb you again,
but now I have one more very small problem.
We do not want to supply the password on the command line
every time
we use rsync to transfer files from one
machine to another; I want to fix the login name as well as
the password for a particular machine.
I tried setting the environment
variable RSYNC_PASSWORD and the --password-file=FILE option,
but neither of them solved my problem.
Or maybe I am not using them properly, so kindly
give me your suggestions regarding the login name and
password.

Thanks
Shubhra
--- Wayne Davison [EMAIL PROTECTED] wrote:

> On Fri, Jul 30, 2004 at 02:49:16AM -0700, shubhra dutt wrote:
> > when i rsync them to remote m/c the time-stamp of
> > the file on remote m/c (which i transfered from my
> > m/c) will change.
> 
> Use the -t option to preserve the timestamp from the original and allow
> rsync to avoid sending files that are already up-to-date.  If you can't
> do that, your only other option is to use -c (which will be quite a bit
> slower).
> 
> ..wayne..
 





