bugfix: indeterministic file choice from multiple sources
Hello,

some time ago I reported a bug in which we saw nondeterministic behaviour of rsync (all versions since 2.5) when the same file appears in multiple sources. Sometimes the file from the first source was copied; other times the file was copied from one of the other sources.

The attached mstest.tgz contains a test that reproduces the behaviour under Darwin and Solaris. The bug did *not* show up in GNU/Linux builds of rsync, for the reason explained below.

rsync uses the qsort library function to sort the complete file list built from all files of all sources. qsort is not guaranteed to be stable, meaning that it does not promise to preserve the original order of items that compare equal. Our test case triggers a situation where this instability shows up.

Why does it not happen on GNU/Linux? Reading the man pages showed us that glibc has an optimization in qsort: if memory is not low, it uses mergesort instead, which is a stable sort algorithm.

Fix: since our use of rsync relies on deterministic behaviour, we patched rsync to always use mergesort when composing the file list. For systems without a mergesort library function (most OSes except FreeBSD/Darwin) we use the FreeBSD implementation of mergesort and put it in the rsync source tree. Patches (relative to 2.6.2) and source are attached.

I want to share this with the public and propose changing rsync to use mergesort instead of qsort. If this is not acceptable for the mainstream because mergesort has worse memory complexity, I propose giving users a command-line switch to decide whether they want the feature (preferring reliability over performance in some scenarios) or not.

I hope this will be heard.

Thanks,
Dirk.

Attachments: mstest.tgz (GNU Zip compressed data), patches.tgz (GNU Zip compressed data)

--
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
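The stability issue described above can also be illustrated without switching sort algorithms. The following is only a minimal sketch, not the attached patch: it assumes a hypothetical `struct entry` (rsync's real file-list structure differs) and makes plain `qsort` deterministic by breaking ties on each entry's original position. The names `entry`, `seq`, `cmp_entry` and `stable_sort` are invented for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a file-list entry; not rsync's real struct. */
struct entry {
	const char *name;	/* path relative to the source root */
	int source;		/* index of the source argument (0 = first) */
	size_t seq;		/* position in the unsorted list, set by stable_sort() */
};

/* Compare by name; entries with equal names keep their input order. */
static int cmp_entry(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;
	int c = strcmp(x->name, y->name);

	if (c != 0)
		return c;
	return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Record the input order, then sort. No two entries compare equal any
 * more, so the result does not depend on whether qsort is stable. */
static void stable_sort(struct entry *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		list[i].seq = i;
	qsort(list, n, sizeof list[0], cmp_entry);
}
```

The trade-off in this sketch is one extra `size_t` per entry, rather than the temporary buffer a mergesort typically needs.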
Re: Using --keep-dirlinks : recursive symlinks problem
On Fri, Aug 20, 2004 at 04:50:45PM +0400, Ivan S. Manida wrote:
> Or please kick me in the right direction for a workaround which would
> make --keep-dirlinks consider sane symlinks only.

Seems like the only good solution for this is to keep track of the device and inode of all the dirs we visit so that we can eliminate all duplicate directories. Attached is a patch that does this using a simple binary insertion sort. Very minimally tested.

Thoughts? Optimizations?

..wayne..

--- flist.c	12 Aug 2004 18:20:07 -	1.236
+++ flist.c	25 Aug 2004 07:27:12 -
@@ -724,6 +724,43 @@ void receive_file_entry(struct file_stru
 }
 
+static BOOL saw_dir(dev_t dev, ino_t ino)
+{
+	static struct dirinfo { dev_t dev; ino_t ino; } *dirarray;
+	static int dirarray_cnt, dirarray_size;
+	int low, high;
+
+	if (dirarray_cnt == dirarray_size) {
+		dirarray = realloc_array(dirarray, struct dirinfo,
+					 dirarray_size += 4096);
+	}
+
+	for (low = 0, high = dirarray_cnt - 1; low <= high; ) {
+		int j = (low + high) / 2;
+		if (ino == dirarray[j].ino) {
+			if (dev == dirarray[j].dev)
+				return True;
+			if (dev > dirarray[j].dev)
+				low = j + 1;
+			else
+				high = j - 1;
+		} else if (ino > dirarray[j].ino)
+			low = j + 1;
+		else
+			high = j - 1;
+	}
+
+	if (low < dirarray_cnt) {
+		memmove(dirarray + low + 1, dirarray + low,
+			(dirarray_cnt - low) * sizeof dirarray[0]);
+	}
+	dirarray[low].dev = dev;
+	dirarray[low].ino = ino;
+	dirarray_cnt++;
+
+	return False;
+}
+
 /**
  * Create a file_struct for a named file by reading its stat()
  * information and performing extensive checks against global
@@ -802,9 +839,14 @@ struct file_struct *make_file(char *fnam
 	if (exclude_level == NO_EXCLUDES)
 		goto skip_excludes;
 
-	if (S_ISDIR(st.st_mode) && !recurse && !files_from) {
-		rprintf(FINFO, "skipping directory %s\n", thisname);
-		return NULL;
+	if (S_ISDIR(st.st_mode)) {
+		if (!recurse && !files_from) {
+			rprintf(FINFO, "skipping directory %s\n", thisname);
+			return NULL;
+		}
+		if ((keep_dirlinks || copy_links)
+		    && saw_dir(st.st_dev, st.st_ino))
+			return NULL;
 	}
 
 	/* We only care about directories because we need to avoid recursing
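For readers who want to try the device/inode idea outside the rsync tree, here is a self-contained sketch of the same technique (a sorted array searched by bisection, with new pairs inserted in place). It is not the patch itself: `seen_dir`, `seen`, `seen_cnt` and `seen_cap` are hypothetical names, plain `realloc` stands in for rsync's `realloc_array`, and the growth policy (doubling from 64) is an arbitrary choice.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct dirinfo { dev_t dev; ino_t ino; };

static struct dirinfo *seen;		/* kept sorted by (ino, dev) */
static size_t seen_cnt, seen_cap;

/* Returns 1 if (dev, ino) was recorded by an earlier call, 0 if it is
 * new (and records it), or -1 if memory could not be allocated. */
static int seen_dir(dev_t dev, ino_t ino)
{
	size_t low = 0, high = seen_cnt;	/* half-open range [low, high) */

	while (low < high) {
		size_t mid = low + (high - low) / 2;
		const struct dirinfo *d = &seen[mid];

		if (d->ino == ino && d->dev == dev)
			return 1;
		if (d->ino < ino || (d->ino == ino && d->dev < dev))
			low = mid + 1;
		else
			high = mid;
	}

	/* Not found: low is now the insertion point. Grow if needed. */
	if (seen_cnt == seen_cap) {
		size_t new_cap = seen_cap ? seen_cap * 2 : 64;
		struct dirinfo *p = realloc(seen, new_cap * sizeof *p);

		if (!p)
			return -1;
		seen = p;
		seen_cap = new_cap;
	}
	memmove(seen + low + 1, seen + low, (seen_cnt - low) * sizeof *seen);
	seen[low].dev = dev;
	seen[low].ino = ino;
	seen_cnt++;
	return 0;
}
```

A caller would typically `stat()` each directory and pass `st.st_dev` and `st.st_ino`; a return value of 1 means the directory was already reached by another path, for example through a symlink loop.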
Re: bugfix: indeterministic file choice from multiple sources
On Wed 25 Aug 2004, Dirk Pape wrote:
> some time ago I reported a bug, where we saw indeterministic behaviour of
> rsync (all versions since 2.5), when having the same file appearing in
> multiple sources. Sometimes the file in the first source was copied, other
> times the file was copied from one of the other sources.
> [...]
> Since in our scenario using rsync we rely on deterministic behaviour, we

What I'm wondering is why it's a problem that the same file is randomly copied from one of a number of sources, if indeed it is the same file. The resulting destination will be the same, right?

Paul Slootman
Re: Using --keep-dirlinks : recursive symlinks problem
Seems fine; using inodes to sort out duplicates is a great idea. I suppose using hashes won't give much of a speedup here, since the number of directories is not known beforehand. I will apply and test the patch today, thanks!

Wayne Davison wrote:
> On Fri, Aug 20, 2004 at 04:50:45PM +0400, Ivan S. Manida wrote:
> > Or please kick me in the right direction for a workaround which would
> > make --keep-dirlinks consider sane symlinks only.
>
> Seems like the only good solution for this is to keep track of the device
> and inode of all the dirs we visit so that we can eliminate all duplicate
> directories. Attached is a patch that does this using a simple binary
> insertion sort. Very minimally tested.
>
> Thoughts? Optimizations?
>
> ..wayne..
>
> --- flist.c	12 Aug 2004 18:20:07 -	1.236
> +++ flist.c	25 Aug 2004 07:27:12 -
> @@ -724,6 +724,43 @@ void receive_file_entry(struct file_stru
>  }
>  
> +static BOOL saw_dir(dev_t dev, ino_t ino)
> +{
> +	static struct dirinfo { dev_t dev; ino_t ino; } *dirarray;
> +	static int dirarray_cnt, dirarray_size;
> +	int low, high;
> +
> +	if (dirarray_cnt == dirarray_size) {
> +		dirarray = realloc_array(dirarray, struct dirinfo,
> +					 dirarray_size += 4096);
> +	}
> +
> +	for (low = 0, high = dirarray_cnt - 1; low <= high; ) {
> +		int j = (low + high) / 2;
> +		if (ino == dirarray[j].ino) {
> +			if (dev == dirarray[j].dev)
> +				return True;
> +			if (dev > dirarray[j].dev)
> +				low = j + 1;
> +			else
> +				high = j - 1;
> +		} else if (ino > dirarray[j].ino)
> +			low = j + 1;
> +		else
> +			high = j - 1;
> +	}
> +
> +	if (low < dirarray_cnt) {
> +		memmove(dirarray + low + 1, dirarray + low,
> +			(dirarray_cnt - low) * sizeof dirarray[0]);
> +	}
> +	dirarray[low].dev = dev;
> +	dirarray[low].ino = ino;
> +	dirarray_cnt++;
> +
> +	return False;
> +}
> +
>  /**
>   * Create a file_struct for a named file by reading its stat()
>   * information and performing extensive checks against global
> @@ -802,9 +839,14 @@ struct file_struct *make_file(char *fnam
>  	if (exclude_level == NO_EXCLUDES)
>  		goto skip_excludes;
>  
> -	if (S_ISDIR(st.st_mode) && !recurse && !files_from) {
> -		rprintf(FINFO, "skipping directory %s\n", thisname);
> -		return NULL;
> +	if (S_ISDIR(st.st_mode)) {
> +		if (!recurse && !files_from) {
> +			rprintf(FINFO, "skipping directory %s\n", thisname);
> +			return NULL;
> +		}
> +		if ((keep_dirlinks || copy_links)
> +		    && saw_dir(st.st_dev, st.st_ino))
> +			return NULL;
>  	}
>  
>  	/* We only care about directories because we need to avoid recursing

--
Ivan S. Manida, cdev/buildmaster
Sun SPb: 33033, Bld. 1, Room 217
Re: bugfix: indeterministic file choice from multiple sources
Hello Paul,

On Wednesday, 25 August 2004 at 10:03 +0200, Paul Slootman [EMAIL PROTECTED] wrote:
> What I'm wondering is why it's a problem that the same file is randomly
> copied from one of a number of sources, if indeed it is the same file.
> The resulting destination will be the same, right?

No, it is not random (so I lied when I said "nondeterministic"), but consider the following scenario for the command line

  rsync -r src1/ src2/ src3/ target/

where the directories src1, src2 and src3 contain files with the same relative path (e.g. src1/foo/bar and src2/foo/bar) but with *different content*.

Whether src1/foo/bar or src2/foo/bar is copied to target/foo/bar then depends on the names and number of the other files in src1 .. src3.

Dirk.
Re: bugfix: indeterministic file choice from multiple sources
On Wed 25 Aug 2004, Dirk Pape wrote:
> On Wednesday, 25 August 2004 at 10:03 +0200, Paul Slootman [EMAIL PROTECTED] wrote:
> > What I'm wondering is why it's a problem that the same file is randomly
> > copied from one of a number of sources, if indeed it is the same file.
> > The resulting destination will be the same, right?
>
> No, it is not random (so I lied when I said "nondeterministic"), but
> consider the following scenario for the command line
>
>   rsync -r src1/ src2/ src3/ target/
>
> where the directories src1, src2 and src3 contain files with the same
> relative path (e.g. src1/foo/bar and src2/foo/bar) but with *different
> content*.

Ah, so you were also lying when you said that the same file existed at more than one source; the fileNAME is the same, but the file itself is different.

> Whether src1/foo/bar or src2/foo/bar is copied to target/foo/bar then
> depends on the names and number of the other files in src1 .. src3.

I'm wondering whether this is a sane setup / config, and whether it could perhaps be solved by some other means...

In the meantime I'm curious about the relative memory usage of qsort vs. mergesort. I'd hate rsync's memory usage to go up again.

Paul Slootman
Re: bugfix: indeterministic file choice from multiple sources
On Wed, Aug 25, 2004 at 08:44:15AM +0200, Dirk Pape wrote:
> Since in our scenario using rsync we rely on deterministic behaviour

What would you think of a tie-break for identical names that depended on some other attribute of the files? Such as newest file wins? That would be quite easy to add and would be deterministic, but perhaps not in the way you want.

..wayne..
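The "newest file wins" tie-break suggested above could look roughly like the following. This is a sketch under stated assumptions, not rsync code: `struct timed_entry`, `cmp_newest_first` and `sort_and_dedup` are hypothetical names, and it assumes that duplicate removal keeps the first entry of each run of equal names after sorting.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical file-list entry; rsync's real struct differs. */
struct timed_entry {
	const char *name;	/* path relative to the source root */
	time_t mtime;		/* modification time */
	int source;		/* index of the source argument (0 = first) */
};

/* Order by name; among equal names the newest comes first, and equal
 * timestamps fall back to the lower source index. */
static int cmp_newest_first(const void *a, const void *b)
{
	const struct timed_entry *x = a, *y = b;
	int c = strcmp(x->name, y->name);

	if (c != 0)
		return c;
	if (x->mtime != y->mtime)
		return x->mtime > y->mtime ? -1 : 1;
	return (x->source > y->source) - (x->source < y->source);
}

/* Sort, then keep only the first entry of each run of equal names.
 * Returns the number of entries that remain. */
static size_t sort_and_dedup(struct timed_entry *list, size_t n)
{
	size_t out = 0;

	qsort(list, n, sizeof list[0], cmp_newest_first);
	for (size_t i = 0; i < n; i++) {
		if (out == 0 || strcmp(list[out - 1].name, list[i].name) != 0)
			list[out++] = list[i];
	}
	return out;
}
```

Because the comparator never reports two distinct entries as equal unless name, timestamp and source all match, the outcome no longer depends on the stability of the underlying sort. Whether "newest wins" is the desired policy is a separate question, as the message itself notes.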
[Bug 1660] --exclude option causes rsync to fail
https://bugzilla.samba.org/show_bug.cgi?id=1660

--- Additional Comments From [EMAIL PROTECTED] 2004-08-25 14:03 ---

Wayne,

This is the way BackupPC (using File::RsyncP) invokes the client side of rsync. This is Perl code that emulates one side of the rsync connection, and the real rsync is used as the server side with the --server option.

I agree that using the --server option manually in this manner is subject to change, so I can't reasonably expect backward compatibility for an internal feature. So it looks like I should update File::RsyncP to send the exclude arguments through the socket instead of the arglist. Up until 2.6.2 the arglist method worked fine.

Craig

--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the QA contact for the bug, or are watching the QA contact.
How to fix the login name and password for a particular remote m/c
Hi Wayne,

Sorry to disturb you again, but I now have one more very small problem. We do not want to supply the password on the command line every time we use rsync to transfer files from one machine to another; I want to fix the login name as well as the password for a particular machine.

I tried setting the environment variable RSYNC_PASSWORD and using the option --password-file=FILE, but neither of them solved my problem. Or maybe I am not using them in the proper way, so kindly give me your suggestions regarding the login name and password.

Thanks
Shubhra

--- Wayne Davison [EMAIL PROTECTED] wrote:
> On Fri, Jul 30, 2004 at 02:49:16AM -0700, shubhra dutt wrote:
> > when I rsync them to the remote machine, the time-stamp of the file on
> > the remote machine (which I transferred from my machine) will change.
>
> Use the -t option to preserve the timestamp from the original and allow
> rsync to avoid sending files that are already up-to-date. If you can't do
> that, your only other option is to use -c (which will be quite a bit
> slower).
>
> ..wayne..