Ming Zhang wrote:
On Fri, 2008-01-04 at 16:21 -0500, Boris Toloknov wrote:
Ming Zhang wrote:
On Fri, 2008-01-04 at 15:05 -0500, Boris Toloknov wrote:
Ming Zhang wrote:
On Fri, 2008-01-04 at 14:12 -0500, Boris Toloknov wrote:
Ming Zhang wrote:
On Thu, 2008-01-03 at 20:19 -0500, Boris Toloknov wrote:
Hi,
It seems that rsync transfers files whose names were changed or which
were moved to another directory since the previous synchronization. I
think the ability to avoid transferring (large) files that are already
present on the other computer would be very helpful. Right before rsync
transfers a large file, it could check whether there are other files
with the same size (and maybe the same mtime) on the destination
computer. If the destination computer has such files, it could be
asked to find the file with the given MD5. If it is found, there is no
need to transfer that file; a local copy/rename/move can be performed
instead.
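
The check described above could be sketched roughly as follows in Python. This is only an illustration of the idea, not how rsync works or would implement it; the helper names `md5_of` and `find_existing_copy` are made up for this example, and it assumes the destination tree is small enough to walk directly.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_existing_copy(dest_root, size, md5_hex):
    """Hypothetical helper: search dest_root for a file with this size and MD5."""
    for dirpath, _dirnames, filenames in os.walk(dest_root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            if os.path.getsize(candidate) != size:
                continue  # cheap size check first, MD5 only for same-size files
            if md5_of(candidate) == md5_hex:
                return candidate
    return None


# Demo on a throwaway directory: the "destination" already holds the data,
# but under a different directory name than the sender expects.
with tempfile.TemporaryDirectory() as dest:
    os.makedirs(os.path.join(dest, "renamed_dir"))
    with open(os.path.join(dest, "renamed_dir", "data.bin"), "wb") as handle:
        handle.write(b"x" * 4096)
    wanted_md5 = hashlib.md5(b"x" * 4096).hexdigest()
    found = find_existing_copy(dest, 4096, wanted_md5)
    found_relative = os.path.relpath(found, dest)
    missing = find_existing_copy(dest, 4096, "0" * 32)
```

When a match is found, the receiver could copy or rename that file locally instead of receiving the data again.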
Let us say you have N files in one directory and you rename the
directory. For each of the N files, you need to check all M files on
the destination side to see whether one of them is the renamed file.
So you do NxM comparisons, and this is not scalable at all...
I think that a hash could be used instead. The destination computer
(at least) must have a list of all the files in the destination
directory. The key = size + mtime and the value = pointer to the file
entry in the list. Actually, for that operation it would be better to
have that list and hash on the sending computer.
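
A minimal sketch of such a hash, assuming a Python dictionary keyed by (size, mtime) with lists of paths as values. The function name `build_index` is hypothetical. Building the index walks the tree once, and each later lookup is a single dictionary access, so the cost is closer to N + M than N x M.

```python
import os
import tempfile
from collections import defaultdict


def build_index(root):
    """Map (size, mtime in whole seconds) to the list of paths with that key."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            index[(info.st_size, int(info.st_mtime))].append(path)
    return index


# Demo: one 2048-byte file with a fixed mtime in a throwaway directory.
with tempfile.TemporaryDirectory() as dest:
    sample = os.path.join(dest, "movie.avi")
    with open(sample, "wb") as handle:
        handle.write(b"\0" * 2048)
    os.utime(sample, (1199400000, 1199400000))  # set atime and mtime
    index = build_index(dest)
    # .get() avoids creating empty entries in the defaultdict on a miss.
    hits = [os.path.basename(p) for p in index.get((2048, 1199400000), [])]
    misses = index.get((2048, 1199400001), [])
```

A hit only marks a candidate; the MD5 comparison would still be needed before treating the file as a moved copy.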
rsync 3.0 introduces incremental scan to avoid the OOM issue, so the
hash needs to be optional as well... Also, I think this hash can be
used to detect hard links at the same time. For normal use, it should
be OK.
I agree that with incremental scan the "move/rename" feature can be
optional. Anyway, to minimize memory usage (if necessary), a sorted
list can be used instead of a hash, and the list of all files could be
stored in a temporary file with buffered access to it. In that case
the key = size + mtime, value = offset in the file with the list.
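
The sorted-list variant might look like the sketch below. It assumes the path names are written one per line to a temporary file and only (size, mtime, offset) tuples stay in memory; `build_sorted_index` and `lookup` are illustrative names, and paths containing newlines are not handled.

```python
import bisect
import tempfile


def build_sorted_index(entries, list_file):
    """Write each path to list_file and return sorted (size, mtime, offset) keys."""
    keys = []
    for size, mtime, path in entries:
        offset = list_file.tell()
        list_file.write(path.encode("utf-8") + b"\n")
        keys.append((size, mtime, offset))
    keys.sort()
    return keys


def lookup(keys, list_file, size, mtime):
    """Return all stored paths whose key equals (size, mtime)."""
    matches = []
    # Offsets are never negative, so -1 sorts before every real entry.
    position = bisect.bisect_left(keys, (size, mtime, -1))
    while position < len(keys) and keys[position][:2] == (size, mtime):
        list_file.seek(keys[position][2])
        matches.append(list_file.readline().rstrip(b"\n").decode("utf-8"))
        position += 1
    return matches


# Demo with made-up entries: (size, mtime, path).
list_file = tempfile.TemporaryFile()  # binary read/write, deleted on close
keys = build_sorted_index(
    [
        (2048, 1199400000, "a/big.iso"),
        (512, 1199400001, "b/small.txt"),
        (2048, 1199400000, "c/copy.iso"),
    ],
    list_file,
)
same_key = lookup(keys, list_file, 2048, 1199400000)
no_match = lookup(keys, list_file, 4096, 1199400000)
```

The keys still occupy memory in this sketch; only the (usually much longer) path names live in the temporary file.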
Another issue is that rsync needs to build this list up front before
handling file transfers. This can take quite some time on a huge file
system (when I say huge, I mean a file system with 20-100m files)...

Also, rsync already has some rename detection. Please check the command
line options.
I don't mind having "move/rename" detection as an optional feature
that is turned off by default. Actually, that list doesn't have to
contain all the files. Files smaller than some configurable size (for
example 100KB) don't need to be in the list. So it likely won't take
much memory or time (for sorting) even on huge systems. Scanning the
file tree takes some time though. A 1TB HDD filled with 100,000,000
files has an average file size of about 10KB.
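
The arithmetic and the size cutoff can be checked with a few lines of Python. This assumes decimal units (1TB = 10^12 bytes) and takes 100KB as 100 * 1024 bytes; the name `files_worth_indexing` and the sample entries are invented for illustration.

```python
# Average file size on a 1TB disk holding 100,000,000 files.
disk_bytes = 10**12
file_count = 100_000_000
average_size = disk_bytes // file_count  # 10,000 bytes, about 10KB

# Assumed cutoff: files below this size are not worth putting in the list.
MIN_INDEXED_SIZE = 100 * 1024


def files_worth_indexing(entries, min_size=MIN_INDEXED_SIZE):
    """Keep only (path, size) pairs at or above the configurable cutoff."""
    return [(path, size) for path, size in entries if size >= min_size]


sample = [
    ("notes.txt", 10_000),
    ("photo.raw", 25_000_000),
    ("disk.img", 4_000_000_000),
]
indexed = files_worth_indexing(sample)
```

With an average of about 10KB, a typical file on such a disk falls well below the cutoff, which is why the list could stay small.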
I have 2.6.9 and didn't find any command line option for rename
detection. I just found that there is a patch, "--detect-renamed".
But it seems that the patch doesn't detect files which were moved to
another directory. The "News file" for 3.0.0pre7 doesn't mention
rename detection.

I must be remembering the feature because of this patch.

Another way is to use inotify to generate a list of moved files, pass
the list to the receiver side, and handle the list before running rsync.
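
On the receiver side, handling such a list could be as simple as the sketch below. It assumes the list arrives as (old relative path, new relative path) pairs from a trusted sender; that format and the name `apply_move_list` are hypothetical, and collecting the list via inotify is not shown.

```python
import os
import tempfile


def apply_move_list(root, moves):
    """Replay (old_rel, new_rel) renames under root; return the ones applied."""
    applied = []
    for old_rel, new_rel in moves:
        old_path = os.path.join(root, old_rel)
        new_path = os.path.join(root, new_rel)
        if not os.path.isfile(old_path):
            continue  # already moved or never synced; leave it to rsync
        os.makedirs(os.path.dirname(new_path), exist_ok=True)
        os.replace(old_path, new_path)
        applied.append((old_rel, new_rel))
    return applied


# Demo: one file that exists and one stale entry that does not.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "old"))
    with open(os.path.join(root, "old", "a.bin"), "wb") as handle:
        handle.write(b"payload")
    moves = [
        (os.path.join("old", "a.bin"), os.path.join("new", "a.bin")),
        (os.path.join("old", "gone.bin"), os.path.join("new", "gone.bin")),
    ]
    applied = apply_move_list(root, moves)
    moved_exists = os.path.isfile(os.path.join(root, "new", "a.bin"))
    old_exists = os.path.exists(os.path.join(root, "old", "a.bin"))
```

After the renames are replayed, a normal rsync run would find the files in their new locations and have little left to transfer.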
Of course there are many ways to handle move/rename without rsync. However, that isn't very easy, and I think that "move/rename" detection would be helpful for many/most rsync users.

Boris

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html