Hi,

On Wed, 26 Oct 2005, Chris Shoemaker wrote:
On Wed, Oct 26, 2005 at 03:02:51PM +0200, Tomasz Chmielewski wrote:

I use rsync for backing up user data, profiles, important network shares
etc. (from several locations over WAN).

Overall it works flawlessly, as it transfers only changes, but sometimes
there are some serious hiccups.

Consider this scenario, with about 1 GB of files:

user shares:

/home/joe/data/file1
              /file2
              /...
              /file1000

Now the user _moves_ that data to some other folder:

/home/joe/WAN_goes_crazy/file1
                          /file2
                          /...
                          /file1000

...and we start a backup process.

rsync will first transfer the data from "/home/joe/WAN_goes_crazy/file...",
and then delete "/home/joe/data/data...".

Basically, this is how rsync works, but in the end, we transfer 1 GB of
files over WAN that we already have locally - the only thing that
changed was the folder where that data is.

Is there some workaround for this (some intelligent script etc.)?

ISTM it would be quite useful to make rsync "rename-aware".  Caveat: I
haven't hacked on rsync for quite a while, so my understanding may be
wrong or outdated.  But I think this could be implemented thusly:

You'd want to make this optional, say --detect-renames, because it
does incur an extra processing cost.  That option should imply at
least --checksum, and --delete-after if --delete is used at all.  Then you
just need the generator to be slightly more clever.  For each file on
the sender which is *missing* from the receiver, it needs to search
the checksums of all of receiver's existing files for a checksum
match.  If it finds a match, it can simply use that matched file and
either copy or move it to the new filename.  Then that file just gets
skipped.
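
A minimal sketch of the matching step described above, written in Python
rather than rsync's C, and assuming both trees are reachable as local
directories (in rsync itself the generator would work from the file list
and the receiver's files).  The names file_digest, relative_files and
find_renames are hypothetical, and MD5 only stands in for whatever
checksum --checksum would provide:

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def find_renames(sender_root, receiver_root):
    """Map each sender-only path to a receiver path with identical content."""
    sender = relative_files(sender_root)
    receiver = relative_files(receiver_root)

    # Index every existing receiver file by its checksum.
    by_digest = {}
    for rel in sorted(receiver):
        digest = file_digest(os.path.join(receiver_root, rel))
        by_digest.setdefault(digest, rel)

    # For each file missing on the receiver, look for a checksum match.
    renames = {}
    for rel in sorted(sender - receiver):
        digest = file_digest(os.path.join(sender_root, rel))
        match = by_digest.get(digest)
        if match is not None:
            renames[rel] = match
    return renames
```

Each entry in the result says "this new path can be produced locally by
copying or moving that existing file", so the file could be skipped in the
transfer.  The sketch uses a dictionary lookup instead of a linear search
over the receiver's checksums; that is a design choice of the sketch, not
something the proposal requires.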

I don't think this would require any changes to sender, receiver or
protocol.  What I described would only handle
rename-without-modification, but its cost is not very high.  I think
it's O(N*M), N=# of files on sender that are missing on receiver, M=#
of files on receiver.  That's the cost over and above whatever
--checksum costs.

I don't see how rename-with-modification could be handled efficiently,
though.  Better not to go there.

If nobody says I'm way off base here, I might be inspired to try to
implement this.  Unless someone else has the time and inclination...

A first pass at "rename-without-modification" could be even cheaper:
just require that size and timestamp match.
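
A rough sketch of that cheaper first pass, in Python and again assuming
both trees are local directories.  The helper names stat_key and
candidate_renames are hypothetical.  Equal size and timestamp do not
prove equal content, so the result is only a list of candidates that
could still be confirmed by checksum:

```python
import os


def stat_key(path):
    """Return (size in bytes, mtime truncated to whole seconds)."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def candidate_renames(sender_root, missing, receiver_root, existing):
    """Pair each missing sender path with receiver paths of equal size/mtime.

    missing  -- relative paths present on the sender but not the receiver
    existing -- relative paths present on the receiver
    """
    # Group the receiver's files by (size, mtime).
    by_key = {}
    for rel in existing:
        key = stat_key(os.path.join(receiver_root, rel))
        by_key.setdefault(key, []).append(rel)

    # Keep only the missing files that have at least one candidate.
    candidates = {}
    for rel in missing:
        matches = by_key.get(stat_key(os.path.join(sender_root, rel)))
        if matches:
            candidates[rel] = list(matches)
    return candidates
```

Only files that survive this filter would need to be checksummed at all,
which is where the saving over a full --checksum run would come from.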

Cheers -e
--
Eberhard Moenkeberg ([EMAIL PROTECTED], [EMAIL PROTECTED])
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html