Re: great feature idea (well, hopefully)

2015-02-12 Thread Joe
I haven't used it yet, but take a look at --fuzzy specified once or
twice. I think it does what you want or at least something very similar.

Joe

On 02/11/2015 09:03 AM, QUBE RUBBIK wrote:
> Hello
>
> I was just thinking about a killer feature for rsync: the ability to
> detect file name changes or moves within the source and destination.
> At this time rsync has to re-transfer a file if it has been renamed or
> moved into a subfolder, with a heavy waste of resources and bandwidth.
>
> It could be smarter:
> with a --smart switch, rsync could take a hash of every file within
> the source and destination BEFORE TRANSFERRING,
> then for existing (matching hash) files, it only needs to alter
> metadata (name, location, chmod etc.), saving plenty of bandwidth.
>
> Okay, the destination has to handle this; I expect the rsync daemon has
> to handle server-side file hashing.
>
> We would have a clever tool to replicate data that has only been
> reorganised, with no changes to the files themselves.
> No need to resync the whole structure if you added a dir in the path,
> or someone renamed this particular heavy file.
>
> This may save a lot of data transfer on automatic backups, ftp mirrors etc.
>
>
> What do you think about it?
>
> --smart ?



-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: great feature idea (well, hopefully)

2015-02-12 Thread Matthias Schniedermeyer
On 11.02.2015 14:03, QUBE RUBBIK wrote:
> Hello
>
> I was just thinking about a killer feature for rsync: the ability to detect
> file name changes or moves within the source and destination.
> At this time rsync has to re-transfer a file if it has been renamed or moved
> into a subfolder, with a heavy waste of resources and bandwidth.
>
> It could be smarter:
> with a --smart switch, rsync could take a hash of every file within the
> source and destination BEFORE TRANSFERRING,
> then for existing (matching hash) files, it only needs to alter metadata
> (name, location, chmod etc.), saving plenty of bandwidth.
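
The hash-matching idea in the quoted proposal can be illustrated with a minimal Python sketch. This is not how rsync works and not an existing rsync option; the function names (file_hash, index_tree, plan_sync) and the choice of SHA-256 are assumptions made for illustration only. It indexes both trees by content hash and reports which wanted paths the destination could create from a file it already has, and which still need to be sent. Deleting paths that no longer exist in the source is left out.

```python
import hashlib
import os


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root):
    """Map content hash -> sorted list of paths relative to root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            index.setdefault(file_hash(full), []).append(rel)
    for paths in index.values():
        paths.sort()
    return index


def plan_sync(src_index, dst_index):
    """Return (reuse, transfer).

    reuse:    (existing_dst_path, wanted_path) pairs the destination can
              satisfy locally by renaming or copying a file it already has.
    transfer: source paths whose content is missing on the destination.
    """
    reuse, transfer = [], []
    for digest, src_paths in src_index.items():
        dst_paths = dst_index.get(digest, [])
        for path in src_paths:
            if path in dst_paths:
                continue  # same content already at the same path
            if dst_paths:
                reuse.append((dst_paths[0], path))
            else:
                transfer.append(path)
    return sorted(reuse), sorted(transfer)
```

Note that every file on both sides is read in full to build the indexes, which is exactly the cost discussed in the reply.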

Imagine doing that for a couple of GB of data. The hashing might take
longer than the time saved on copying it.
This would only work with a persistence layer that remembers the hashes
of unchanged files. This has been a topic in the past, although I don't
remember the details. (And I'm too lazy to google for it.)
Otherwise the only time it really saves time is when you have really
asymmetric bandwidths:
Fast local access on both sides (to create the hashes), terrible
bandwidth on the link in between (for the copying of new/changed files).
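
A persistence layer of the kind mentioned above could look roughly like the following Python sketch. It is an assumption-laden illustration, not an rsync feature: the HashCache class, the JSON cache file and the rule "same size and mtime means unchanged" are all hypothetical choices. A file is rehashed only when its size or modification time differs from the remembered entry.

```python
import hashlib
import json
import os


def sha256_of(path):
    """Hash a file's contents in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


class HashCache:
    """Remember file hashes between runs, keyed by absolute path."""

    def __init__(self, cache_file):
        self.cache_file = cache_file
        self.hits = 0
        self.misses = 0
        try:
            with open(cache_file, "r", encoding="utf-8") as handle:
                self.entries = json.load(handle)
        except FileNotFoundError:
            self.entries = {}

    def get(self, path):
        """Return the hash, reading the file only if size or mtime changed."""
        info = os.stat(path)
        key = os.path.abspath(path)
        entry = self.entries.get(key)
        if (entry and entry["size"] == info.st_size
                and entry["mtime_ns"] == info.st_mtime_ns):
            self.hits += 1
            return entry["sha256"]
        self.misses += 1
        digest = sha256_of(path)
        self.entries[key] = {
            "size": info.st_size,
            "mtime_ns": info.st_mtime_ns,
            "sha256": digest,
        }
        return digest

    def save(self):
        """Write the cache back to disk for the next run."""
        with open(self.cache_file, "w", encoding="utf-8") as handle:
            json.dump(self.entries, handle)
```

The trade-off is the usual one for size/mtime checks: a file modified without changing either value would keep its stale hash.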

> Okay, the destination has to handle this; I expect the rsync daemon has to
> handle server-side file hashing.
>
> We would have a clever tool to replicate data that has only been reorganised,
> with no changes to the files themselves.
> No need to resync the whole structure if you added a dir in the path, or
> someone renamed this particular heavy file.
>
> This may save a lot of data transfer on automatic backups, ftp mirrors etc.
>
>
> What do you think about it?

The 'workaround' I personally use is hardlinks. Just hardlink all files
into a directory that sorts alphabetically before everything else; I
personally use a '.z'-directory in the root of each directory tree I
treat that way.
The reason is that rsync has to work through that directory first,
otherwise it wouldn't work as intended.

After that you can move the files around, and when you run:
rsync ... -H --delete ... ...
rsync just deletes and re-hardlinks the moved file(s).

If you remove a file:
find .z -type f -links 1 -delete
removes the 'dangling' file(s) with only 1 link remaining.
(And in the meantime you have a backup, in case you accidentally deleted 
a file.)

You would also need to make plans for maintaining the .z-directory:
initial creation, adding new files, can files change? ...

The solution has some caveats, like maintaining the .z-directory, but it 
works fine for me.
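
Maintaining the .z-directory could be scripted along the lines of the Python sketch below. It is only one possible approach under stated assumptions: the function names are hypothetical, and naming each entry in .z after the file's inode number is an illustrative choice to avoid name clashes, not something described above. populate_link_dir hardlinks every regular file that is not yet present in .z, and remove_dangling does the same job as the find command above for a flat .z-directory.

```python
import os


def populate_link_dir(root, link_dir_name=".z"):
    """Hardlink each regular file under root into root/.z (flat layout).

    Entries are named after the inode number (an assumption of this
    sketch), so a moved or renamed file maps to the same entry.
    Returns the newly created entries.
    """
    link_dir = os.path.join(root, link_dir_name)
    os.makedirs(link_dir, exist_ok=True)
    added = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the link directory itself.
        dirnames[:] = [d for d in dirnames
                       if os.path.join(dirpath, d) != link_dir]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            target = os.path.join(link_dir, str(os.stat(full).st_ino))
            if not os.path.exists(target):
                os.link(full, target)  # needs the same filesystem
                added.append(target)
    return sorted(added)


def remove_dangling(root, link_dir_name=".z"):
    """Delete .z entries with a link count of 1; return their names."""
    link_dir = os.path.join(root, link_dir_name)
    removed = []
    for entry in os.scandir(link_dir):
        if not entry.is_file(follow_symlinks=False):
            continue
        if os.lstat(entry.path).st_nlink == 1:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Running remove_dangling before populate_link_dir avoids a stale entry blocking a new file that happens to reuse a freed inode number. Files that are rewritten in place (rather than replaced) keep their link, so both names change together.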




-- 

Matthias