Hi All,
Are there plans to implement fuzzy matching, or a similar algorithm, to
detect files that have been moved or renamed?
When large files are renamed or moved between folders, rdiff-backup
treats them as new files. As a result it transfers large amounts of
data and needs a lot of space to store the diffs (e.g. for 12 weeks or
so), even though the data is in fact exactly the same.
What I would like to suggest is this:
when a new file is discovered, calculate its checksum and check whether
that checksum already exists in the destination folder (under any
subfolder):
a) if the file exists in the destination (under a different file
name/path) and that file name/path no longer exists in the source,
simply rename/move the file in the destination
b) if the file exists in the destination (under a different file
name/path) and that file name/path _does_ still exist in the source, we
have a duplicate of the file and can either hardlink it or create a
local copy.
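To make the idea concrete, here is a rough Python sketch of the decision
above. It is only an illustration, not rdiff-backup code: the function
names (file_checksum, index_by_checksum, plan_new_file) are made up, SHA-256
is just one possible checksum, and it assumes both source and destination
are plain local directory trees that mirror each other.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_checksum(root):
    """Map checksum -> list of paths (relative to root) found under root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            index.setdefault(file_checksum(full), []).append(rel)
    return index


def plan_new_file(rel_path, source_root, dest_index):
    """Decide what to do with a file that is new in the source.

    Hypothetical helper. Returns (action, existing_dest_path):
      "move"     - case a): same content is stored under a path that
                   no longer exists in the source
      "link"     - case b): same content is stored under a path that
                   still exists in the source (a duplicate)
      "transfer" - content is not in the destination at all
    """
    checksum = file_checksum(os.path.join(source_root, rel_path))
    candidates = dest_index.get(checksum, [])
    for candidate in candidates:
        if not os.path.exists(os.path.join(source_root, candidate)):
            return ("move", candidate)
    if candidates:
        return ("link", candidates[0])
    return ("transfer", None)
```

The destination index would be built once per run with
index_by_checksum(dest_root), and plan_new_file would then be called for
every path that exists in the source but not in the destination.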
The above approach would solve the problem of transmitting and storing a
lot of data when the same files are merely moved between folders.
Deletions should be done at the very end of the process, so that files
already stored in the destination can be re-used.
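As a small illustration of that ordering, the planned operations could be
sorted so that anything which re-uses stored data runs before any
deletion. Again this is only a sketch: the (action, path) tuples and the
order_operations name are assumptions for the example, not anything
rdiff-backup provides.

```python
# Hypothetical priority table: lower numbers run first, deletions last.
ACTION_ORDER = {"move": 0, "link": 1, "transfer": 2, "delete": 3}


def order_operations(operations):
    """Return (action, path) tuples with all deletions moved to the end.

    sorted() is stable, so operations of the same kind keep the
    relative order in which they were planned.
    """
    return sorted(operations, key=lambda op: ACTION_ORDER[op[0]])
```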
The diff between backups would then again store only the differences,
not full copies of the files.
Does this sound like something which could be implemented in the near
future?
This might not be the best place to post this question, but if there is
a better backup solution that handles situations like this, please let
me know too. I would like to keep all the goodies of rdiff-backup, hence
rsync with the fuzzy option is not the way to go for me.
Thanks,
David
_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki