Hi Harald,

On Mon, Dec 12, 2016 at 02:31:03PM +0100, Harald Dunkel wrote:
> Have you considered to introduce a "deduplicate mode" for
> rsync, replacing duplicate files in the destination directory
> by hard links?
For a month now I have been successfully using the offline deduplication
feature that is currently experimental in XFS to reduce the size of my
rsnapshot backups. Some more info:

http://strugglers.net/~andy/blog/2017/01/10/xfs-reflinks-and-deduplication/

rsync is hardlinking together files that do not change between two backup
runs, but reflinks also let me deduplicate files that cycle between known
contents, as well as partially identical files and identical regions across
multiple different directories (so from different hosts, for example). At the
moment it is saving me about 27% in volume, though that is of course totally
dependent on what you are backing up.

Also note that examining the whole tree of files is really hard on the
storage, as it hits it with a large amount of random IO. Especially with slow
rotational storage it may well be cheaper just to buy more capacity.
Personally I am using SSDs, so the performance vs. capacity trade-off is
different.

Not speaking for the rsync developers, but deduplicating all files within a
directory would require rsync to read every file in that directory, which is
something it wouldn't normally do unless they are the target of a file
transfer. Since other utilities already exist for examining files and
hardlinking dupes together, or indeed for doing it inside the filesystem on a
block/extent level, maybe it is not appropriate to put the feature inside
rsync.

Cheers,
Andy