Hello,

now that DragonFly's HAMMER has deduplication, I wonder whether there is a simple way to identify pairs or groups of files that share a lot of data, i.e. are mostly identical.
I have a rather large repository of downloaded pictures, which contains a lot of dupes in multiple locations. Finding the exact duplicates is no problem, given some time and a shell prompt. What I'm interested in is identifying broken files. Broken in the sense that A is an incomplete version of B (some bytes missing), or B is a damaged version of A (some additional bytes at the end).

Is there a way to get to something like this:

"File A shares 1234 (98.3%) data blocks with file B"
"File A shares xxxx (xx.x%) data blocks with file C"

Getting a step closer helps too. Thanks for any insights.

Regards,
Thomas
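For illustration, the kind of report asked for above can be approximated in userland without touching the filesystem's dedup data at all. The following is only a rough sketch: it assumes fixed-size blocks of 64 KiB aligned to the start of each file (an arbitrary choice here, not something taken from HAMMER), and the names `block_hashes` and `shared_blocks` are made up for this example.

```python
import hashlib
import sys

# Assumed block size for this sketch; not derived from HAMMER itself.
BLOCK_SIZE = 65536


def block_hashes(path, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).digest())
    return digests


def shared_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Count blocks of A whose content also occurs somewhere in B.

    Returns (shared, total_blocks_in_a, percent_of_a).
    """
    hashes_a = block_hashes(path_a, block_size)
    hashes_b = set(block_hashes(path_b, block_size))
    shared = sum(1 for h in hashes_a if h in hashes_b)
    percent = 100.0 * shared / len(hashes_a) if hashes_a else 0.0
    return shared, len(hashes_a), percent


# Only runs when called as: python script.py FILE_A FILE_B
if __name__ == "__main__" and len(sys.argv) == 3:
    a, b = sys.argv[1], sys.argv[2]
    shared, total, percent = shared_blocks(a, b)
    print(f"File {a} shares {shared} ({percent:.1f}%) data blocks with file {b}")
```

This catches the "truncated at the end" and "extra bytes at the end" cases, because all blocks before the damage still line up. It would not help if bytes are missing in the middle of a file, since every later block is then shifted and hashes differently. Comparing every file against every other one this way is also quadratic, so for a large repository one would probably want to collect all block hashes into a single index first.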
