Folks, I was about to write some utility code to search through my 20,000 audio files looking for probable duplicates. I say "probable" because I found file names like these:

Lovelock - Trumpet Concerto (SSO Concert).mp3
Trumpet Concerto (William Lovelock).mp3

There are many other duplicates with rearranged, abbreviated or misspelt words in the names. I was about to click "New Project" and start typing, but I suddenly realised I had no idea what algorithm to use to find probable duplicates and rate them. Has anyone done this sort of thing before, or does anyone know where to find a description of a suitable algorithm?

*Greg K*
