Hi Greg,

Instead of using the filename to determine duplicate audio files, have you considered using an audio fingerprint?
I have used this software in the past to automatically tag my music.

https://picard.musicbrainz.org/

“Picard uses AcoustID audio fingerprints, allowing files to be identified by the actual music, even if they have no metadata”

Apparently it uses http://acoustid.org/ which is an open source library.

Regards
Adrian Halid

From: [email protected] [mailto:[email protected]] On Behalf Of Greg Keogh
Sent: Saturday, 29 November 2014 6:46 AM
To: ozDotNet
Subject: Duplicate matching

Folks, I was about to write some utility code to search through my 20,000 audio files looking for probable duplicates. I say "probable" because I found file names like these:

Lovelock - Trumpet Concerto (SSO Concert).mp3
Trumpet Concerto (William Lovelock).mp3

There are many other duplicates with rearranged, abbreviated or misspelt words in the names. I was about to click "New Project" and start typing, but I suddenly realised I had no idea what algorithm to use to find probable duplicates and rate them. Has anyone done this sort of thing before, or know where to find a description of a suitable algorithm?

Greg K
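
For the filename-only approach described in the original question, one possible starting point is sketched below in Python. It is only an illustration, not something proposed in the thread: it assumes that sorting the words of each name before comparing them is enough to cope with rearranged words, and it uses the standard library's difflib.SequenceMatcher to rate each pair. The function names and the 0.6 threshold are made up for this example and would need tuning against real file names.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalise(filename):
    """Lower-case the name, drop the extension and punctuation, sort the words."""
    stem = filename.rsplit(".", 1)[0].lower()
    words = re.findall(r"[a-z0-9]+", stem)
    # Sorting makes "Lovelock - Trumpet Concerto" and
    # "Trumpet Concerto (Lovelock)" line up word for word.
    return " ".join(sorted(words))


def similarity(a, b):
    """Rate two file names from 0.0 (nothing in common) to 1.0 (identical)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def probable_duplicates(filenames, threshold=0.6):
    """Return (score, name_a, name_b) for every pair at or above the threshold.

    The threshold is an arbitrary value chosen for illustration.
    """
    pairs = []
    for a, b in combinations(filenames, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((score, a, b))
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    names = [
        "Lovelock - Trumpet Concerto (SSO Concert).mp3",
        "Trumpet Concerto (William Lovelock).mp3",
        "Beethoven - Symphony 5.mp3",
    ]
    for score, a, b in probable_duplicates(names):
        print(f"{score:.2f}  {a}  <->  {b}")
```

Comparing every pair is quadratic, so 20,000 files means roughly 200 million comparisons. Grouping names by a shared word first and only comparing within each group would cut that down, at the cost of missing pairs whose only common words are misspelt.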
