Hi Greg,

Instead of using the filename to detect duplicate audio files, have you 
considered using an audio fingerprint?

I have used this software in the past to automatically tag my music.
https://picard.musicbrainz.org/
“Picard uses AcoustID audio fingerprints, allowing files to be identified by 
the actual music, even if they have no metadata”

Apparently it uses http://acoustid.org/, which is an open-source audio 
fingerprinting project.


Regards

Adrian Halid


From: [email protected] [mailto:[email protected]] On 
Behalf Of Greg Keogh
Sent: Saturday, 29 November 2014 6:46 AM
To: ozDotNet
Subject: Duplicate matching

Folks, I was about to write some utility code to search through my 20,000 
audio files looking for probable duplicates. I say "probable" because I found 
file names like these:

Lovelock - Trumpet Concerto (SSO Concert).mp3
Trumpet Concerto (William Lovelock).mp3

There are many other duplicates with rearranged, abbreviated or misspelt words 
in the names. I was about to click "New Project" and start typing but I 
suddenly realised I had no idea what algorithm to use to find probable 
duplicates and rate them. Has anyone done this sort of thing before or know 
where to find a description of a suitable algorithm?

Greg K
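
For the filename side of the question, here is a rough sketch of one way to 
rate probable duplicates. It is not a recommendation from either poster, just 
an illustration in Python (the thread names no language). It splits each name 
into lowercase word tokens, so rearranged words do not matter, then takes the 
better of a token-overlap (Jaccard) score and a character-level ratio from the 
standard library's difflib, which tolerates misspellings. The function names 
and the 0.5 threshold are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path


def tokens(filename):
    """Lowercase word tokens of the name, ignoring extension and punctuation."""
    stem = Path(filename).stem.lower()
    return set(re.findall(r"[a-z0-9]+", stem))


def similarity(a, b):
    """Score two file names from 0.0 (unrelated) to 1.0 (same words)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    # Word overlap: unaffected by word order.
    jaccard = len(ta & tb) / len(ta | tb)
    # Character-level ratio on the sorted tokens: tolerates misspelt words.
    ratio = SequenceMatcher(
        None, " ".join(sorted(ta)), " ".join(sorted(tb))
    ).ratio()
    return max(jaccard, ratio)


def probable_duplicates(names, threshold=0.5):
    """Return (score, name_a, name_b) for pairs at or above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((score, a, b))
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    names = [
        "Lovelock - Trumpet Concerto (SSO Concert).mp3",
        "Trumpet Concerto (William Lovelock).mp3",
        "Beethoven - Symphony 5.mp3",
    ]
    for score, a, b in probable_duplicates(names):
        print(f"{score:.2f}  {a}  <->  {b}")
```

Comparing every pair is quadratic, so 20,000 files means roughly 200 million 
comparisons. In practice it would probably help to compare only names that 
share at least one uncommon token, and the threshold would need tuning against 
real file names.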
