RE: Duplicate matching

2014-11-30 Thread Adrian Halid
Greg Keogh Sent: Monday, 1 December 2014 8:35 AM To: ozDotNet Subject: Re: Duplicate matching Instead of using the filename to determine duplicate audio files have you considered using an audio fingerprint? ... Apparently it uses http://acoustid.org/ which is an open source library. This is an

Re: Duplicate matching

2014-11-30 Thread Greg Keogh
> Instead of using the filename to determine duplicate audio files have you
> considered using an audio fingerprint? ... Apparently it uses
> http://acoustid.org/ which is an open source library.

This is an interesting lateral-thinking idea. That's an ambitious and scientifically interesting p

RE: Duplicate matching

2014-11-30 Thread Adrian Halid
: ozDotNet Subject: Duplicate matching Folks, I was about to write some utility code to search through my 20,000 audio files looking for probable duplicates. I say "probable" because I found file names like these:
Lovelock - Trumpet Concerto (SSO Concert).mp3
Trumpet Concerto (Willia

Re: Duplicate matching

2014-11-28 Thread Stephen Price
dotnet.com ] *On Behalf Of *Stephen Price *Sent:* Saturday, November 29, 2014 12:30 PM *To:* ozDotNet *Subject:* Re: Duplicate matching Am curious, is the idea of the exercise to write your own code to solve the problem, or to solve the problem? I've used Treesize pro to find file duplicates in

RE: Duplicate matching

2014-11-28 Thread ILT (O)
PM To: ozDotNet Subject: Re: Duplicate matching Am curious, is the idea of the exercise to write your own code to solve the problem, or to solve the problem? I've used Treesize pro to find file duplicates in the past. Also have used Directory Opus to find duplicates. Great for fi

Re: Duplicate matching

2014-11-28 Thread Greg Keogh
Hi Stephen, I wrote a utility in Framework 1.0 that finds duplicate files by content (it builds a dictionary of checksums). In this case the files with "similar" names might be the same recording at different bitrates, so their bytes differ. So it's a bit fuzzy what I'm looking for. Off the cuf
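
For readers following the thread, here is a minimal sketch of the "dictionary of checksums" idea mentioned above. It is not the original utility, which was not posted; the function name `find_content_duplicates` and the choice of SHA-256 are assumptions made for illustration. As the message notes, this only catches byte-identical files, so the same recording at a different bitrate would not be grouped.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_content_duplicates(root):
    """Group files under `root` by a SHA-256 digest of their bytes.

    Hypothetical helper: returns a list of groups, each group being a
    sorted list of paths whose contents are byte-for-byte identical.
    """
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large audio files are not loaded whole.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        by_digest[digest.hexdigest()].append(path)
    # Only digests seen more than once indicate duplicates.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_content_duplicates("some/music/folder")` (a placeholder path) would return one list per set of identical files.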

Re: Duplicate matching

2014-11-28 Thread Stephen Price
Am curious, is the idea of the exercise to write your own code to solve the problem, or to solve the problem? I've used Treesize pro to find file duplicates in the past. Also have used Directory Opus to find duplicates. Great for finding identical files with different names. Probably won't help if

Re: Duplicate matching

2014-11-28 Thread Greg Keogh
Thanks Greg H, the "weighting" is a very interesting idea. I'm running some simple experiments now with a word list and an inverted list of file names, just to help me picture the problem in my head. The problem with a weighting comparison is that I don't know what to compare with what, comparing 2
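
One possible reading of the "word list and inverted list of file names" experiment is sketched below. It is an illustration only, not the code from the thread: the helpers `build_inverted_index` and `candidate_pairs`, and the `min_shared` threshold, are hypothetical. The idea is that an inverted list maps each word to the file names containing it, so only names sharing several words need a closer comparison.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_inverted_index(file_names):
    """Map each lower-cased word to the set of file names that contain it."""
    index = defaultdict(set)
    for name in file_names:
        stem = name.rsplit(".", 1)[0]  # drop the extension if there is one
        for word in re.findall(r"[a-z0-9]+", stem.lower()):
            index[word].add(name)
    return index


def candidate_pairs(index, min_shared=2):
    """Count shared words per pair of names; keep pairs at or above min_shared.

    `min_shared` is an assumed threshold, not a value from the thread.
    """
    shared = defaultdict(int)
    for names in index.values():
        for pair in combinations(sorted(names), 2):
            shared[pair] += 1
    return {pair: count for pair, count in shared.items() if count >= min_shared}
```

With the two file names quoted in the original post, the pair shares the words "lovelock", "trumpet" and "concerto", so it is reported with a count of 3.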

Re: Duplicate matching

2014-11-28 Thread Greg Harris
Hi Greg, I should look at my code before I write comments from memory... The result is a *double* value being the sum of:
· number of times the same letter appears in both strings
· 10 times the number of times the same two letters appear in both strings
· 100 times t
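
The weighting described above could be sketched roughly as follows. This is not Greg Harris's code, which is not shown in the archive snippet. The sketch assumes that "the same letters appear in both strings" means counting shared runs of n consecutive characters (with multiplicity), that the weight for runs of length n is 10^(n-1), and that the sum stops at runs of three; the names `ngram_counts`, `weighted_similarity` and `max_n` are hypothetical.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def weighted_similarity(a, b, max_n=3):
    """Sum shared n-gram counts, weighting length-n matches by 10 ** (n - 1).

    Assumption: matching is case-insensitive and stops at max_n characters.
    """
    a, b = a.lower(), b.lower()
    score = 0.0
    for n in range(1, max_n + 1):
        # Counter intersection keeps the minimum count of each shared n-gram.
        shared = ngram_counts(a, n) & ngram_counts(b, n)
        score += 10 ** (n - 1) * sum(shared.values())
    return score
```

Under these assumptions `weighted_similarity("abc", "abd")` gives 12.0: two shared letters plus one shared pair weighted by 10. Comparing every file name with every other this way is quadratic in the number of names, which fits the remark elsewhere in the thread that the approach is very expensive.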

Re: Duplicate matching

2014-11-28 Thread Greg Harris
Hi Greg, Please find below what I have used in the past. It is very expensive, but I cannot see a better way of doing it. It returns an integer which is the sum of:
- number of times the same letter appears in both strings
- 10 times the number of times the same two letters appear in

Duplicate matching

2014-11-28 Thread Greg Keogh
Folks, I was about to write some utility code to search through my 20,000 audio files looking for probable duplicates. I say "probable" because I found file names like these:
Lovelock - Trumpet Concerto (SSO Concert).mp3
Trumpet Concerto (William Lovelock).mp3
There are many other duplicates wi