On 06/11/2014 10:35 AM, Michael Torrie wrote:
> On 06/11/2014 06:23 AM, BrJohan wrote:
>> For some genealogical purposes I consider using Python's re module.
>> Rather many names can be spelled in a number of similar ways, and in
>> order to match names even if they are spelled differently, I will build
>> regular expressions, each of which is supposed to match a number of
>> similar names.
> You might want to search for fuzzy matching algorithms. Years ago, there
> was an algorithm called soundex that would generate fuzzy fingerprints
> for words that would hide differences in spelling, etc. Unfortunately
> such an algorithm would be language dependent. The problem you are
> trying to solve is one of those very hard problems in computers and math.
Soundex is actually not horrible, but it was designed around English
pronunciation, so it works poorly on non-English names. Newer variants of Metaphone
(http://en.wikipedia.org/wiki/Metaphone) are significantly better, and
support quite a few other languages. Either one would most likely be
better than the regex approach.
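
To give an idea of what such a fingerprint looks like, here is a rough
sketch of classic American Soundex in plain Python. The function name
soundex() and the _CODES table are my own, not from any library, and it
assumes plain ASCII names (anything outside A-Z is simply dropped). Treat
it as an illustration, not a reference implementation.

```python
# Map each coded consonant to its Soundex digit; vowels, Y, H and W get no code.
_CODES = {}
for digit, group in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in group:
        _CODES[ch] = str(digit)


def soundex(name):
    """Return a 4-character Soundex key (hypothetical helper, ASCII only)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    key = [first]
    prev = _CODES.get(first, "")  # the first letter's code also suppresses repeats
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(c, "")
        if code and code != prev:
            key.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    # Truncate or zero-pad to one letter plus three digits.
    return "".join(key)[:4].ljust(4, "0")


if __name__ == "__main__":
    for n in ("Johansson", "Johanson", "Robert", "Rupert"):
        print(n, soundex(n))
```

Names that get the same key are treated as "sounding alike", so you would
compare soundex(a) == soundex(b) instead of writing one regex per name
family.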
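Side note: if your data happens to be in MySQL, it has a built-in
SOUNDEX() function and a SOUNDS LIKE operator ("a SOUNDS LIKE b" is the
same as "SOUNDEX(a) = SOUNDEX(b)"), so you can compare strings using
soundex directly in a query.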