On 06/11/2014 10:35 AM, Michael Torrie wrote:
> On 06/11/2014 06:23 AM, BrJohan wrote:
>> For some genealogical purposes I consider using Python's re module.
>>
>> Rather many names can be spelled in a number of similar ways, and in 
>> order to match names even if they are spelled differently, I will build 
>> regular expressions, each of which is supposed to match  a number of 
>> similar names.
> You might want to search for fuzzy matching algorithms. Years ago, there
> was an algorithm called soundex that would generate fuzzy fingerprints
> for words that would hide differences in spelling, etc.  Unfortunately
> such an algorithm would be language dependent.  The problem you are
> trying to solve is one of those very hard problems in computers and math.
>

Soundex is actually not horrible, but it is definitely only for English
names. Newer variants of Metaphone
(http://en.wikipedia.org/wiki/Metaphone) are significantly better, and
support quite a few other languages.  Either one would most likely be
better than the regex approach.
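To show why a phonetic key can replace a hand-built regex per name, here is a minimal sketch of American Soundex in Python. It is an illustration under stated assumptions, not a library API. It uses the common four-character form and assumes ASCII input, so other characters are simply dropped and real genealogical data may need transliteration first. The names soundex() and group_by_soundex() are hypothetical helpers, not taken from any package. It is also plain Soundex, not Metaphone.

```python
from collections import defaultdict

# American Soundex digit codes; letters not listed (vowels, y, h, w) get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex code for name (hypothetical helper).

    Assumption: only ASCII letters a-z are considered; everything else is ignored.
    """
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # the first letter's code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            # h and w do not separate two letters with the same code
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # a vowel sets prev to "", so a code repeated after a vowel is kept
        prev = code
    return (first.upper() + "".join(digits) + "000")[:4]


def group_by_soundex(names):
    """Bucket names by Soundex code (hypothetical helper for illustration)."""
    groups = defaultdict(list)
    for n in names:
        groups[soundex(n)].append(n)
    return dict(groups)
```

For example, soundex("Robert") and soundex("Rupert") both give "R163", so group_by_soundex(["Robert", "Rupert", "Rubin"]) puts the first two in one bucket without writing a regex for each spelling. Soundex also has well-known misses: names that differ in the first letter, such as Catherine and Katherine, always get different codes.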

Side note: if your data happens to be in MySQL, it has a built-in
SOUNDEX() function, and its "expr1 SOUNDS LIKE expr2" operator compares
two strings by their soundex codes (it is the same as
SOUNDEX(expr1) = SOUNDEX(expr2)).
-- 
https://mail.python.org/mailman/listinfo/python-list
