On 06/11/2014 10:35 AM, Michael Torrie wrote:
> On 06/11/2014 06:23 AM, BrJohan wrote:
>> For some genealogical purposes I consider using Python's re module.
>>
>> Rather many names can be spelled in a number of similar ways, and in
>> order to match names even if they are spelled differently, I will build
>> regular expressions, each of which is supposed to match a number of
>> similar names.
> You might want to search for fuzzy matching algorithms. Years ago, there
> was an algorithm called soundex that would generate fuzzy fingerprints
> for words that would hide differences in spelling, etc. Unfortunately
> such an algorithm would be language dependent. The problem you are
> trying to solve is one of those very hard problems in computers and math.

Soundex is actually not horrible, but it was designed around English pronunciation and is really only suited to English names. Newer variants of Metaphone such as Double Metaphone (http://en.wikipedia.org/wiki/Metaphone) are significantly better, and support quite a few other languages. Either one would most likely be better than the regex approach.

Side note: if your data happens to be in MySQL, it has a built-in SOUNDS LIKE operator that compares two strings by their SOUNDEX() values.
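
For anyone who wants to see what a Soundex key looks like before reaching for a library, here is a minimal sketch of the classic American Soundex rules in plain Python. The function name soundex() and the choice to silently drop non-ASCII letters are my own assumptions for illustration, not part of any standard module, and real genealogical data with accented names would need transliteration first.

```python
# Letter groups for American Soundex; vowels, Y, H and W have no code.
_CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(name):
    """Return a 4-character Soundex key, or "" if name has no ASCII letters."""
    # Assumption: anything outside A-Z (digits, hyphens, accented letters)
    # is simply ignored.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    first = letters[0]
    result = [first]
    prev = _CODES.get(first, "")  # the first letter's code still suppresses repeats

    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so the same code may repeat after it

    # Pad with zeros or truncate to one letter plus three digits.
    return ("".join(result) + "000")[:4]


if __name__ == "__main__":
    for a, b in (("Robert", "Rupert"), ("Johansson", "Johanson")):
        print(a, soundex(a), b, soundex(b), soundex(a) == soundex(b))
```

With this sketch, "Robert" and "Rupert" both map to R163, and "Johansson" and "Johanson" both map to J525, so comparing keys groups spelling variants without hand-writing a regular expression per name. The same structure (normalise, encode, compare keys) applies if you swap in a Metaphone implementation from a third-party package.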