Find closest matching string based on collection of strings in list/dict/set

python Tue, 31 Aug 2010 08:45:24 -0700

I'm parsing a simple, domain specific scripting language that has
commands like the following: *page, *title, *text, *footer, etc.
There are about 100 of these '*' commands. When we see a command
that we don't recognize, I would like to find the closest match
possible (from a list of all legal commands) and include this
suggestion in our diagnostic output.


I'm not sure what you would call the type of algorithm I'm
looking for: closest matching string or auto-correct?

Any suggestions on algorithms or python libraries that would help
me do what I'm looking for?

Here's my short-list of ideas based on my research so far:
- Soundex
- Lawrence Philips' Metaphone Algorithm (from aspell?)
- Edit Distance Algorithm
- Levenstein Word Distance

Any suggestions, comments on the above techniques, or ideas on a
simpler algorithm for finding close string matches based on a
list, dict, or set of possible strings?

Thank you,
Malcolm

-- 
http://mail.python.org/mailman/listinfo/python-list

Find closest matching string based on collection of strings in list/dict/set

Reply via email to