I'm parsing a simple, domain specific scripting language that has commands like the following: *page, *title, *text, *footer, etc. There are about 100 of these '*' commands. When we see a command that we don't recognize, I would like to find the closest match possible (from a list of all legal commands) and include this suggestion in our diagnostic output.
I'm not sure what you would call the type of algorithm I'm looking for: closest matching string or auto-correct? Any suggestions on algorithms or python libraries that would help me do what I'm looking for? Here's my short-list of ideas based on my research so far: - Soundex - Lawrence Philips' Metaphone Algorithm (from aspell?) - Edit Distance Algorithm - Levenstein Word Distance Any suggestions, comments on the above techniques, or ideas on a simpler algorithm for finding close string matches based on a list, dict, or set of possible strings? Thank you, Malcolm
-- http://mail.python.org/mailman/listinfo/python-list