I am developing a list of 3-character strings like this: and bra cam dom emi mar smi ...
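A minimal sketch of how a list like this could be searched. It assumes plain-text files read as UTF-8 and a case-insensitive match anywhere in the text. No word boundaries are used, so false positives go up while misses stay unlikely. The names `PREFIXES`, `build_pattern`, `text_may_contain_name_re`, `text_may_contain_name_set` and `file_may_contain_name` are hypothetical. The sketch shows two options. The first is a single `re.compile` alternation of all the strings. The second is a swapped-in alternative that looks up every 3-character window of the text in a set, which only works because every entry is exactly three characters long.

```python
import itertools
import re
import string

# Hypothetical sample of the 3-character strings; the real list may be ~1000 long.
PREFIXES = ["and", "bra", "cam", "dom", "emi", "mar", "smi"]


def build_pattern(prefixes):
    """Compile every 3-character string into one case-insensitive alternation.

    No word boundary (\\b) is used, so a string also matches inside longer
    tokens such as "jsmith": false positives are acceptable, misses are not.
    """
    alternation = "|".join(re.escape(p) for p in sorted(set(prefixes)))
    return re.compile(alternation, re.IGNORECASE)


def text_may_contain_name_re(text, pattern):
    """Regex version: True if any of the strings occurs anywhere in text."""
    return pattern.search(text) is not None


def text_may_contain_name_set(text, prefix_set):
    """Set version: every entry is exactly 3 characters long, so slide a
    3-character window over the lower-cased text and test membership."""
    lowered = text.lower()
    return any(lowered[i:i + 3] in prefix_set for i in range(len(lowered) - 2))


def file_may_contain_name(path, pattern):
    """Apply the regex check to a whole file (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return text_may_contain_name_re(f.read(), pattern)


if __name__ == "__main__":
    pattern = build_pattern(PREFIXES)
    prefix_set = {p.lower() for p in PREFIXES}
    for sample in ["Invoice for Mary Smith", "Quarterly totals only"]:
        print(sample, text_may_contain_name_re(sample, pattern),
              text_may_contain_name_set(sample, prefix_set))

    # A list of 1000 synthetic 3-letter strings still compiles as one pattern.
    big = ["".join(t) for t in itertools.islice(
        itertools.product(string.ascii_lowercase, repeat=3), 1000)]
    print(len(big), build_pattern(big).search("zzz xyz aaa") is not None)
```

Run as a script, it prints both checks for each sample line and confirms that a 1000-entry alternation compiles and matches. Both functions should give the same answer, so the choice between them comes down to taste and measured speed on real files.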
The goal of the list is to have enough strings to identify files that may contain people's names. Missing a name in a file is unacceptable. For example, the string 'mar' would catch marc, mark, mary, maria..., and 'smi' would catch smith, smiley, smit, etc. False positives are OK (matching common words instead of people's names is fine).

I may end up with a thousand or so of these 3-character strings. Is that too many for re.compile to handle? Also, is this a bad way to approach the problem? Any ideas for improvement are welcome! I can provide more info off-list for anyone who would like it.

Thank you for your time,
Brad

-- 
http://mail.python.org/mailman/listinfo/python-list