On 5 Aug 2010, at 20:13, Matthew Kitchin (public/usenet) wrote:
> On 8/5/2010 2:10 PM, Noel Jones wrote:
>>
>> Use your database to generate rules for clamav. You could even remove
>> the stock clamav rules if you want. Matching the body for 70,000
>> names would probably take less than 0.1 seconds.
> That sounds like a really good idea. I do use ClamAV but have never written
> any rules of my own. Thanks for the tip!

I'd set it up to check for surnames from the list in groups first; then, if a message matches one of those groups, look for the various permutations of the full names that correspond to that set. I'm thinking of this in terms of calling out from Exim's acl_check_data section, using different database dirs depending on the rule set (like the Bayes filter), but there are other ways of achieving the same thing. That ought to reduce the amount of work per message for those that will be let through, since a message that matches no surname group never reaches the full-name patterns. You'd have to experiment to find the best group size: it would depend on how many distinct surnames there are in your set, as well as on the callout cost relative to the time for each expression. Matching on the surname first would also give you a good shot at identifying "J. Smith", for example.
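
To make the two-stage idea concrete, here is a rough sketch in plain Python. It is not ClamAV signature syntax or an Exim callout, just the matching logic. The function names (build_index, full_name_pattern, find_names), the group_size parameter and the sample names are all made up for illustration. It also assumes the last word of each entry is the surname and only looks at the first given name.

```python
import re
from collections import defaultdict


def build_index(full_names, group_size=500):
    """Map each surname to its full names, and compile one regex per
    group of surnames (group_size is a tunable, hypothetical parameter)."""
    by_surname = defaultdict(list)
    for name in full_names:
        parts = name.split()
        if len(parts) < 2:
            continue  # need at least a given name and a surname
        # Assumption: the last token is the surname.
        by_surname[parts[-1].lower()].append(name)

    surnames = sorted(by_surname)
    groups = []
    for i in range(0, len(surnames), group_size):
        chunk = surnames[i:i + group_size]
        alternation = "|".join(re.escape(s) for s in chunk)
        groups.append(re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE))
    return dict(by_surname), groups


def full_name_pattern(full_name):
    """Build a regex for a few permutations of one name:
    'John Smith', 'J. Smith', 'J Smith', 'Smith, John', 'Smith, J.'"""
    parts = full_name.split()
    first, surname = parts[0], parts[-1]
    given = "(?:%s|%s\\.?)" % (re.escape(first), re.escape(first[0]))
    sur = re.escape(surname)
    return re.compile(
        r"\b(?:%s\s+%s|%s,\s*%s)(?!\w)" % (given, sur, sur, given),
        re.IGNORECASE,
    )


def find_names(body, by_surname, groups):
    """Stage 1: cheap surname-group scan. Stage 2: full-name patterns,
    run only for surnames that actually appeared in the body."""
    hits = set()
    for group in groups:
        seen = {m.group(1).lower() for m in group.finditer(body)}
        for surname in seen:
            for full_name in by_surname[surname]:
                if full_name_pattern(full_name).search(body):
                    hits.add(full_name)
    return hits


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real list.
    by_surname, groups = build_index(["John Smith", "Mary Jones"], group_size=1)
    print(find_names("Please forward this to J. Smith today.", by_surname, groups))
```

In a real setup you would precompile and cache the full-name patterns rather than rebuild them per message, and group_size is the knob to experiment with.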