On 5 Aug 2010, at 20:13, Matthew Kitchin (public/usenet) wrote:

> On 8/5/2010 2:10 PM, Noel Jones wrote:
>> 
>> Use your database to generate rules for clamav.  You could even remove
>> the stock clamav rules if you want.  Matching the body for 70,000
>> names would probably take less than 0.1 seconds.
> That sounds like a really good idea. I do use ClamAV but have never written 
> any rules of my own. Thanks for the tip!

I'd set it up to check for the surnames from the list in groups first, then, if 
a message matches one of those groups, look for the various permutations of the 
full names that correspond to that group. I'm thinking of this in terms of 
calling out from Exim's acl_check_data section, using various database dirs 
depending on the rule set (like the Bayes filter), but there are other ways of 
achieving the same thing. That ought to reduce the amount of work per message 
for those that will be let through, since most of them will fail the cheap 
surname check and never reach the full-name expressions. You'd have to 
experiment to find the best group size: it would depend on how many distinct 
surnames there are in your set, as well as the callout cost relative to the 
time for each expression. Matching permutations of the full name would also 
give you a good shot at identifying J. Smith as well, for example.
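
To make the two-stage idea concrete, here is a rough sketch in Python. It is 
not ClamAV signature syntax and not an Exim callout, just the matching logic 
on its own. The function names, the (first, last) input format and the 
particular permutations tried ("John Smith", "J. Smith", "Smith, John", 
"Smith, J") are all my own assumptions for illustration; the real set of 
permutations would depend on your data.

```python
import re
from collections import defaultdict


def build_groups(names, group_size):
    """Build two-stage matchers from (first, last) pairs.

    Hypothetical helper: assumes every first name is non-empty.
    Returns a list of (surname_group_regex, {surname: full_name_regex}).
    """
    by_surname = defaultdict(list)
    for first, last in names:
        by_surname[last.lower()].append(first)

    surnames = sorted(by_surname)
    groups = []
    for i in range(0, len(surnames), group_size):
        chunk = surnames[i:i + group_size]
        # Stage 1: one cheap alternation over every surname in the group.
        stage1 = re.compile(
            r"\b(?:" + "|".join(re.escape(s) for s in chunk) + r")\b",
            re.IGNORECASE,
        )
        # Stage 2: per-surname permutations of the full name.
        stage2 = {}
        for surname in chunk:
            last = re.escape(surname)
            parts = []
            for first_name in by_surname[surname]:
                first = re.escape(first_name)
                initial = re.escape(first_name[0])
                # "John Smith", "J. Smith", "J Smith"
                parts.append(rf"\b(?:{first}|{initial}\.?)\s+{last}\b")
                # "Smith, John", "Smith, J"
                parts.append(rf"\b{last},\s*(?:{first}|{initial})\b")
            stage2[surname] = re.compile("|".join(parts), re.IGNORECASE)
        groups.append((stage1, stage2))
    return groups


def find_names(body, groups):
    """Return the set of surnames whose full-name patterns occur in body."""
    hits = set()
    for stage1, stage2 in groups:
        if not stage1.search(body):
            continue  # no surname from this group: skip its stage-2 work
        for surname, pattern in stage2.items():
            if pattern.search(body):
                hits.add(surname)
    return hits


if __name__ == "__main__":
    demo = build_groups([("John", "Smith"), ("Ann", "Jones")], group_size=50)
    print(find_names("Please forward this to J. Smith.", demo))
```

The group_size argument is the knob to experiment with: larger groups mean 
fewer stage-1 checks per message but more stage-2 expressions to run whenever 
a group does match.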
