On Sat, 2004-05-08 at 10:08, Angus Lees wrote:
> From what I can see any spam filter that needs training is missing the
> point

If you use a model where you train your filter with every single message, then yes, your viewpoint that "needs training is missing the point" makes sense. But most people I work with only train their spam filters when something is misclassified. This has the effect of keeping the database size down a touch, and it also means that the mechanics of "remove this message from the database of spam words and add it to the database of white words" are unnecessary.

> - but I've never actually run any of the Bayesian filters so its
> purely ignorant prejudice ;)

My scheme is this: I run bogofilter on the server where my mail is aggregated, as the final step before it gets sorted into folders. Positives are sent to a "ProbableSpam" folder. I have two other relevant folders: Spam and NotSpam. I have a cron job that periodically (say, every 6 hours) takes any messages from those folders, feeds them into the filter with the appropriate marking, and then empties the [Not]Spam folders.

If something is misclassified [i.e. a spam message shows up in my InBox], all I do is move (or copy) it to the Spam or NotSpam folder and forget about it. [It's IMAP, so it goes to the server.] A few hours later, the script will come along and train accordingly.

This may or may not meet your definition of "acceptable usage model", but <shrug> works for us.

AfC

-- 
Andrew Frederick Cowie
Operational Dynamics Consulting Pty Ltd
Australia: +61 2 9977 6866
North America: +1 646 472 5054
http://www.operationaldynamics.com/

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
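For anyone wanting to copy the periodic training step of the scheme described in the post, here is a minimal Python sketch. It is not the script from the post. It assumes the IMAP server stores folders as Maildir++ directories under `~/Maildir` (the `.Spam` and `.NotSpam` paths are hypothetical). It also assumes bogofilter is on the PATH, where `-s` registers a message as spam and `-n` registers it as non-spam. Each message is fed to bogofilter individually and removed only after training succeeds, which is what "empties the folder" amounts to.

```python
import mailbox
import subprocess
from pathlib import Path
from typing import Callable

# Hypothetical Maildir++ layout; adjust to wherever your IMAP server keeps folders.
MAILDIR_ROOT = Path.home() / "Maildir"
FOLDERS = {
    ".Spam": "-s",     # bogofilter: register message(s) as spam
    ".NotSpam": "-n",  # bogofilter: register message(s) as non-spam
}


def run_bogofilter(flag: str, raw: bytes) -> None:
    """Feed one raw message to bogofilter on stdin with the given flag."""
    result = subprocess.run(["bogofilter", flag], input=raw)
    # bogofilter reserves exit status 3 for I/O or other errors
    # (0-2 are classification results), so treat 3 and above as failure.
    if result.returncode >= 3:
        raise RuntimeError(f"bogofilter {flag} failed ({result.returncode})")


def train_folder(path: Path, flag: str,
                 feed: Callable[[str, bytes], None] = run_bogofilter) -> int:
    """Train on every message in one Maildir folder, then remove it.

    Returns the number of messages processed.
    """
    box = mailbox.Maildir(path, create=False)
    count = 0
    for key in list(box.keys()):  # snapshot keys before removing any
        feed(flag, box.get_bytes(key))
        box.remove(key)  # only discarded once training did not raise
        count += 1
    return count


def main() -> None:
    for name, flag in FOLDERS.items():
        folder = MAILDIR_ROOT / name
        if folder.is_dir():
            n = train_folder(folder, flag)
            print(f"{name}: trained on {n} message(s)")


if __name__ == "__main__":
    main()
```

To get the "every 6 hours" behaviour, a crontab entry with the schedule `0 */6 * * *` pointing at this script would do. Calling bogofilter once per message is slower than batching, but it keeps a single bad message from blocking the rest of the folder.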
