On Sat, May 08, 2004, Angus Lees wrote:
> .. so with all that manual spam/ham classification/archiving, is there
> actually any point running an "automatic" spam filter anymore?

Well, depends on what you mean by "all that". About three times a week,
a mail ends up in the wrong folder. (That's an error rate of about
0.15%.) I move those three mails to the right folder so that they get
learned correctly. Once a week a cronjob fires and learns whatever
happens to be in my mail folders at the time.

I'm happy with manually moving three mails a week. I spend more time
'training' procmail than I do training my Bayesian filter. (Please do
not wave the magical procmail rule at me, because the Linguist List
don't put the right headers in their mails and therefore it is not the
solution to the problem I'm thinking of.) The time investment is
considerably less than "all that" manual spam deleting, for example.

> From what I can see any spam filter that needs training is missing the
> point - but I've never actually run any of the Bayesian filters so its
> purely ignorant prejudice ;)

Well, it depends on what the point is.

If the point is "it is easy to tell spam from non spam with rules that
are already in existence" then contribute your rules to the SpamAssassin
project because many people are finding that their rules degrade in
effectiveness over time. SA, untrained, would miss about 15% of the spam
I currently receive.

If the point is "it should be possible to tell spam from non spam with
rules with an acceptable error rate that will not degrade for a long
period of time" you're probably right, but my suspicion is that coming
up with those rules is like a lot of natural language problems: hard.

If the point is "spam just doesn't annoy me that much, and I'd rather
just delete the stuff than spend more than 1 minute setting up a filter"
then we're different.

-Mary
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
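
A quick sanity check on the figures quoted above, as a minimal Python
sketch: three misfiled mails a week at an error rate of about 0.15%
implies roughly 2,000 filtered mails a week. That weekly volume is an
inference from the two quoted numbers, not something stated in the
message, and the variable and function names are illustrative only.

```python
# Back-of-the-envelope check of the figures quoted in the message.
misfiled_per_week = 3   # from the message: about three misfiled mails a week
error_rate = 0.0015     # from the message: an error rate of about 0.15%

# Implied weekly mail volume. This is an inference from the two numbers
# above, not a figure given in the message.
implied_mail_per_week = misfiled_per_week / error_rate


def expected_misfiles(mail_count: float, rate: float) -> float:
    """Expected number of misclassified mails for a given volume and rate."""
    return mail_count * rate


print(round(implied_mail_per_week))  # 2000
```

Run the other way, expected_misfiles(2000, 0.0015) gives back the three
mails a week that get moved by hand.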
