It's usually just a general sense that filtering is no longer as effective as it once was, often combined with a (relatively) high ratio of spam to ham in the training database and/or a relatively high number of messages in the database. For instance, I just tossed my training database a couple of days ago, when it had a few hundred messages and a spam:ham ratio of about 3:1. I'm now getting filtering results that are almost as good with 7 ham and 5 spam in the database, and I expect them to improve enough within a day or two that I'll consider myself ahead.
All very subjective, seat-of-the-pants, and possibly delusional. If I had the time, interest, and expertise, it might be interesting to quantify my results, but I'm just an Outlook plug-in user trying to make my mail stream manageable. I've managed to keep myself convinced that this approach is working for several years now, though.

As for my perverse pleasure, that stems from marveling at how quickly SpamBayes learns, from keeping things lean, and from the sense that I'm spending less time manually classifying messages, once I reach that point.

I wasn't necessarily recommending that Ram trash his training periodically, though. I just wanted to make the point that a small set of really good data may be better than a big set of data of questionable quality, and to suggest that he try incremental training before trying to figure out how to turn his existing set of messages into an effective training corpus.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Abrahams
Sent: Wednesday, January 09, 2008 7:04 PM
To: [email protected]
Subject: Re: [Spambayes] Does SpamBayes support automatic selective training?

on Thu Jan 03 2008, "Jesse Pelton" <jsp-AT-PKC.com> wrote:

> Do you have reason to believe that incremental training on messages that
> you're currently receiving would be ineffective? I retrain from scratch
> periodically, and I generally find that a remarkably small corpus (maybe
> a total of a couple of dozen messages trained) is effective. I retrain in
> part because I suspect that the content of spam that I receive changes
> over time, so training performed on messages from the distant past (say,
> six months ago) may be irrelevant or worse for my current message
> stream.
>
> One of the counter-intuitive things about SpamBayes is how little data
> it needs to go on. This makes retraining fast, easy, and (for me, at
> least) perversely rewarding.

Sorry if this sounds combative; I'm really just trying to understand.
What makes you decide to retrain, if it's working so well? Do you just do it prophylactically, like brushing your teeth? If so, then you probably don't see it improving things much (like brushing your teeth). In that case, what makes it rewarding?

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
