This is one for the training gurus. You can find a discussion of various training approaches on the SpamBayes wiki (http://www.entrian.com/sbwiki/TrainingIdeas).
 
That said, I'll put my oar in. In general, the recommendation of the gurus is along the lines of "don't worry, be happy": as long as you're getting satisfactory results, just use the training buttons to correct classification errors. The bottom line is the quality of the results you're getting; the suggestion to keep the ham:spam ratio close to 1 is only a guideline that seems to help achieve good results. I follow that approach, and when I notice that I'm getting unsatisfactory results over a period of time, I just discard my training database and start over. SpamBayes learns very quickly, so I don't find it worthwhile to try to tune the database over time.
 
Another thing to look at is the threshold scores for possible and certain spam. I've lowered my certain spam threshold somewhat as I've become more confident in my training data (it's now 0.70). This means fewer messages end up as possible spam that I then have to train as spam, which reduces the ham:spam imbalance. I'm currently getting good results (>95% correctly classified) with 53 ham and 171 spam trained on.
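
To make the threshold point concrete, here is a rough sketch in Python of how a pair of cutoffs splits scores into ham, possible spam, and certain spam. It is only an illustration, not SpamBayes code: the function names, the 0.20 ham cutoff, the sample scores, and the exact handling of scores that sit right on a cutoff are all my assumptions. The 0.70 cutoff and the 53/171 training counts are the ones mentioned above.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.70):
    """Bucket a spam score in [0, 1] into 'ham', 'unsure', or 'spam'.

    Hypothetical helper: the cutoff defaults and the boundary handling
    are assumptions for illustration, not taken from SpamBayes itself.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= spam_cutoff:
        return "spam"      # "certain spam"
    if score < ham_cutoff:
        return "ham"
    return "unsure"        # "possible spam"


def ham_spam_ratio(ham_trained, spam_trained):
    """Ratio of trained ham to trained spam (1.0 means balanced)."""
    if spam_trained == 0:
        raise ValueError("no spam trained yet")
    return ham_trained / spam_trained


# Made-up sample scores, bucketed with a 0.90 and then a 0.70 cutoff.
scores = [0.05, 0.50, 0.75, 0.95]
with_high_cutoff = [classify(s, spam_cutoff=0.90) for s in scores]
with_low_cutoff = [classify(s, spam_cutoff=0.70) for s in scores]

# Training counts quoted in the message above.
ratio = ham_spam_ratio(53, 171)

if __name__ == "__main__":
    print(with_high_cutoff)
    print(with_low_cutoff)
    print(round(ratio, 2))
```

With the lower cutoff, the message scoring 0.75 moves from the unsure bucket to the spam bucket, so it no longer needs to be trained by hand.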


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Gil Hurlbut
Sent: Monday, April 24, 2006 4:35 PM
To: spambayes@python.org
Subject: Re: [Spambayes] Incremental Training for ham in Outlook Plugin?

The question addresses the fact that SpamBayes is far better at classifying ham once it is trained than it is at keeping up with classifying new spam. I find it necessary to remove many spam messages until I get to the point where the Manager has far more spam than ham. Until I hear a different recommendation, I’m going to get back to a balance by moving known ham to my Unsure folder and clicking on “Recover from Spam” to do the incremental training.

 

_______________________________________________
SpamBayes@python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
