I've been thinking about how to balance my ham (10,641 messages) and spam (60,230 messages). My plan is to discard spam and train only on ham until the two are balanced. That will take a while, because incoming spam outnumbers incoming ham by a fairly ridiculous margin.
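To put rough numbers on that, here is a quick sketch of the arithmetic in plain Python. It is not Spambayes code: `balance_plan` and `target_ratio` are names I made up for illustration, and it assumes "balanced" means a spam-to-ham ratio of at most `target_ratio` (1.0 by default).

```python
import math


def balance_plan(ham, spam, target_ratio=1.0):
    """Return (spam_to_discard, ham_to_add) for a spam:ham ratio <= target_ratio.

    Hypothetical helper for illustration only; not part of Spambayes.
    Either number on its own is enough to reach the target ratio.
    """
    if ham <= 0 or spam < 0 or target_ratio <= 0:
        raise ValueError("ham and target_ratio must be positive, spam non-negative")

    # Option 1: keep all ham, drop spam above the allowed count.
    allowed_spam = int(ham * target_ratio)
    spam_to_discard = max(0, spam - allowed_spam)

    # Option 2: keep all spam, train on more ham until the ratio is met.
    needed_ham = math.ceil(spam / target_ratio)
    ham_to_add = max(0, needed_ham - ham)

    return spam_to_discard, ham_to_add


if __name__ == "__main__":
    ham, spam = 10_641, 60_230
    print(f"spam:ham ratio = {spam / ham:.2f}")
    print(balance_plan(ham, spam))
```

With my counts, that works out to a ratio of roughly 5.66 to 1, so I would have to either discard 49,589 spam or train on 49,589 more ham to get to an even split.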
While this approach will work, it would be nice if Spambayes balanced itself automatically once the ham/spam ratio passed some configurable percentage in either direction, so that I wouldn't have to worry about it. There will ALWAYS be more spam than ham. Most Spambayes users think like me: keep training on the spam in the hope that it will go away completely. Why concern users with balance issues that should be, IMO, handled automatically?

Another option would be to calculate the ratio of ham to spam and adjust the "strength" of the ham/spam clues according to that ratio. However, this is probably a bad idea.

I'm running Spambayes 1.0.4.

--
Thomas Hruska
CubicleSoft President
