On Tue, November 27, 2007 15:01, Thomas Hruska wrote:
> I've been thinking about how I'm going to balance my ham (10,641
> messages) and spam (60,230 messages). What I plan on doing is
> discarding spam and then just train on ham until they are balanced. It
> will take a while because the incoming ratio of ham to spam is fairly
> ridiculous.
>
> While this approach will work, I'm thinking it would be nice for
> Spambayes to automatically balance itself when some configurable
> percentage is hit on either end of the spectrum so that I wouldn't have
> to worry about it. There will ALWAYS be more spam than ham. Most users
> of Spambayes think like me: Continue training on the spam in the hope
> that it will completely go away. Why concern users with balance issues
> that should be, IMO, handled automatically?

I think it is easier to acknowledge that spam won't go away, that no solution is perfect, and that it is less work to retrain from scratch when your ham/spam ratio becomes ridiculous.

-- 
Amedee Van Gasse
