On Tue, November 27, 2007 15:01, Thomas Hruska wrote:
> I've been thinking about how I'm going to balance my ham (10,641
> messages) and spam (60,230 messages).  What I plan on doing is
> discarding spam and then just training on ham until they are balanced.  It
> will take a while because the incoming ratio of ham to spam is fairly
> ridiculous.
>
> While this approach will work, I'm thinking it would be nice for
> Spambayes to automatically balance itself when some configurable
> percentage is hit on either end of the spectrum so that I wouldn't have
> to worry about it.  There will ALWAYS be more spam than ham.  Most users
> of Spambayes think like me:  Continue training on the spam in the hope
> that it will completely go away.  Why concern users with balance issues
> that should be, IMO, handled automatically?

I think it is easier to acknowledge that spam won't go away, that no
solution is perfect, and that it is less work to retrain from scratch when
your ham/spam ratio becomes ridiculous.
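
To put a number on "ridiculous", here is a rough sketch in Python using
the counts quoted above. The helper names and the max_ratio threshold are
made up for illustration; they are not SpamBayes options or part of its
API, and the right cutoff is a judgment call.

```python
def imbalance(ham: int, spam: int) -> float:
    """Return the larger training count divided by the smaller (always >= 1.0)."""
    if ham <= 0 or spam <= 0:
        raise ValueError("both counts must be positive")
    return max(ham, spam) / min(ham, spam)


def needs_retrain(ham: int, spam: int, max_ratio: float = 4.0) -> bool:
    # max_ratio is a hypothetical threshold chosen for this example only.
    return imbalance(ham, spam) > max_ratio


# Counts from the quoted message: 10,641 ham and 60,230 spam.
ratio = imbalance(10_641, 60_230)
print(f"imbalance is about {ratio:.2f} to 1")
print("retrain from scratch?", needs_retrain(10_641, 60_230))
```

With those counts the spam outnumbers the ham by more than five to one,
so under the assumed threshold of 4.0 the check would suggest starting
over rather than waiting for ham to catch up.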

-- 
Amedee Van Gasse
[EMAIL PROTECTED]

_______________________________________________
[email protected]
Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html