On Thu, 13 Jan 2005 18:00:42 +1300, "Tony Meyer" <[EMAIL PROTECTED]> wrote:
>> I was doing a kind of manual "train to exhaustion", and the
>> other thing I noticed was that the spam took a lot more
>> training to make classification accurate (currently 82 ham :
>> 409 spam, out of a total training set of 644 : 1414).  I guess
>> this simply means that my spam is a lot less consistent than my ham.
>
>With 'classic' train to exhaustion, the database is kept exactly balanced, I
>believe.  How well is your system working for you?

Erm, not all that well. :|  My incoming mail is very unbalanced - 17:1
spam:ham since I started the training - which can't help, but so far I have
18% unsure spam and 3% false negatives.  No mistakes on ham though; none
scored higher than 0.5%.  Given that, I suppose I could simply mess with the
thresholds.

-- 
Mat.
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
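
For anyone wondering what "messing with the thresholds" would do to numbers like the ones above, here is a rough sketch. It is not SpamBayes code: `classify` and `tally` are hypothetical helpers, the scores are made up for illustration, and the handling of scores that land exactly on a cutoff is an assumption. The 0.20 and 0.90 values are what I believe the default ham and spam cutoffs to be.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Three-way split of a score in [0, 1] using two cutoffs.

    Hypothetical helper; boundary handling is an assumption.
    """
    if score < ham_cutoff:
        return "ham"
    if score < spam_cutoff:
        return "unsure"
    return "spam"


def tally(scores, ham_cutoff, spam_cutoff):
    """Count how many scores fall into each of the three buckets."""
    counts = {"ham": 0, "unsure": 0, "spam": 0}
    for score in scores:
        counts[classify(score, ham_cutoff, spam_cutoff)] += 1
    return counts


# Made-up scores for messages already known to be spam.
spam_scores = [0.03, 0.45, 0.62, 0.88, 0.93, 0.97, 0.99, 1.00]

# Assumed default cutoffs: one false negative, three unsures.
print(tally(spam_scores, 0.20, 0.90))

# Tighter cutoffs: the false negative becomes an unsure,
# and two of the unsures become spam.
print(tally(spam_scores, 0.01, 0.60))
```

Since no ham here has scored above 0.5%, pulling the ham cutoff well below 0.20 should turn false negatives into unsures without costing anything on the ham side. Lowering the spam cutoff is the riskier half, since that is where a false positive would eventually show up.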
