On Thu, 13 Jan 2005 18:00:42 +1300, "Tony Meyer" <[EMAIL PROTECTED]> wrote:

>> I was doing a kind of manual "train to exhaustion", and the 
>> other thing I noticed was that the spam took a lot more 
>> training to make classification accurate (currently 82 ham : 
>> 409 spam, out of a total training set of 644 : 1414). I guess 
>> this simply means that my spam is a lot less consistent than my ham.
>
>With 'classic' train to exhaustion, the database is kept exactly balanced, I
>believe.  How well is your system working for you?

Erm, not all that well. :|

My incoming mail is very unbalanced - 17:1 spam:ham since I started the
training - which can't help. So far 18% of my spam has scored as unsure and
3% has been false negatives. No mistakes on ham though; none scored higher
than 0.5%. Given that, I suppose I could simply mess with the thresholds.
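
To make the threshold idea concrete, here is a rough sketch of what moving
the spam cutoff would do. It is not SpamBayes code: classify() and tally()
are made-up helper names, the cutoff values and boundary handling are
placeholders, the scores are invented, and it assumes scores on a 0 to 1
scale (so my 0.5% would be 0.005).

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Three-way split of a score in [0, 1] (hypothetical helper).

    Cutoff values are illustrative placeholders, not settings taken
    from any particular configuration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def tally(scores, **cutoffs):
    """Count how many scores land in each of the three classes."""
    counts = {"ham": 0, "unsure": 0, "spam": 0}
    for s in scores:
        counts[classify(s, **cutoffs)] += 1
    return counts


# Invented scores for six spam messages, purely for illustration.
spam_scores = [0.99, 0.95, 0.85, 0.70, 0.55, 0.10]

before = tally(spam_scores)                   # placeholder cutoffs
after = tally(spam_scores, spam_cutoff=0.50)  # lowered spam cutoff

print(before)
print(after)
```

Lowering the spam cutoff only moves messages out of unsure and into spam.
The one scoring 0.10 stays a false negative either way, so this would help
with the unsure pile but not with the 3%.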

-- Mat.


_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html