    Dave> 68+8 = 76
    Dave> 222+56+4 = 282

    Dave> So, somehow, the number of hams or spams "in the database" really
    Dave> has to do with the number that are found to be misclassified and
    Dave> thus influence the training data?

    Dave> It's hard to understand the importance of keeping ham and spam
    Dave> balanced if one or the other can ultimately influence training so
    Dave> much more than the other.

I don't know.  It just works. ;-)  More seriously, a 4:1 ratio isn't that
far out of whack.  If it were 10:1 or 50:1 I'd worry.

Skip
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
