Dave> 68+8 = 76
Dave> 222+56+4 = 282
Dave> So, somehow, the number of hams or spams "in the database" really
Dave> has to do with the number that are found to be misclassified and
Dave> thus influence the training data?
Dave> It's hard to understand the importance of keeping ham and spam
Dave> balanced if one or the other can ultimately influence training so
Dave> much more than the other.
I don't know. It just works. ;-) More seriously, a 4:1 ratio isn't that
far out of whack. If it were 10:1 or 50:1 I'd worry.
Skip
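
For reference, the arithmetic behind the "4:1" remark can be checked with a few lines of Python. This is only a sketch: it assumes the ratio refers to the two totals Dave quotes (76 and 282), and the excerpt does not say which total is ham and which is spam, so the variable names are neutral. The helper imbalance() and the WORRY_AT threshold are hypothetical names for illustration, not part of SpamBayes; the threshold of 10 just echoes the "10:1" figure above.

```python
# Totals quoted from Dave's message. Which one is ham and which is spam
# is not stated in this excerpt, so the names are deliberately neutral.
smaller_total = 68 + 8          # 76
larger_total = 222 + 56 + 4     # 282

# Hypothetical threshold, taken from the "10:1 ... I'd worry" remark.
WORRY_AT = 10


def imbalance(a: int, b: int) -> float:
    """Return the larger count divided by the smaller one."""
    hi, lo = max(a, b), min(a, b)
    if lo == 0:
        raise ValueError("cannot compute a ratio against an empty class")
    return hi / lo


ratio = imbalance(smaller_total, larger_total)
print(f"{larger_total}:{smaller_total} is about {ratio:.1f}:1")
print("worry" if ratio >= WORRY_AT else "not far out of whack")
```

Under that assumption the ratio comes out a little under 4:1, well short of the 10:1 level.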