> I was doing a kind of manual "train to exhaustion", and the
> other thing I noticed was that the spam took a lot more
> training to make classification accurate (currently 82 ham :
> 409 spam, out of a total training set of 644 : 1414). I guess
> this simply means that my spam is a lot less consistent than my ham.

With 'classic' train to exhaustion, the database is kept exactly balanced, I believe. How well is your system working for you?

> BTW, I also found a trick in Outlook to be able to train on a
> given spam more than once, to force correct classification.
> Normally this doesn't work because the plugin sees the two
> messages as identical, but creating the copy in an IMAP
> folder seems to fool it.

Creating a copy in any store should work, I think. IIRC Tim pointed this out many many moons ago, although that was before Gary's blog about tte.

=Tony.Meyer

--
Please always include the list ([email protected]) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
