> >With 'classic' train to exhaustion, the database is kept exactly
> >balanced, I believe. How well is your system working for you?
>
> Erm, not all that well. :|

:( I'm trying to get things rearranged a little for 1.1 so that it's
easier to try out different training regimes (including tte) with the
various apps, so hopefully that'll help.

> My incoming mail is very unbalanced - 17:1 spam:ham since I
> started the training - which can't help, but so far I have
> 18% unsure spam and 3% false negatives. No mistakes on ham
> though; none scored higher than 0.5%. Given that, I suppose I
> could simply mess with the thresholds.

I've read reports of people who have done that (in an extreme way, so
that the cutoffs are 5% and 10% or something like that). It seems
pretty risky to me, though, since a message that contains nothing that
has been seen before will score 0.5, and that would be the same under
that system...

=Tony.Meyer

-- 
Please always include the list in your replies (reply-all), and please
don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
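
To make the cutoff concern above concrete, here is a minimal sketch of a three-way ham/unsure/spam split. The `classify` function and the 0.20/0.90 "conservative" cutoffs are illustrative assumptions, not SpamBayes code; only the 0.5 score for a message of unseen tokens and the 5%/10% cutoffs come from the discussion.

```python
def classify(score, ham_cutoff, spam_cutoff):
    """Hypothetical three-way split on a spam score in [0, 1]."""
    if score >= spam_cutoff:
        return "spam"
    if score >= ham_cutoff:
        return "unsure"
    return "ham"


# A message made only of never-seen tokens scores 0.5 (per the note above).
UNKNOWN_SCORE = 0.5

# Illustrative conservative cutoffs (assumed values): the message lands in unsure.
print(classify(UNKNOWN_SCORE, ham_cutoff=0.20, spam_cutoff=0.90))

# The extreme 5%/10% cutoffs mentioned above: the same message is called spam.
print(classify(UNKNOWN_SCORE, ham_cutoff=0.05, spam_cutoff=0.10))
```

With the extreme cutoffs, anything the classifier knows nothing about is filed as spam rather than flagged for review, which is the risk described.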
