1. A recent training run went like this:

   round: 1, msgs: 690, ham misses: 61, spam misses: 210, 176.3s
   round: 2, msgs: 690, ham misses: 8, spam misses: 53, 165.6s
   round: 3, msgs: 690, ham misses: 1, spam misses: 7, 159.6s
   round: 4, msgs: 690, ham misses: 1, spam misses: 2, 159.6s
   round: 5, msgs: 690, ham misses: 0, spam misses: 1, 157.8s
   round: 6, msgs: 690, ham misses: 1, spam misses: 1, 160.9s
   round: 7, msgs: 690, ham misses: 0, spam misses: 1, 211.0s
   round: 8, msgs: 690, ham misses: 0, spam misses: 1, 172.6s
   round: 9, msgs: 690, ham misses: 0, spam misses: 1, 197.1s
   round: 10, msgs: 690, ham misses: 1, spam misses: 1, 174.6s
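The round-over-round comparison can be checked mechanically. The following is a minimal Python sketch that parses the log lines above, assuming exactly the format shown, and flags every round whose combined ham and spam misses exceed the previous round's. The helper names `parse_rounds` and `worse_rounds` are hypothetical and are not part of tte.py.

```python
import re

# The training log from the run above, copied verbatim.
LOG = """\
round: 1, msgs: 690, ham misses: 61, spam misses: 210, 176.3s
round: 2, msgs: 690, ham misses: 8, spam misses: 53, 165.6s
round: 3, msgs: 690, ham misses: 1, spam misses: 7, 159.6s
round: 4, msgs: 690, ham misses: 1, spam misses: 2, 159.6s
round: 5, msgs: 690, ham misses: 0, spam misses: 1, 157.8s
round: 6, msgs: 690, ham misses: 1, spam misses: 1, 160.9s
round: 7, msgs: 690, ham misses: 0, spam misses: 1, 211.0s
round: 8, msgs: 690, ham misses: 0, spam misses: 1, 172.6s
round: 9, msgs: 690, ham misses: 0, spam misses: 1, 197.1s
round: 10, msgs: 690, ham misses: 1, spam misses: 1, 174.6s
"""

# Assumed line format: "round: N, msgs: N, ham misses: N, spam misses: N, T.Ts"
LINE_RE = re.compile(
    r"round: (\d+), msgs: (\d+), ham misses: (\d+), spam misses: (\d+), ([\d.]+)s"
)


def parse_rounds(log):
    """Return a list of (round, ham_misses, spam_misses) tuples."""
    rounds = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            rounds.append((int(m.group(1)), int(m.group(3)), int(m.group(4))))
    return rounds


def worse_rounds(rounds):
    """Return round numbers whose total misses exceed the previous round's total."""
    worse = []
    for (_, ham0, spam0), (rnd, ham1, spam1) in zip(rounds, rounds[1:]):
        if ham1 + spam1 > ham0 + spam0:
            worse.append(rnd)
    return worse


print(worse_rounds(parse_rounds(LOG)))  # [6, 10]
```

Summing the two miss counts per round gives 271, 61, 8, 3, 1, 2, 1, 1, 1, 2, so the total rises only in rounds 6 and 10, in each case by a single message.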
   It seems that the results got *worse* in rounds 6 and 10. Am I misinterpreting this? Are these expected results?

2. I have about 350 each of ham and spam that I can use to train on. I'm sure that some of these messages are mostly redundant and add little or nothing of value to the training data, and I don't want to waste time on them every time I do a training run. Is there some way to use tte.py to reduce my training set to the messages that actually make a difference?

Thanks in advance!

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

Don't Miss BoostCon 2007! ==> http://www.boostcon.com
