When I run the tte script, it always stops after 4 or 5 rounds. If it goes beyond 6 rounds, it's a sure sign that I've misclassified something (**). What I do then is run the script with -v and let it show me the messages it's training on. In the later rounds it's always training on the same one or two messages, and those are the culprits. I just look for those message IDs in TBird and move the messages into the right training sets.
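The manual check above could be automated roughly along these lines. This is only a sketch: it assumes the message IDs trained in each round have already been pulled out of the -v output into a list of lists (`rounds`), and the names `find_culprits` and `normal_rounds` are made up for illustration; they are not part of the tte script.

```python
from collections import Counter


def find_culprits(rounds, normal_rounds=5):
    """Return message IDs that are still being trained after `normal_rounds`.

    `rounds` is a list with one entry per training round (first round first);
    each entry is an iterable of the message IDs trained in that round.
    IDs are returned most-frequent first, ties broken alphabetically.
    """
    late = Counter()
    # Only rounds past the usual stopping point are considered suspicious.
    for trained in rounds[normal_rounds:]:
        # Count each message at most once per round.
        late.update(set(trained))
    return sorted(late, key=lambda msg_id: (-late[msg_id], msg_id))


if __name__ == "__main__":
    # Hypothetical data: message "x" keeps coming back in rounds 6 to 8.
    example = [["a", "b", "c"], ["a", "b"], ["a"], ["b"], ["c"],
               ["x", "y"], ["x"], ["x"]]
    print(find_culprits(example))
```

A run that finishes within the usual number of rounds yields an empty list, so a non-empty result is the condition that would call for a closer look at the training sets.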
I do this training automatically on my server, so what I'd like to do is have the script automatically email me a notice identifying problem messages in my training set. Maybe it should even mark them deleted (I use IMAP) and restart the process. Thoughts?

(**) Technically speaking, running for 6 or more rounds doesn't necessarily identify a misclassification. Sometimes it identifies a correctly-classified message for which there is an oppositely-classified near-twin. Today, it had some trouble with a "correctly" classified-as-ham Mailman moderation request message containing a piece of spam that I had also received directly and thus classified as spam. So everything was classified "correctly."

What prompted me to look at this situation was that lots of ham started falling into my "unsure" folder today. So despite the fact that everything was classified correctly, overall performance was noticeably reduced. I'm tempted to conclude that TTE running for more than 5 rounds just indicates a classification that's bad for performance.

I guess my next question is whether this near-twin classification (the difference being that one of the messages was a moderation request) is supposed to work well? I guess if I have some other moderation requests in my spam folder that could really confuse things... hmm, straightened that out and it still didn't finish training speedily.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev