> I have been leaving the category as SpamBayes set it for messages
> it had correctly identified, so presumably have been "re-training"
> it on ones it already got right. I thought this was the correct
> way, confirming that SB was right in those instances, or does it
> mean that a bias of any sort could develop?

It's not 100% clear what the best training regime is. Simulations so far, as well as anecdotal evidence, suggest that a 'mistake-based' training regime is probably best: for example, training only on false positives, false negatives and unsures or, alternatively, training only on 'nonedge' messages (e.g. those scoring between 10% and 90%). One reason these probably work better is that the databases end up smaller, which means that if 'random' real words are added to a spam, it is less likely that they are in your database (and words that are not in the database are ignored).

> It would be good if clear instructions similar to the above were
> included in the interface page below the list of mails processed so
> it's there for easy reference.

If you click on the "Help" icon at the bottom of the page, it says pretty much what I did in the email, and has a link to the wiki where training options are discussed in more detail (since there isn't a definitive answer about what is best, it's hard to have a concise summary distributed with the software). If you can think of ways that the help text could be improved, please let us know (IIRC I simply wrote what I thought of at the time, and it hasn't been reviewed since).

=Tony.Meyer

--
Please always include the list (spambayes at python.org) in your replies (reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
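
To make the two selection rules mentioned in the reply above concrete, here is a minimal Python sketch. It is not SpamBayes code and does not use the SpamBayes API: the function names (`classify`, `train_on_mistake`, `train_on_nonedge`) are hypothetical, the ham/spam cutoffs of 0.20 and 0.90 are illustrative assumptions, and the 10% to 90% band is simply the example range quoted in the message (treated here as inclusive, which is also an assumption).

```python
# Illustrative sketch only: these helpers are hypothetical, not SpamBayes functions.

def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a score in [0, 1] to 'ham', 'unsure' or 'spam'.

    The cutoff values are assumptions chosen for illustration.
    """
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def train_on_mistake(score, is_spam, ham_cutoff=0.20, spam_cutoff=0.90):
    """Mistake-based rule: train only when the verdict differs from the truth.

    This covers false positives, false negatives and unsures, because an
    'unsure' verdict never matches the true label.
    """
    verdict = classify(score, ham_cutoff, spam_cutoff)
    truth = "spam" if is_spam else "ham"
    return verdict != truth


def train_on_nonedge(score, low=0.10, high=0.90):
    """Nonedge rule: train only on messages scoring inside the middle band."""
    return low <= score <= high


if __name__ == "__main__":
    # (score, is_spam) pairs made up for the example.
    messages = [(0.99, True), (0.01, False), (0.55, True), (0.95, False), (0.05, True)]
    for score, is_spam in messages:
        print(score, is_spam,
              "mistake-based:", train_on_mistake(score, is_spam),
              "nonedge:", train_on_nonedge(score))
```

Under either rule, a message that was already classified correctly and confidently (such as the first two in the example list) is skipped, which is what keeps the database smaller than it would be if every message were trained.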
