> I am very happy with Spambayes' performance. I have it running in a
> number of different environments including two Linux boxes and a
> larger number of Windows machines. With the Linux boxes, I wonder
> if there's a point of diminishing returns with training - I ask
> because training is becoming quite unwieldy at this point -

If there is such a point, it will be cross-platform, since the classification is platform-agnostic.

> when I
> go to the training window, it loads around 4-500 messages and this
> can become tedious - especially when there's a message I'm unsure
> of; if I click it to see what's in it, the return to the
> training page has put all the checkmarks back to where they were
> when I first opened it, and I have to go through the whole list again.

How are you going back to the review page? If you use the browser's "back" button, then the browser ought to display the page with all the checkmarks as you left them (some browsers are better at this than others). Alternatively, you could open the 'view message' page in a new window/tab, which would work around this.

Note that you can also set the default actions to take for messages (Advanced Configuration page), which might make this process faster. There's also an option not to cache messages that carry the 'bulk' header, which covers most well-behaved mailing lists; these typically have little or no spam, so using that option might also help.

What sort of training are you doing? I think sb_server still defaults to training ham and discarding spam. It would be better to do mistake-based training, where you train only on false positives, false negatives and unsures (and adjust the thresholds if necessary to reduce the number of unsures, particularly spam ones). There's lots more about this at:

<http://entrian.com/sbwiki/TrainingIdeas>

> Is there a point at which it is better to delete the database and start
> training anew? I know this is probably a hard question to answer, but,
> I wonder if you have some thoughts on this subject.

There probably is, but I don't know when it is. I personally start from scratch every few months or so, but that's almost always because I'm testing out an experimental database format and something goes wrong with it, forcing a retrain.
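
To make the mistake-based training idea mentioned above a bit more concrete, here is a rough sketch in plain Python. It does not use the SpamBayes code at all: `classify`, `should_train` and the message tuples are hypothetical names made up for illustration, and the cutoffs 0.20 and 0.90 are just example thresholds, so treat it as an outline of the idea rather than how sb_server actually does it.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'.

    The cutoff values are illustrative assumptions, not read from any
    real configuration.
    """
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def should_train(score, actually_spam, ham_cutoff=0.20, spam_cutoff=0.90):
    """Return True only when the verdict differs from the truth.

    That covers false positives, false negatives and unsures; messages
    that were already classified correctly are skipped.
    """
    verdict = classify(score, ham_cutoff, spam_cutoff)
    truth = "spam" if actually_spam else "ham"
    return verdict != truth


# Hypothetical (score, actually_spam) pairs standing in for reviewed mail.
messages = [
    (0.01, False),  # correct ham: skip
    (0.99, True),   # correct spam: skip
    (0.95, False),  # false positive: train as ham
    (0.05, True),   # false negative: train as spam
    (0.50, True),   # unsure: train as spam
]

to_train = [m for m in messages if should_train(*m)]
```

Only the last three messages end up in `to_train`, which is the point: the review page stays short because correctly classified mail never needs attention. Moving the two cutoffs closer together shrinks the unsure band, at the cost of more outright mistakes.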
AFAIK no-one has tested whether retraining from scratch helps, although there have been tests on 'aging' a database (removing messages after a certain amount of time), which did OK, IIRC, but not significantly better than other training techniques.

Supporting different types of training is one way that I think SpamBayes (specifically the Outlook plug-in and sb_server) could really improve. No time to work on that yet, unfortunately.

=Tony.Meyer

--
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
