Darryl> If the bad guys weren't changing tactics all the time stable
Darryl> code would be fine. I am getting more and more spam that is not
Darryl> being identified even on the third and fourth sample. I have not
Darryl> done a rigorous investigation but I'm assuming they are fiddling
Darryl> with stuff I can't see unless I look at the raw source.
What steps have you taken to keep your training database fresh? How many
hams and spams are in your database? Are you sure there are no mistakes?

In my own training I find that some messages, even though technically
correctly classified, foul things up a bit. Consider someone posting a
message to a mailing list whose body contains nothing but the URL of a
get-rich-quick website. If what you generally get from that list is good
mail, most or all of the header information in this one spam message will
run counter to that. If your training database is small, that one spammy
clue may offset the various good header clues enough that the message is
classified as unsure. In extreme cases you may get false positives.

The primary type of spam I see SB fail to properly classify on a
semi-regular basis is spam which embeds its come-on in a GIF image. It
may be that my training database is too small and just doesn't include
many samples.

My individual word scores are dominated by 0.16 and 0.84, which means I
have only a single sample of each of those words. This is probably a
side effect of my train-to-exhaustion regimen.

Skip
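P.S. To make the "one spammy URL against a pile of good header clues"
point concrete, here's a rough sketch of the chi-squared combining
SpamBayes does, as I understand it. The chi2Q series and the 0.20/0.90
ham/spam cutoffs should match the defaults; the token probabilities are
made up for illustration.

    from math import exp, log

    def chi2Q(x2, v):
        """One minus the chi-squared CDF for v (even) degrees of
        freedom, via the series expansion SpamBayes uses."""
        m = x2 / 2.0
        total = term = exp(-m)
        for i in range(1, v // 2):
            term *= m / i
            total += term
        return min(total, 1.0)

    def combine(probs):
        """Chi-squared combining of per-word spam probabilities."""
        n = len(probs)
        spam_ev = chi2Q(-2.0 * sum(log(p) for p in probs), 2 * n)
        ham_ev = chi2Q(-2.0 * sum(log(1.0 - p) for p in probs), 2 * n)
        return (spam_ev - ham_ev + 1.0) / 2.0

    # Five hammy header clues at 0.16 versus one very spammy URL token
    # at 0.99: the single strong clue drags the score up to about 0.29,
    # past the default 0.20 ham cutoff and into the unsure range.
    print(combine([0.16] * 5 + [0.99]))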
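P.P.S. The 0.16/0.84 scores fall straight out of the unknown-word
smoothing, assuming the default unknown_word_strength of 0.45 and
unknown_word_prob of 0.5 (the corpus counts below are hypothetical). A
word seen exactly once can't move past those two values:

    def spamprob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
        """Smoothed per-word spam probability, Robinson-style."""
        spamratio = spamcount / nspam if nspam else 0.0
        hamratio = hamcount / nham if nham else 0.0
        if spamratio + hamratio == 0.0:
            prob = x                # never-seen word: fall back to prior
        else:
            prob = spamratio / (spamratio + hamratio)
        n = spamcount + hamcount
        return (s * x + n * prob) / (s + n)

    # Seen once, in spam only: (0.45*0.5 + 1*1.0) / 1.45 ~= 0.84
    print(spamprob(1, 0, nspam=1000, nham=1000))
    # Seen once, in ham only:  (0.45*0.5 + 1*0.0) / 1.45 ~= 0.16
    print(spamprob(0, 1, nspam=1000, nham=1000))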