    Darryl> If the bad guys weren't changing tactics all the time, stable
    Darryl> code would be fine. I am getting more and more spam that is not
    Darryl> being identified even on the third and fourth sample. I have not
    Darryl> done a rigorous investigation, but I'm assuming they are fiddling
    Darryl> with stuff I can't see unless I look at the raw source.

What steps have you taken to keep your training database fresh?  How many
hams and spams are in your database?  Are you sure there are no mistakes?
In my own training I find that some messages, even though technically
correctly classified, foul things up a bit.  Consider someone posting a
message to a mailing list where the body contains nothing but the URL of a
get-rich-quick website.  If what you generally get from that list is good
mail, most or all of the header information in this one spam message will
run counter to its content.  If your training database is small, the hammy
header clues may offset the spammy body clues enough that the message
scores as unsure, and once you train such messages as spam, the list's
header tokens pick up spam counts; in extreme cases that can produce false
positives on legitimate list traffic.
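
To make that concrete, here is a rough sketch of the chi-squared combining
SpamBayes does (the chi2q series should match the one in the chi2 module,
if memory serves; the clue probabilities, function names and the 8-vs-4
split are made up purely for illustration):

    import math

    def chi2q(x2, v):
        # Survival function of the chi-squared distribution for even
        # degrees of freedom v, via the usual exponential series.
        m = x2 / 2.0
        total = term = math.exp(-m)
        for i in range(1, v // 2):
            term *= m / i
            total += term
        return min(total, 1.0)

    def score(probs):
        # Chi-combining: S measures spamminess, H hamminess; the final
        # score is their normalized difference, mapped into [0, 1].
        n = len(probs)
        S = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
        H = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
        return (S - H + 1.0) / 2.0

    # Eight hammy header clues against four spammy body clues:
    print(score([0.05] * 8 + [0.95] * 4))  # ~0.29, i.e. "unsure" with
                                           # the default 0.20/0.90 cutoffs

With a larger, fresher database the body tokens carry more extreme
probabilities and the header clues can't drag the score down that far.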

The primary type of spam I see SB fail to properly classify on a
semi-regular basis is spam which embeds its come-on in a GIF image.  It
may be that my training database is too small and just doesn't include
many samples.  My individual word scores are dominated by 0.16 and 0.84,
which means I have only a single training sample for most of those words.
This is probably a side-effect of my train-to-exhaustion regimen.
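
Those two numbers fall out of the unknown-word smoothing: a word's raw
ham/spam ratio is shrunk toward 0.5, so a token seen exactly once pins at
about 0.155 or 0.845, which display as 0.16 and 0.84.  A minimal sketch,
assuming the default options unknown_word_strength s = 0.45 and
unknown_word_prob x = 0.5 (the function name and the 1000-message corpus
sizes are mine):

    def word_spamprob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
        # Robinson-adjusted word probability: the raw ratio estimate,
        # shrunk toward x with strength s, so rare words stay timid.
        spamratio = spamcount / nspam
        hamratio = hamcount / nham
        prob = spamratio / (hamratio + spamratio)
        n = hamcount + spamcount
        return (s * x + n * prob) / (s + n)

    print(word_spamprob(1, 0, 1000, 1000))  # 0.845 -> shows as 0.84
    print(word_spamprob(0, 1, 1000, 1000))  # 0.155 -> shows as 0.16

More samples of those GIF-spam tokens would raise n and let the ratio
dominate, which is why singletons stick at 0.16/0.84.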

Skip
