> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Michael D. Adams
> Sent: Thursday, July 07, 2005 10:26 PM
> To: [email protected]
> Subject: [Spambayes] To label or not to label, a practical question
>
> My ISP provides a spam filtering service (server side) that
> labels the things that they think are spam by putting an
> extra string in the subject like (e.g. "--Spam--" at the
> front). Their filters don't catch everything so I want to
> also use SpamBayes to eliminate the spam that my ISP doesn't
> label. My question is whether or not I should train
> SpamBayes with the spams that get labeled by my ISP. I could
> easily see SpamBayes picking up on the "--Spam--" string in
> the subject line and filtering just based on that.

Tony (who is much more knowledgeable than I am about this product) has already answered, so merely consider the following:

If you rigorously train your ISP's false positives as ham, they will show that SOME ham does carry this tag, and so the tag will NOT be a sure sign of spam. If the ISP is "always right", the tag will be a (relatively) reliable sign of spam, and that is probably what you want.

Just keep training on all mistakes; that is probably the single most important trick to using Bayesian spam classifiers. You must NOT get lazy and just delete or ignore mistakes.

> On the
> other hand maybe that would introduce some selection bias or
> a bad spam vs ham ratio for training (e.g. maybe I'll get 50
> ham, 40 spam caught by my ISP, and 10 spam not caught by my
> ISP (I don't know what the ratio is yet, I only just started
> using my ISP's filter)).
>
> Does anyone have any advice on whether these might interfere
> or how to avoid that interference? Should I even be using my
> ISP's filter along with SpamBayes or just SpamBayes by itself?

My bet is that they will not interfere. Mail that my SpamAssassin tags with ****SPAM***** gets past SpamBayes WHEN it is obviously not spam (I only let through the mistakes made by SpamAssassin, so most of the tagged messages that reach Outlook are NOT spam), and SpamBayes still catches it when it is spam (most of the time).

One thing about spam filters, in my experience: as often as they make mistakes, they also correctly classify things that *I*, a human, would misclassify at first naive glance. (E.g., a mailing from a list that is NOT spam, but where someone has injected spam into the list; a message from a technical "spammer" that I actually wish to see.)

_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
