My ISP provides a server-side spam filtering service that labels messages it thinks are spam by adding a string to the subject line (e.g. "--Spam--" at the front). Their filter doesn't catch everything, so I want to also use SpamBayes to eliminate the spam that my ISP doesn't label. My question is whether I should train SpamBayes on the spam that my ISP labels. I could easily see SpamBayes picking up on the "--Spam--" string in the subject line and filtering on that alone. On the other hand, maybe that would introduce some selection bias or a skewed spam-to-ham ratio for training. For example, I might get 50 ham, 40 spam caught by my ISP, and 10 spam not caught by my ISP. (I don't know the actual ratio yet; I only just started using my ISP's filter.)
Does anyone have advice on whether these might interfere with each other, or how to avoid that interference? Should I even use my ISP's filter alongside SpamBayes, or just SpamBayes by itself?

Michael D. Adams

_______________________________________________
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
