> My ISP provides a spam filtering service (server side) that labels the
> things that they think are spam by putting an extra string in the
> subject (e.g. "--Spam--" at the front).  Their filters don't
> catch everything so I want to also use SpamBayes to eliminate the spam
> that my ISP doesn't label.  My question is whether or not I should
> train SpamBayes with the spams that get labeled by my ISP.  I could
> easily see SpamBayes picking up on the "--Spam--" string in the
> subject line and filtering just based on that.

It's possible (even likely, assuming that their filter is any good) that
"subject:--Spam--" will become a strong spam clue, but hopefully there would
be enough ham clues in the (ISP's) false positive that SpamBayes would still
be able to make a correct (or perhaps unsure) classification.
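
To make that a bit more concrete, here is a rough sketch of chi-squared
combining, which is (as far as I understand it) the kind of scheme SpamBayes
uses to merge individual clue probabilities into one score.  The names
chi2q and combined_score and the example clue values are made up for
illustration; this is not the SpamBayes code itself.

```python
import math

def chi2q(x2, dof):
    """Chi-squared survival function; only valid for an even dof."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    # Rounding can push the sum slightly above 1.0.
    return min(total, 1.0)

def combined_score(probs):
    """Combine per-clue spam probabilities (each strictly between 0 and 1)."""
    n = len(probs)
    if n == 0:
        return 0.5
    spam_evidence = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # 0.0 means confidently ham, 1.0 confidently spam, 0.5 no idea.
    return (spam_evidence - ham_evidence + 1.0) / 2.0

# Hypothetical clue values: the ISP tag on its own...
only_tag = combined_score([0.99])
# ...and the same tag on a message that also has five hammy clues.
tagged_ham = combined_score([0.99] + [0.05] * 5)
print(only_tag, tagged_ham)
```

With these invented numbers the tag alone scores as spam, but a handful of
hammy clues drags the combined score well below the midpoint.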

The only way to know for sure is to give it a go.  You can see how strong
the "subject:--Spam--" clue is by looking at the clues for a message with
such a modified subject.

The situation with my work email is similar - I can opt out of their spam
filtering, but that means that they will prepend "[SPAM]" to the subject.  I
ignore their classification and SpamBayes still works fine.  However, they
have some really terrible false positives, which means that "subject:[SPAM]"
isn't as strong a clue as it would be otherwise.
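
A sketch of why false positives weaken the clue, assuming a Robinson-style
smoothed token probability (which I believe is roughly what SpamBayes
computes).  The function name clue_strength, the counts, and the s and x
defaults are illustrative assumptions, not values taken from anyone's real
database.

```python
def clue_strength(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Smoothed spam probability for a single token.

    spam_count / ham_count: trained spam / ham messages containing the token.
    nspam / nham: total trained spam / ham messages.
    s, x: strength and probability given to an unseen token (assumed values).
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # never seen: fall back to the prior
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    p = spam_ratio / (spam_ratio + ham_ratio)
    # Pull the raw estimate towards x; the pull fades as n grows.
    return (s * x + n * p) / (s + n)

# Tag seen in 40 of 100 spam and never in ham (an accurate ISP filter).
clean = clue_strength(40, 0, 100, 100)
# Same, but the ISP also tagged 5 of 100 ham (false positives).
noisy = clue_strength(40, 5, 100, 100)
print(clean, noisy)
```

The second value comes out noticeably lower than the first, which matches
what I see with "subject:[SPAM]" at work.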

> On the other hand maybe that would introduce some selection bias or a
> bad spam vs ham ratio for training (e.g. maybe I'll get 50 ham, 40 spam
> caught by my ISP, and 10 spam not caught by my ISP (I don't know what
> the ratio is yet, I only just started using my ISP's filter)).

It's all guesswork at the moment, but you might find that it helps with
keeping a ~1:1 ratio.  Ham tends to be reasonably homogeneous, so you
generally need to train on less of it than spam (assuming you're doing some
sort of train-on-errors-and-unsures training), so this might help balance
that out.

> Does anyone have any advice on whether these might interfere or how to
> avoid that interference?  Should I even be using my ISP's filter along
> with SpamBayes or just SpamBayes by itself?

If the ISP's filter is reasonably good, then you might as well use it as
well; plenty of people like these sorts of tiered filter systems.

I expect that you'll find that it doesn't interfere at all; the only way to
know for sure is to try it out, though.  (Maybe after training for a while,
you could get someone to send you a ham message with "--Spam--" in the
subject, and see if the hammy clues are enough to get it through).  Let us
know if you find out anything interesting!  :)

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this. 
