Re: xxxl spam

John Rudd Thu, 13 Apr 2006 10:18:58 -0700


On Apr 13, 2006, at 9:56 AM, mouss wrote:

I am also seeing many legit mails triggering some SA rules (*_excess, no_real_name, x_library, ...). When I see this, I check the rule, and if I can't find a justification, I disable it.


I wouldn't do that.

Just because legitimate mail triggers some rule doesn't mean that the rule is flawed. Using your example, triggering "no_real_name" does not mean that the message is spam, it means that the message has _some_ similarity to at least some spam messages (the higher the score, the stronger the similarity). And, that's absolutely true: statistically, when looking at the corpus which was used to create the rules database, a higher percentage of "no_real_name" messages were spam.
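To make the point concrete, here is a minimal Python sketch of additive scoring. It is not SpamAssassin's code. The rule names come from this thread, the score values are made up for illustration, and the only real figure is the threshold of 5.0, which is SpamAssassin's default required_score.

```python
# Illustrative scores only: these are NOT SpamAssassin's shipped values.
RULE_SCORES = {
    "no_real_name": 1.0,
    "x_library": 1.5,
}

# SpamAssassin's default required_score.
THRESHOLD = 5.0


def total_score(hits, scores=RULE_SCORES):
    """Sum the scores of every rule the message triggered."""
    return sum(scores.get(name, 0.0) for name in hits)


def is_spam(hits, threshold=THRESHOLD, scores=RULE_SCORES):
    """A message is flagged only when the summed score reaches the threshold."""
    return total_score(hits, scores) >= threshold


# A legitimate message can hit one or two rules and still pass.
print(is_spam(["no_real_name"]))
print(is_spam(["no_real_name", "x_library"]))
```

A single hit only nudges the total; the message is flagged when enough evidence accumulates.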

Now, if legit messages were not just triggering those rules, but also triggering enough rules to be flagged as spam ... then I would lower the value of those rules, but not disable those rules. But I would only do that if I could see that there was a large percentage of should-be-ham messages being flagged as spam by that rule AND that rule wasn't being useful in flagging spam messages. The reason is: if the message is being flagged, but it shouldn't have been, then perhaps my "corpus" of messages differs significantly enough from the SA internal corpus that my score values need to be different. But that doesn't mean that the rules are so disjoint from tracking spam that they should be entirely disabled. They just don't have the same weighting that my corpus needs.

If, instead, most messages passing through my mail servers that triggered that rule really did seem to be spam, then I wouldn't alter the score at all. I would just pass the should-have-been-ham message into my Bayesian learner and hope that a low Bayes score for messages like that would offset the rules that had flagged it as spam.
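The policy in the last two paragraphs can be summed up in a small Python sketch. The function name, its inputs, and both cutoffs (0.5 and 0.2) are assumptions chosen for illustration; in practice the counts would come from your own mail logs and the cutoffs from your own judgment.

```python
def advise(ham_hits, spam_hits, ham_flagged, fp_limit=0.2):
    """Suggest what to do with one rule, based on local mail counts.

    ham_hits:    legitimate messages that triggered the rule
    spam_hits:   spam messages that triggered the rule
    ham_flagged: legitimate messages that triggered the rule AND were
                 flagged as spam overall
    fp_limit:    assumed cutoff for "a large percentage" of flagged ham
    """
    total = ham_hits + spam_hits
    if total == 0:
        return "keep score"

    spam_share = spam_hits / total
    flagged_share = ham_flagged / ham_hits if ham_hits else 0.0

    # Most mail hitting the rule is spam: the rule is useful here, so
    # leave the score alone and let Bayes learn the exceptions.
    if spam_share > 0.5:
        if ham_flagged:
            return "keep score; train Bayes on false positives"
        return "keep score"

    # The rule is not tracking spam well locally AND it is pushing a
    # large share of legitimate mail over the threshold: lower it.
    if flagged_share >= fp_limit:
        return "lower score"

    return "keep score"


print(advise(ham_hits=10, spam_hits=90, ham_flagged=2))
print(advise(ham_hits=80, spam_hits=20, ham_flagged=40))
```

Note that no branch returns "disable": the worst case is a lower score.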
