Re: office rule

NFN Smith Thu, 03 Apr 2008 09:10:00 -0700

mouss wrote:

...
The approach is flawed. a single word shouldn't be enough to tag mail as spam.

Furthermore, even checking for word boundaries may not help a lot on the OEM spammers. Several of them do quite a bit of obfuscation work to try to bypass the simple filtering that the OP is asking for. One of the ones that I'm seeing right now is "Office2OO8" (no boundaries), with obfuscation by replacing numeric zero with alpha "o".
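To make the boundary problem concrete, here is a minimal Python sketch. It is not SpamAssassin rule syntax, and the pattern and function name are made up for illustration. It shows that a word-boundary match misses "Office2OO8", while a pattern that tolerates the letter "o" in place of zero catches it.

```python
import re

# Naive rule: the whole word "office". There is no word boundary between
# "e" and "2", so this does not match "Office2OO8".
NAIVE_OFFICE = re.compile(r"\boffice\b", re.IGNORECASE)

# Hypothetical obfuscation-aware pattern: "office", optional separators,
# then a year-like token in which each zero may be the letter "o".
OBFUSCATED_OFFICE = re.compile(r"office[\s_-]*2[0o]{2}\d", re.IGNORECASE)

def looks_like_office_year(subject: str) -> bool:
    """Return True if the subject contains 'office' followed by a 20xx-style year."""
    return OBFUSCATED_OFFICE.search(subject) is not None
```

Note that this pattern also matches a perfectly legitimate "Office 2008", so by itself it is still only a weak signal and deserves a small score.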

Remember that the approach of SpamAssassin is not to do single rules that will force a big-bang reject if a particular rule is hit (although I have a few of those, because my rules are well-crafted, and I know my servers' traffic flows).

The general approach is with cumulative scores -- the higher the score, the more likely the message is to be spam. However, a relatively high score doesn't necessarily mean that the message is spam, and a low score doesn't mean that the message isn't spam. Spam-fighting at this level is as much art as science, and the spammers are a moving target who go to great lengths to make their stuff indistinguishable from legitimate mail.
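The cumulative idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how SpamAssassin is implemented. The rule names, patterns and scores are all invented for the example.

```python
import re
from typing import List, NamedTuple, Tuple

class Rule(NamedTuple):
    name: str
    pattern: "re.Pattern[str]"
    score: float

# Hypothetical rules and scores, chosen only to illustrate accumulation.
RULES = [
    Rule("SUBJ_OFFICE", re.compile(r"office", re.IGNORECASE), 0.1),
    Rule("SUBJ_OEM", re.compile(r"\boem\b", re.IGNORECASE), 1.5),
    Rule("SUBJ_DISCOUNT", re.compile(r"\d{2}% off", re.IGNORECASE), 1.0),
]

def score_subject(subject: str) -> Tuple[float, List[str]]:
    """Sum the scores of every rule that hits and report which ones did."""
    hits = [rule for rule in RULES if rule.pattern.search(subject)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]
```

A subject such as "Lunch at the office" hits only the low-scoring rule, while "OEM Office2OO8 80% off" hits all three and ends up with a much higher total. Neither number is proof; it is an opinion built from several weak signals.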

Thus, in evaluating how aggressive you are in fighting spam, a lot comes down to your (and your users') tolerance for problems with accidental rejection of something that somebody wants. This is part of the approach of SpamAssassin, in that the score returned is a best-guess opinion, based on what rules were hit, and the cumulative score.

Depending on how you have SA implemented, it's good to have a "middle range", where messages that are likely to be spam are still delivered to the user (with the opinion of the probability of spam reported in the SA score), but where handling/disposal is left to user-level decisions (either manually or by client-level filtering).

There is, of course, a point where something is so likely to be spam (e.g., a SA score above a certain threshold) that it is worth rejecting.
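One way to picture the three bands (deliver, deliver with a spam tag, reject) is the small Python sketch below. The function name and both thresholds are illustrative assumptions; where you put the cut-offs depends on your own traffic and your users' tolerance.

```python
def disposition(score: float, tag_at: float = 5.0, reject_at: float = 15.0) -> str:
    """Map a cumulative score to a handling decision.

    tag_at and reject_at are illustrative thresholds, not recommendations.
    """
    if score >= reject_at:
        return "reject"          # so likely to be spam that it is refused outright
    if score >= tag_at:
        return "deliver-tagged"  # middle range: delivered, user or client filter decides
    return "deliver"             # low score: delivered normally
```

The middle band is what keeps a borderline false positive recoverable: the message still arrives, and the score travels with it.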

Thus, that isn't to say that you can't use custom-built SA rules to force rejection, but in so doing, you have to do your rules carefully, and know your traffic flows -- both the spam and the ham. Plus, test, test, test.

When I'm evaluating the possibility of a new rule, one of the things I typically do is implement a rule, and then assign a token score (say, 0.1 points), and then watch mail flows for a while to see how the rule is behaving. Only after I'm confident that rule is hitting what I want (and nothing else) do I increase the score.
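The "watch it for a while" step amounts to counting where a candidate rule hits. A rough Python sketch follows, assuming you have a set of subjects already labelled "spam" or "ham"; the function name and the sample data are hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

def rule_hit_counts(pattern: "re.Pattern[str]",
                    labelled_subjects: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count how often a candidate pattern hits spam versus ham.

    labelled_subjects yields (label, subject) pairs, label being "spam" or "ham".
    """
    counts = {"spam": 0, "ham": 0}
    for label, subject in labelled_subjects:
        if pattern.search(subject):
            counts[label] += 1
    return counts

# Made-up sample data, only to show the shape of the check.
SAMPLE = [
    ("spam", "OEM Office2OO8 cheap"),
    ("ham", "Out of office Friday"),
    ("ham", "Lunch?"),
    ("spam", "Cheap meds"),
]
office_counts = rule_hit_counts(re.compile(r"office", re.IGNORECASE), SAMPLE)
```

If the ham count is anything other than negligible, as it is for a bare "office" match here, the rule stays at its token score.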

The other tool that helps a lot when you want to be aggressive about certain kinds of spam is making use of meta rules. Use two or three rules (or even more, especially with boolean logic) that score low, even with token scores, as noted above, and use big scores only in the meta rule, which generates hits only when several other rules are hit.

Thus, for the OP, there's nothing necessarily wrong with looking for the word "office" in a subject line, but I wouldn't score anything more than 0.1 points. As others in this thread have noted, it's a common word that's likely to show up frequently in non-spam. Thus, you need additional rules that you can use with that one in a meta rule, which essentially says, "If 'office' in the subject line AND <rule x> AND <rule y> then assign big score". However, if 'office' is there, but <rule x> doesn't get a hit, then you don't have enough confidence that the message is probably spam, and you don't assign the big score. In that case, if it turns out that the message really is spam, you have to go back to pattern analysis to find another pattern that matches.
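The meta-rule logic can be sketched in Python as follows. This shows the boolean idea only, not SpamAssassin's own rule syntax. RULE_X and RULE_Y stand in for the unspecified <rule x> and <rule y>, and all scores are assumed values.

```python
from typing import Iterable

# Token scores for the individual rules (hypothetical names and values).
BASE_SCORES = {"SUBJ_OFFICE": 0.1, "RULE_X": 0.1, "RULE_Y": 0.1}

# The meta rule fires only when every one of these rules has hit.
META_REQUIRES = {"SUBJ_OFFICE", "RULE_X", "RULE_Y"}
META_SCORE = 4.0  # the "big score", an assumed value

def total_score(hits: Iterable[str]) -> float:
    """Add up token scores, plus the meta score if all required rules hit."""
    hit_set = set(hits)
    total = sum(BASE_SCORES[name] for name in hit_set if name in BASE_SCORES)
    if META_REQUIRES <= hit_set:  # subset test: 'office' AND rule x AND rule y
        total += META_SCORE
    return total
```

With only 'office' hit, the total stays at 0.1. With 'office' and RULE_Y but not RULE_X, it is 0.2. Only when all three hit does the big score get added.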


Smith
