At 09:57 AM 10/19/2004, Ronan wrote:
I see a limit to the regex descriptions which implement the matching rules... you can search for 'cunt', but this causes a problem due to the Scunthorpe effect.
I want to implement a filter that, in a buffer of arbitrary length (say 10), matches the 4-character string in the order in which the word is spelt, i.e. all of the below would be flagged:


xxxcxxuxnt
cxxuxxnxtx

etc...
where x can be anything, space, underscore, whatever.

Two things. First, use \b at the beginning and end to force a "word boundary". This solves the Scunthorpe effect problem.


Next, use .? to allow an "any character" insertion. The . is a wildcard, and the ? makes its presence optional ("occurs zero or one times").

/\b.?c.?u.?n.?t.?\b/
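
To see how this pattern behaves, here is a minimal sketch in Python (SpamAssassin itself uses Perl regexes, so Python is only a stand-in here). The case-insensitive flag and the sample strings are assumptions added for illustration.

```python
import re

# The pattern from the message above; re.IGNORECASE is an assumption.
pattern = re.compile(r"\b.?c.?u.?n.?t.?\b", re.IGNORECASE)

samples = ["Scunthorpe", "c.u.n.t", "c u n t", "c_u_n_t", "cxxuxxnxtx"]

# Map each sample to True (flagged) or False (not flagged).
results = {s: bool(pattern.search(s)) for s in samples}

for sample, flagged in results.items():
    print(f"{sample!r}: {'flagged' if flagged else 'not flagged'}")
```

"Scunthorpe" is not flagged because no word boundary follows the t. Note that .? allows at most one inserted character per gap, so a string such as cxxuxxnxtx, with two fillers between letters, is not flagged by this pattern as written.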

That said, I tend to prefer [_\W]? which matches an underscore or any "non-word" character. It won't match inserted letters, but will match inserted punctuation, etc.
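
A rough comparison of the two gap styles, again sketched in Python. The second pattern is only my assumption of what the regex looks like with every .? swapped for [_\W]?; the sample strings are made up for illustration.

```python
import re

# Gap written as "any character" (from the pattern given earlier).
any_char = re.compile(r"\b.?c.?u.?n.?t.?\b", re.IGNORECASE)

# Assumed variant: the same pattern with .? replaced by [_\W]?.
punct_only = re.compile(r"\b[_\W]?c[_\W]?u[_\W]?n[_\W]?t[_\W]?\b", re.IGNORECASE)

samples = ["c.u.n.t", "c_u_n_t", "count"]

# For each sample, record (matched by .? version, matched by [_\W]? version).
comparison = {
    s: (bool(any_char.search(s)), bool(punct_only.search(s)))
    for s in samples
}

for sample, (dot_hit, punct_hit) in comparison.items():
    print(f"{sample!r}: .? -> {dot_hit}, [_\\W]? -> {punct_hit}")
```

The innocent word "count" is hit by the .? version (the o fills the gap between c and u) but not by the [_\W]? version, which is the trade-off described above: inserted letters are no longer matched, inserted punctuation still is.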



You can then additionally scan for the typical spamming practices of v1agra or v.1.a.g.r.@

Have you looked at the DRUGS_* rules in SA 3.0, or antidrug.cf (for 2.6x)? These rules are HIGHLY efficient at catching that crud. Your message matched DRUGS_ERECTILE and DRUGS_ERECTILE_OBFU with great ease.


These rules even catch the accented-character obfuscations commonly used by spammers.

(Disclaimer: I wrote these rules so I am biased).

http://mywebpages.comcast.net/mkettler/sa/antidrug.cf

In particular, you want to look at _DRUGS_ERECTILE1 and the regex it uses to catch these.

To read it, break it apart character-by-character. [_\W]{0,3} is the standard "gap clause" used in this rule, so you'll see it between every character.
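
As a sketch of how the gap clause sits between every character, the Python below joins the letters of a word with [_\W]{0,3}. The helper name gapped, the \b anchors, the case-insensitive flag, and the sample strings are all assumptions for illustration; this is not the actual text of the rule.

```python
import re

# The "gap clause" quoted above: up to three underscores or non-word characters.
GAP = r"[_\W]{0,3}"

def gapped(word):
    """Hypothetical helper: put the gap clause between every letter of word."""
    return r"\b" + GAP.join(re.escape(ch) for ch in word) + r"\b"

pattern = re.compile(gapped("viagra"), re.IGNORECASE)

samples = ["viagra", "v.i.a.g.r.a", "v - i - a - g - r - a", "v....iagra"]
hits = {s: bool(pattern.search(s)) for s in samples}

for sample, hit in hits.items():
    print(f"{sample!r}: {hit}")
```

The last sample has four dots in one gap, which exceeds the {0,3} limit, so it is not matched.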

Most characters of the word itself have several options, but I always put the "real" letter first in the ranges. For example:
[ij1!|l\xEC\xED\xEE\xEF] is my substitution for the letter i.
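
A small sketch of that substitution class in use, in Python. Only the class for i comes from the text above; the other letters are left as plain literals because their classes are not quoted here. The gap clause placement, the \b anchors, and the sample strings are assumptions. Python treats \xEC to \xEF as the Unicode code points for the accented i characters, which may differ from how the bytes are handled in a SpamAssassin rule.

```python
import re

GAP = r"[_\W]{0,3}"                      # gap clause quoted earlier
I_CLASS = r"[ij1!|l\xEC\xED\xEE\xEF]"    # substitution class for the letter i

# Assumed layout: plain letters everywhere except i, gap clause between each.
parts = ["v", I_CLASS, "a", "g", "r", "a"]
pattern = re.compile(r"\b" + GAP.join(parts) + r"\b", re.IGNORECASE)

samples = ["v1agra", "v!agra", "v\xecagra", "v.1.a.g.r.a", "vxagra"]
matched = {s: bool(pattern.search(s)) for s in samples}

for sample, hit in matched.items():
    print(f"{sample!r}: {hit}")
```

"vxagra" is not matched because x is not in the class for i; the other four samples are.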


I've also got some [xyz]? and x? terms in there, because there is a common spot in the middle of the word where spammers insert those letters.



(Note: antidrug is built into SA 3.0, so don't add it if you're running 3.0)


