Re: message id and filter suggestion

Loren Wilton 19 Oct 2004 14:19:31 -0000

> I see a limit to the regex descriptions that implement the matching in
> rules... you can search for 'cunt', but this poses a problem due to
> the Scunthorpe effect.


The trick here is to carefully delimit your regex so that it doesn't fall
foul of this problem.  There are several metacharacters you can use in a
regex that help here:

    \b    word boundary
    \W    non-word character
    \s    whitespace character

So in your case the correct pattern would be /\bcunt\b/i.  (The 'i' on the
end makes the match case-insensitive.)
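If you want to see the difference between the three delimiters before
writing a rule, Python's re module uses the same Perl-style \b, \W and \s,
so it can stand in for SpamAssassin's Perl regexes in a quick experiment.
This is only a sketch: the pattern table, the matches() helper and the
sample strings are made up for illustration, and re.IGNORECASE plays the
role of the trailing /i.

```python
import re

# The same four letters delimited four different ways (illustrative only).
PATTERNS = {
    "none": r"cunt",        # no delimiters: also hits "Scunthorpe"
    r"\b": r"\bcunt\b",     # zero-width word boundary
    r"\W": r"\Wcunt\W",     # needs a real non-word character on each side
    r"\s": r"\scunt\s",     # needs whitespace on each side
}

# Hypothetical sample texts.
SAMPLES = ["Greetings from Scunthorpe", "CUNT at the start", "you cunt."]


def matches(delimiter, text):
    """True if the pattern for `delimiter` occurs anywhere in `text`."""
    return re.search(PATTERNS[delimiter], text, re.IGNORECASE) is not None


for text in SAMPLES:
    hits = [name for name in PATTERNS if matches(name, text)]
    print(f"{text!r}: {hits}")
```

Only \b rejects "Scunthorpe" while still catching the word at the very
start of the text and right before punctuation: \W fails at the start
because there is no character in front to consume, and \s fails before
the full stop.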

> I want to implement a filter that, in a buffer of arbitrary length (say 10),
> matches the 4-character string in the order in which the word is spelt,
> i.e. all of the below would be flagged
>
> Has this been done?
> As I am not a regex guru I'm still trying to implement this,
> but I thought I'd throw it out to yis anyway.

"Has this been done?"  Well, it depends on what exactly "this" is.  There
are a lot of rules that check for word obfuscations like the ones you
suggest, as well as many other tricks, such as using international
characters that resemble English letters.
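Your idea of finding the letters in order within a window of about 10
characters can be sketched like this.  It is one possible approach, not
how any existing SA rule is written; spelled_out() and max_gap are
hypothetical names.  Only separator characters (anything that is not a
letter or digit, including underscores) are allowed between the letters,
so with 4 letters and max_gap=2 the longest possible match is
4 + 3*2 = 10 characters.

```python
import re


def spelled_out(word, max_gap=2):
    """Hypothetical helper: match the letters of `word` in order, allowing
    up to `max_gap` separators (spaces, dots, dashes, underscores...)
    between consecutive letters."""
    # [\W_] = any non-word character, plus "_" (which \w would include).
    gap = r"[\W_]{0,%d}" % max_gap
    body = gap.join(re.escape(ch) for ch in word)
    # \b on both ends keeps the match from starting or ending inside a
    # longer word such as "Scunthorpe".
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


pattern = spelled_out("cunt")
for text in ["c.u.n.t", "C U N T", "c--u--n--t", "c---u---n---t", "Scunthorpe"]:
    print(f"{text!r}: {bool(pattern.search(text))}")
```

The gaps deliberately exclude letters: if any character were allowed
between the letters, an ordinary word like "count" (c, o, u, n, t) would
be flagged, which brings the Scunthorpe problem straight back.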

On the other hand, there aren't many rules around that specifically check
for obfuscated porn.  There are some, but it turns out that a lot of the
porn is sent as plain text, and it's ordinary words like "mother" that get
obscured with obfuscation tricks.
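To catch an ordinary word like "mother" when letters are swapped for
digits or for the international look-alike characters mentioned above,
each letter can become a character class.  This is a minimal sketch: the
LOOKALIKES table is hypothetical and far from exhaustive, and real rule
sets maintain much larger lists.

```python
import re

# Hypothetical, deliberately small table of letter substitutes:
# digits, symbols and accented look-alikes.
LOOKALIKES = {
    "a": "a@4àáâä",
    "e": "e3èéêë",
    "i": "i1!ìíîï",
    "o": "o0òóôö",
    "u": "uùúûü",
}


def lookalike_pattern(word):
    # Each letter becomes a class of itself plus its substitutes.
    parts = ["[" + re.escape(LOOKALIKES.get(ch, ch)) + "]" for ch in word.lower()]
    # (?<!\w) / (?!\w) instead of \b: a substitute such as "@" is itself a
    # non-word character, and \b would not fire next to it.
    return re.compile(r"(?<!\w)" + "".join(parts) + r"(?!\w)", re.IGNORECASE)


pattern = lookalike_pattern("mother")
for text in ["m0th3r", "mòther", "MOTHER", "smother"]:
    print(f"{text!r}: {bool(pattern.search(text))}")
```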

Another thing about checking for spam or obfuscation with rules: it is
often much better to check for phrases than for single words.  For
instance, checking for 'credit' would be a poor choice, since plenty of
legitimate mail mentions credit.  Checking for 'bad credit' would be a
much better one.
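A quick way to see why the phrase is the better rule: the single word
fires on ordinary mail, while the phrase only fires on the spammy wording.
The two sample messages here are invented for illustration, and \s+ is
used so the phrase still matches across extra spaces or a line break.

```python
import re

SINGLE = re.compile(r"\bcredit\b", re.IGNORECASE)
# \s+ tolerates several spaces, tabs or a newline between the two words.
PHRASE = re.compile(r"\bbad\s+credit\b", re.IGNORECASE)

# Made-up sample messages.
ham = "Your credit card statement is now available."
spam = "Bad   credit? No problem, you are pre-approved!"

for label, text in (("ham", ham), ("spam", spam)):
    print(label, bool(SINGLE.search(text)), bool(PHRASE.search(text)))
```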

If you are looking for add-on rule sets that can catch various things
(including porn), go to the exit0 web site or www.rulesemporium.com.
Both have a large collection of good rules you can add to SA.

        Loren
