> I see a limit to the regex descriptions which implement the matching in
> rules... you can search for 'cunt', but this poses a problem due to
> the Scunthorpe effect.
The trick here is to delimit your regex carefully so that it doesn't fail
this test, i.e. so that it doesn't fire on innocent words like 'Scunthorpe'.
There are several metacharacters you can use in a regex that help here:
\b word boundary
\W non-word character
\s whitespace character
So your case would be correctly done as /\bcunt\b/i. (The 'i' on the end
says to do a case-insensitive match.)
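To see what the word boundaries buy you, here is a minimal Python sketch. Python's `re` module accepts the same `\b` syntax, and `re.IGNORECASE` plays the role of the trailing 'i'. The sample strings and the `check` helper are made up for illustration.

```python
import re

# Naive pattern: finds the four letters anywhere, even inside a longer word.
naive = re.compile(r"cunt", re.IGNORECASE)
# Delimited pattern: \b demands a word boundary on each side.
bounded = re.compile(r"\bcunt\b", re.IGNORECASE)

def check(text):
    """Return (naive_hit, bounded_hit) for one line of text."""
    return bool(naive.search(text)), bool(bounded.search(text))

for sample in ["Welcome to Scunthorpe", "what a CUNT!"]:
    print(repr(sample), check(sample))
```

Only the naive pattern fires on 'Scunthorpe'. Keep in mind that `\b` also rejects suffixed forms such as plurals, so a real rule may need something like /\bcunts?\b/i.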
> I want to implement a filter that, in a buffer of arbitrary length (say 10),
> matches the 4-character string in the order in which the
> word is spelt, i.e. all of the below would be flagged:
>
> Has this been done?
> As I am not a regex guru, I'm still trying to implement this,
> but I thought I'd throw it out to yis anyway.
"Has this been done?" Well, it depends on what exactly "this" is. There
are a lot of rules that check for various word obfuscations such as the ones
you suggest, plus a lot of other tricks, like using international characters
that resemble English characters.
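Taken literally, the question's idea is: flag any stretch of at most 10 characters that contains the four letters in order, whatever sits between them. Below is a minimal Python sketch of that reading, assuming the window of 10 from the question; `in_order_within` and `spaced` are hypothetical names for illustration, not anything that exists in SpamAssassin.

```python
import re

def in_order_within(text, word, window=10):
    """Return True if the letters of `word` occur in order (case-insensitive)
    inside some stretch of at most `window` characters of `text`."""
    text, word = text.lower(), word.lower()
    for start, ch in enumerate(text):
        if ch != word[0]:
            continue
        pos = start
        # Take the earliest occurrence of each following letter; this gives
        # the shortest possible stretch that begins at `start`.
        for letter in word[1:]:
            pos = text.find(letter, pos + 1)
            if pos == -1:
                break
        if pos != -1 and pos - start + 1 <= window:
            return True
    return False

# Tighter alternative (an assumption, not from the thread): only non-word
# characters such as spaces, dots or dashes may sit between the letters.
spaced = re.compile(r"\bc\W*u\W*n\W*t\b", re.IGNORECASE)
```

The literal reading is very loose: 'account' contains c, u, n and t in order within a few characters, so `in_order_within` flags it. The `spaced` pattern uses the `\W` metacharacter to catch 'c.u.n.t' or 'c u n t' while leaving 'account' alone.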
Now, on the other hand, there aren't a huge number of rules around that
specifically check for obfuscated porn. There are some, but it turns out a
lot of porn spam tends to be plain text, and the spammers obscure normal
words like "mother" with obfuscation tricks instead.
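The lookalike-character trick is commonly countered with character classes that accept each substitute. A minimal Python sketch for "mother"; the particular substitutions (digit zero, digit three, Cyrillic о U+043E, Cyrillic е U+0435) are illustrative assumptions, not taken from any real rule set, and `obfuscated_mother` is a hypothetical helper.

```python
import re

# Each class accepts the plain letter plus some lookalikes.
# \u043e is CYRILLIC SMALL LETTER O, \u0435 is CYRILLIC SMALL LETTER IE.
LOOKALIKE_MOTHER = re.compile(r"\bm[o0\u043e]th[e3\u0435]r\b", re.IGNORECASE)

def obfuscated_mother(text):
    """True if 'mother' appears with at least one lookalike substitution."""
    return any(m.group(0).lower() != "mother"
               for m in LOOKALIKE_MOTHER.finditer(text))
```

The helper fires only on disguised spellings, since the plain word by itself says little about whether a message is spam.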
Another thing about checking for spam or obfuscation with rules: it is
often much better to check for phrases rather than single words. For
instance, checking for 'credit' would be a poor choice. Checking for
'bad credit' would be a much better choice.
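A quick Python illustration of the difference; the sample messages and the `hits` helper are invented for the example.

```python
import re

# Single word: fires on any mention of credit, including legitimate mail.
single = re.compile(r"\bcredit\b", re.IGNORECASE)
# Phrase: \s+ tolerates any run of whitespace (including a line break)
# between the two words.
phrase = re.compile(r"\bbad\s+credit\b", re.IGNORECASE)

def hits(text):
    """Return (single_word_hit, phrase_hit) for one message body."""
    return bool(single.search(text)), bool(phrase.search(text))
```

A bank statement notice trips the single-word rule but not the phrase rule, while "Bad   credit? No problem!" trips both.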
If you are looking for add-on rule sets that can catch various things
(including porn), go off to the exit0 web site or www.rulesemporium.com.
Both have a large collection of good rules you can add to SA.
Loren