For spam control, Akismet is the best option, though SpamAssassin is
also an industry standard, I believe.

Other English word-filtering resources:
http://urbanoalvarez.es/blog/2008/04/04/bad-words-list/
http://www.bannedwordlist.com/

And from http://www.noswearing.com/about.php:
"The API is currently open on a limited person basis. It currently
supports 4 operations: List dirty words, Censor dirty words, Replace
dirty words, and Define Dirty words. If you're interested in testing
it for us, please send an email to ryan at noslang.com."

Michael McGinnis

On Feb 16, 9:18 am, Jonathan Lundell <[email protected]> wrote:
> On Feb 16, 2011, at 5:10 AM, Massimo Di Pierro wrote:
>
> > I am not an expert on bad words in English; I could do a better job in
> > Italian. ;-)
>
> No doubt...
>
> It's a tough problem, regardless, and it's awfully easy to get false 
> positives. Better not misspell shiitake mushrooms.
>
> > On Feb 15, 11:20 pm, Jonathan Lundell <[email protected]> wrote:
> >> On Feb 15, 2011, at 11:21 AM, Massimo Di Pierro wrote:
>
> >>> import base64, re
> >>> BADWORDS=re.compile(base64.b64decode("""KGFob2xlfGFudXN8YXNoMGxlfGFzaDBsZXN
> >>>  8YXNob2xlc3xhc3N8YXNzfGFzc2ZhY2V8YXNzaDBs
>
> >> There are some very peculiar choices (and omissions) in that list.
>
> >> rautenberg? job?
>
> >> honkey, but not honky (or its original, bohunk).
>
> >> Aside from the word choice, if you're going to use it, change match to 
> >> search, bracket the pattern with \b's, and make it case-insensitive.
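
To illustrate Jonathan's three suggestions (search instead of match, \b word boundaries, case-insensitive), here is a minimal sketch. The base64 list quoted above is truncated, so WORDS is a hypothetical stand-in with two mild words, and the names NAIVE, BADWORDS and has_bad_word are made up for this example.

```python
import re

# Hypothetical stand-in list; the real list in the thread is truncated.
WORDS = ["darn", "heck"]

# Naive version: no word boundaries, case-sensitive, intended for match().
NAIVE = re.compile("(%s)" % "|".join(WORDS))

# Suggested version: \b on both sides of the alternation, case-insensitive.
BADWORDS = re.compile(
    r"\b(?:%s)\b" % "|".join(re.escape(w) for w in WORDS),
    re.IGNORECASE,
)

def has_bad_word(text):
    """Return True if any listed word appears as a whole word in text."""
    # search() scans the whole string; match() only tests the start.
    return BADWORDS.search(text) is not None
```

With this stand-in list, NAIVE.match("oh heck") finds nothing because match only looks at the start of the string, while NAIVE.match("heckler") is a false positive. has_bad_word("Oh HECK no") is true and has_bad_word("heckler") is false. Word boundaries only help with substrings, though; a misspelling that happens to equal a listed word will still be flagged.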
