Hi Jon,
I think we've seen this discussion on the list before
(so Christopher, check the archives!)

> > I'm wondering if someone has a great source for a master-list
> > of controversial and vulger words that I can use on my site.
> > I would like to pattern match input text against this master-list
> > in order to prevent vulger and controversial words from appearing
> > on my site.
> Once you've got the routine working, post it here, because there are many
> people who would like to know how to do this properly.

> The problems that others have experienced in the past are:
>   - what happens with "mis"spellings, e.g. "fsck"?
>   - what happens with dodgy formatting, e.g "f s c k"?
>   - what happens with words like "Scunthorpe"?

Problem 1: add likely/popular misspellings to the list of vulger/vulgar words.

Problem 2: (contrived) very few single-letter words exist, so remove
intervening white space prior to analysis.
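
A rough sketch of that idea in Python (the same regex should carry over to
PHP's preg_replace_callback). It assumes that only runs of three or more
single letters get joined, so that ordinary one-letter words such as "a" and
"I" are left alone; the function name and the threshold are my own choices
for illustration, not anything agreed on the list.

```python
import re

# Assumption: a run of three or more single letters separated by whitespace
# (e.g. "f s c k") is a spaced-out word, so it is safe to join it up.
SPACED_RUN = re.compile(r"\b(?:[A-Za-z]\s+){2,}[A-Za-z]\b")

def collapse_spaced_letters(text):
    """Join spaced-out single letters, leaving normal words untouched."""
    return SPACED_RUN.sub(lambda m: re.sub(r"\s+", "", m.group(0)), text)

print(collapse_spaced_letters("what the f s c k is this"))
```

Joining only these runs, rather than stripping every space from the message,
keeps the word boundaries that Problem 3 relies on.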

Problem 2a: (the more popular f*ck, from someone suffering the misapprehension
that (s)he is somehow NOT guilty of using bad language/being offensive when
(s)he plainly is, and is attempting to be deceptive as well...) see the
response to Problem 1 (the probable habit would be to replace/remove vowels).
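
As an alternative to listing every masked variant by hand, one could treat
any non-letter inside a token as standing in for a single letter. This is
only a sketch under that assumption: BLOCKLIST holds mild placeholder words
and masked_match is a made-up name. It does not catch letters that have been
removed outright, which would still need the Problem 1 treatment.

```python
import re

# Placeholder entries; a real list would come from wherever you maintain it.
BLOCKLIST = ["darn", "fsck"]

def masked_match(token):
    """Return the blocklisted word a masked token (e.g. "d*rn") hides, or None."""
    token = token.lower()
    # Assumption: a genuine attempt keeps its first letter, so "****" is ignored.
    if not token or not token[0].isalpha():
        return None
    # Letters match themselves; any other character matches exactly one letter.
    pattern = "".join(re.escape(c) if c.isalpha() else "[a-z]" for c in token)
    rx = re.compile(pattern)
    for word in BLOCKLIST:
        if rx.fullmatch(word):
            return word
    return None

print(masked_match("d*rn"))
```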

Problem 3: Scunthorpe contains an unfortunate series of letters (amongst the
town's many disadvantages). However, the critical four are not a word in
their own right there, so employ whitespace (\s) in the RegEx, or token
analysis, to match whole words only.
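
To illustrate the whole-word idea, here is a small Python sketch. Note that
it uses the word-boundary anchor \b instead of \s, which is my substitution:
\b also copes with punctuation and with the start and end of the text. The
BLOCKLIST entries are mild placeholders and find_blocked is a hypothetical
name. The pattern itself should work unchanged with PHP's preg_match_all.

```python
import re

# Placeholder entries; substitute your own master list.
BLOCKLIST = ["fsck", "darn"]

# \b on both sides means a listed word only matches as a whole word,
# never as a fragment of a longer one.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in BLOCKLIST) + r")\b",
    re.IGNORECASE,
)

def find_blocked(text):
    """Return the listed words found in text, lower-cased, in order of appearance."""
    return [m.group(0).lower() for m in PATTERN.finditer(text)]

print(find_blocked("Oh FSCK, darn it"))
print(find_blocked("She was darning socks"))
```

The second call finds nothing, because "darn" inside "darning" is not
followed by a word boundary.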

> May I suggest, rather than picking your way through this minefield, you
> provide a "report abusive comment" link instead?

Most sensible! Employing a technological solution to a social problem is
somewhat like shooting the messenger. However, some countries are now
legislating responsibilities that ISPs/employers must discharge (shooting the
person who shoes the horses that the Pony Express messenger is riding!?)

In this case perhaps one could analyse the incoming text and place an
embargo on its publication on the web site until it has been reviewed by a
human editor?
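
A minimal sketch of that embargo step, again in Python and again with
placeholder words. The Submission record, the status strings and the triage
function are all hypothetical names; a real site would store the held items
somewhere an editor can get at them.

```python
import re
from dataclasses import dataclass, field

# Placeholder entries; substitute your own master list.
BLOCKLIST = ["fsck", "darn"]
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in BLOCKLIST) + r")\b",
    re.IGNORECASE,
)

@dataclass
class Submission:
    text: str
    status: str = "published"
    reasons: list = field(default_factory=list)

def triage(text):
    """Hold flagged text for a human editor; publish everything else."""
    hits = sorted({m.group(0).lower() for m in PATTERN.finditer(text)})
    if hits:
        # The matched words are kept so the editor can see why it was held.
        return Submission(text, "held_for_review", hits)
    return Submission(text)

print(triage("Oh FSCK, darn it").status)
```

The filter never rejects anything by itself; it only decides whether a person
needs to look first, which leaves room for the Scunthorpe cases.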

If we were talking about filtering incoming email, then perhaps the original
message could be forwarded/wrapped with a message from the EmailAdmin/System
pointing out that a message has arrived from xyz (etc) and has been flagged
for a stated reason (but that there is room for interpretation within the
mechanical observation) and that the message should not be opened by anyone
fearing offence. (This is similar to 'security' gateways that don't allow msgs
with attachments unless the 'employee' first authorises a 'pass-through'.)

Euro 0.02's worth?

PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
