Hi,

On 12 Aug 2003 15:50:12 -0700 Chris Bradfield <[EMAIL PROTECTED]> wrote:

> So you want 50 points for any file with 
> 
> Content-Type: application/x-msexcel;
>                              ^^^
> ???
> 
> Looking through a mailbox full of mime-encoded attachments (and only
> attachments) I found several occurrences of "sex" in the encoded data.

`egrep -ci sex /usr/dict/words` yields 19 words including the following
non-gender, non-copulatory terms:

Essex
Middlesex
Sextans
sextet
sextillion
sexton
sextuple
sextuplet
Sussex
 
> That can't possibly be what you're after.  There's a real danger in
> getting overzealous going after "bad" words.  Your HR department is
> bound to get communications involving "sex"ual harassment, "sex"
> discrimination, etc.

Learn from Prodigy's mistakes. If you're going to wander down that
slippery path of filtering 'naughty' content, read and understand
"Mastering Regular Expressions" (by Jeffrey Friedl, from O'Reilly), read
`perldoc perlre`, and score your homebrew rules at 0.01 until you're
comfortable they rarely, if ever, produce false positives. Better to test
these out on your own account for a while before inflicting them on your
users. Better still to train Bayes to flag these. Improved accuracy and
less work for you.
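
To make the false-positive problem concrete, here is a rough sketch in
Python (not SpamAssassin rule syntax) comparing a bare substring match
with a word-bounded one. The sample strings are a few of the dictionary
words above plus the MIME type and an HR phrase from the quoted message;
the names SAMPLES, naive, bounded and hits are made up for illustration.

```python
import re

# A few words from the dictionary list, the MIME header from the quoted
# message, and one phrase an HR mailbox would legitimately receive.
SAMPLES = [
    "Essex", "Middlesex", "sextet", "sexton", "Sussex",
    "Content-Type: application/x-msexcel;",
    "sex discrimination",
]

naive = re.compile(r"sex", re.IGNORECASE)        # bare substring match
bounded = re.compile(r"\bsex\b", re.IGNORECASE)  # whole word only

def hits(pattern, texts):
    """Return the texts in which the pattern matches anywhere."""
    return [t for t in texts if pattern.search(t)]

naive_hits = hits(naive, SAMPLES)
bounded_hits = hits(bounded, SAMPLES)

print(len(naive_hits), "of", len(SAMPLES), "samples hit by the naive pattern")
print("word-bounded hits:", bounded_hits)
```

The naive pattern hits every sample. The word-bounded one drops the place
names and the Excel attachment but still hits the HR phrase, which is why
a rule like this should sit at a negligible score until it has been
watched against real mail.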

> I think you really need to take a breath and think rationally about
> these rules before you implement them.  You're taking an extremely smart
> content filtering system and turning it into a really dumb one.

Scunthorpe.

-- Bob


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk