Hi,

On 12 Aug 2003 15:50:12 -0700 Chris Bradfield <[EMAIL PROTECTED]> wrote:
> So you want 50 points for any file with
>
> Content-Type: application/x-msexcel;
>                              ^^^
> ???
>
> Looking through a mailbox full of mime-encoded attachments (and only
> attachments) I found several occurrences of "sex" in the encoded data.

`egrep -ci sex /usr/dict/words` yields 19 words, including the following
non-gender, non-copulatory terms:

    Essex Middlesex Sextans sextet sextillion sexton sextuple sextuplet
    Sussex

> That can't possibly be what you're after. There's a real danger in
> getting overzealous going after "bad" words. Your HR department is
> bound to get communications involving "sex"ual harassment, "sex"
> discrimination, etc. Learn from Prodigy's mistakes.

If you're going to wander down that slippery path of filtering 'naughty'
content, read and understand "Mastering Regular Expressions" (by Jeffrey
Friedl, from O'Reilly), read `perldoc perlre`, and score your homebrew
rules at 0.01 until you're comfortable they rarely, if ever, flag false
positives. Better to test these out on your own account for a while
before inflicting them on your users.

Better still to train Bayes to flag these. Improved accuracy and less
work for you.

> I think you really need to take a breath and think rationally about
> these rules before you implement them. You're taking an extremely smart
> content filtering system and turning it into a really dumb one.

Scunthorpe.

-- 
Bob

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
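
The false-positive problem discussed above can be seen with a minimal sketch. It is written in Python rather than as a SpamAssassin rule, and the names `NAIVE`, `BOUNDED`, `hits`, and the sample strings are hypothetical, chosen only for illustration. It compares a bare substring pattern with a word-bounded one against the dictionary words and the Content-Type header quoted in the message.

```python
import re

# Words taken from the list quoted in the message above.
WORDS = ["Essex", "Middlesex", "Sextans", "sextet", "sextillion",
         "sexton", "sextuple", "sextuplet", "Sussex"]

# Hypothetical patterns for illustration; not actual SpamAssassin rules.
NAIVE = re.compile(r"sex", re.IGNORECASE)        # bare substring match
BOUNDED = re.compile(r"\bsex\b", re.IGNORECASE)  # whole word only


def hits(pattern, items):
    """Return the items in which the pattern matches anywhere."""
    return [s for s in items if pattern.search(s)]


naive_hits = hits(NAIVE, WORDS)      # every word in the list matches
bounded_hits = hits(BOUNDED, WORDS)  # none of them match

header = "Content-Type: application/x-msexcel;"
hr_subject = "Updated sex discrimination policy"  # hypothetical HR mail

print(len(naive_hits), "of", len(WORDS), "innocent words hit by the bare pattern")
print("bare pattern hits the Excel header:", bool(NAIVE.search(header)))
print("bounded pattern hits the Excel header:", bool(BOUNDED.search(header)))
print("bounded pattern still hits HR mail:", bool(BOUNDED.search(hr_subject)))
```

Word boundaries remove the Essex and msexcel hits, but the legitimate HR message still matches, which is one reason to keep such a rule at a very low score while testing, or to leave the job to Bayes.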