Hi,

>> This is what you want:
>>
>> uri URI_PROTO_MC /^(?!(?-i:[Hh]ttps?:))https?:/i
>>
>> The string inside the parentheses is what you want to _not_ hit, and that
>> part is _not_ case-insensitive, even though the rest of the expression _is_
>> case-insensitive.
>>
>> Also, for the TLD rule: after a bit of thought I realized it would be very
>> unlikely a spammer would be doing this to a .gov URI, so I substituted .biz:
>>
>> uri __URI_TLD_MC
>> /\.(?!(?-i:com|net|org|biz|info))(?:com|net|org|biz|info)\b/i
...
>
> So far working good. Caught 4620 spams since sunday morning with these mixed
> case rules. I added this as a separate rule.
>
> /^(?!(?-i:[Hh]ttps?:\/\/www))https?:\/\/www/i
>
> Found some cases where the HTTP was lower case but the WWW was mixed.
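
For anyone following along, here is a rough sketch of what the three quoted patterns hit and miss. It is Python rather than SpamAssassin itself, and it assumes Python's re module handles the scoped (?-i:...) group the same way Perl does for these patterns. The hits() helper and the sample URIs are made up for illustration, not taken from real spam.

```python
import re

# Patterns copied from the quoted rules; the /.../i delimiters are
# replaced by re.IGNORECASE. The Python names are only labels here.
URI_PROTO_MC = re.compile(r'^(?!(?-i:[Hh]ttps?:))https?:', re.IGNORECASE)
URI_TLD_MC = re.compile(
    r'\.(?!(?-i:com|net|org|biz|info))(?:com|net|org|biz|info)\b',
    re.IGNORECASE,
)
URI_WWW_MC = re.compile(
    r'^(?!(?-i:[Hh]ttps?:\/\/www))https?:\/\/www', re.IGNORECASE
)


def hits(pattern, uri):
    """Return True if the pattern matches anywhere in the URI (hypothetical helper)."""
    return pattern.search(uri) is not None


# Hypothetical sample URIs: normal casing first, then mixed-case variants.
samples = [
    "http://www.example.com/",
    "Https://example.com/",
    "hTtP://example.com/",
    "http://example.CoM/",
    "http://WwW.example.com/",
]

for uri in samples:
    print(
        uri,
        "proto:", hits(URI_PROTO_MC, uri),
        "tld:", hits(URI_TLD_MC, uri),
        "www:", hits(URI_WWW_MC, uri),
    )
```

The negative lookahead is the case-sensitive part: it rejects the ordinary spellings (all lower case, or a leading capital H), and whatever is left for the case-insensitive part to match must be mixed case.
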
Can you really make scoring decisions based on a mixed-case URI? Do you have
it as part of a meta with the other rules that John provided?

I'm looking at John's sandbox entries, and wondering if there is a rule to be
made from those URIs he's created, or are you just probing to see if they are
tagged at this point?

Thanks,
Alex