On 11/10/2012 8:57 AM, John Hardin wrote:
On Sat, 10 Nov 2012, Marc Perkel wrote:
Need a rule to catch this:
HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm
Mixed case links
Mixed-case protocol:
uri URI_PROTO_MC /^(?!(?-i:https?:))https?:/i
Note: this _will_ trigger on HTTP and HTTPS, but I expect they are rare
in legitimate URIs.
Mixed case TLD:
uri URI_TLD_MC /\.(?!(?-i:com|net|org|gov|info))(?:com|net|org|gov|info)\b/i
Add TLDs as needed. Again, this _will_ trigger on fully uppercase TLDs. If
that's a problem, just add the fully uppercase TLD to the first TLD
list (the case-sensitive zero-width negative lookahead assertion).
Common domain name parts or subparts:
uri URI_GOOG_MC /(?!(?-i:google))google/i
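For anyone who wants to check the three patterns above outside SpamAssassin, here is a minimal sketch in Python. It assumes Python 3.6 or later, where the `re` module accepts the scoped `(?-i:...)` group. The constant names are mine and simply mirror the rule names. This only exercises the regex logic; it does not show how SpamAssassin extracts or normalizes URIs before `uri` rules run.

```python
import re

# Python translations of the three uri patterns above.
# The scoped (?-i:...) group switches case-insensitivity off inside the
# negative lookahead, so an all-lowercase spelling is rejected first.
PROTO_MC = re.compile(r"^(?!(?-i:https?:))https?:", re.IGNORECASE)
TLD_MC = re.compile(
    r"\.(?!(?-i:com|net|org|gov|info))(?:com|net|org|gov|info)\b",
    re.IGNORECASE,
)
GOOG_MC = re.compile(r"(?!(?-i:google))google", re.IGNORECASE)

SPAM = "HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm"
HAM = "http://www.google.com"

for name, pattern in (("PROTO", PROTO_MC), ("TLD", TLD_MC), ("GOOG", GOOG_MC)):
    print(name, bool(pattern.search(SPAM)), bool(pattern.search(HAM)))
```

All three patterns hit the sample spam URI and none hit the all-lowercase one. As noted, a fully uppercase `HTTP://` also matches the protocol pattern.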
HTH.
How much are you seeing these in real traffic?
I'm seeing a lot of these. They are coming from stolen Yahoo accounts
from back when Yahoo leaked their database. They appear to come from
friends of mine.
Can you refine it so that there have to be at least 4 uppercase
characters in the URI, to avoid false positives? For example:
http://WellsFargo.com ok
HttP://WeLlSfaRgo.cOm not OK
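To make the threshold idea concrete, here is a rough sketch in Python rather than as a SpamAssassin rule. The helper names `count_upper` and `looks_mixed_case` are hypothetical, and counting only the scheme and host (not the path) is my assumption, since paths often contain legitimate uppercase characters.

```python
import re

# Hypothetical helpers. Only the scheme and host are inspected (an
# assumption); the path, query and fragment are ignored.
SCHEME_HOST = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)")

def count_upper(uri):
    """Count ASCII uppercase letters in the scheme and host of a URI."""
    m = SCHEME_HOST.match(uri)
    if m is None:
        return 0  # not a scheme://host URI, nothing to count
    return sum(1 for ch in m.group(1) + m.group(2) if "A" <= ch <= "Z")

def looks_mixed_case(uri, threshold=4):
    """True when scheme and host hold at least `threshold` uppercase letters."""
    return count_upper(uri) >= threshold

print(count_upper("http://WellsFargo.com"))   # 2
print(count_upper("HttP://WeLlSfaRgo.cOm"))   # 7
```

With a threshold of 4, the first URI passes and the second is flagged. A fully uppercase host would also be flagged, so this counts uppercase letters and does not test for mixed case as such.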
--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400