On 11/10/2012 8:57 AM, John Hardin wrote:
On Sat, 10 Nov 2012, Marc Perkel wrote:

Need a rule to catch this:

HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm

Mixed-case links

Mixed-case protocol:

   uri  URI_PROTO_MC  /^(?!(?-i:https?:))https?:/i

Note: this _will_ trigger on HTTP and HTTPS, but I expect those are rare in legitimate URIs.

Mixed-case TLD:

   uri  URI_TLD_MC  /\.(?!(?-i:com|net|org|gov|info))(?:com|net|org|gov|info)\b/i

Add TLDs as needed. Again, this _will_ trigger on fully uppercase TLDs. If that's a problem, just add the fully uppercase TLD to the first TLD list (the case-sensitive, zero-width negative lookahead assertion).
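Keeping the two TLD lists in sync by hand is easy to get wrong. Here is a minimal sketch (Python 3.6+, run offline) that generates the rule line from a single list. The function names and the allow_all_upper option are illustrative inventions, not part of SpamAssassin. The option implements the tweak described above of adding the fully uppercase spellings to the lookahead list. It assumes the TLDs are plain letters, so nothing needs regex escaping.

```python
import re


def tld_mc_pattern(tlds, allow_all_upper=False):
    """Build the URI_TLD_MC regex body from a list of lowercase TLDs.

    The negative lookahead lists the allowed spellings, matched
    case-sensitively via (?-i:...). The second group is matched under the
    rule's /i flag, so any other casing of a listed TLD triggers the rule.
    With allow_all_upper=True the fully uppercase forms are also allowed,
    so e.g. ".COM" no longer triggers.
    """
    allowed = list(tlds)
    if allow_all_upper:
        allowed += [t.upper() for t in tlds]
    return r"\.(?!(?-i:%s))(?:%s)\b" % ("|".join(allowed), "|".join(tlds))


def tld_mc_rule(tlds, allow_all_upper=False):
    """Return the full SpamAssassin 'uri' line for the generated pattern."""
    return "uri URI_TLD_MC /%s/i" % tld_mc_pattern(tlds, allow_all_upper)


TLDS = ["com", "net", "org", "gov", "info"]
print(tld_mc_rule(TLDS))
# uri URI_TLD_MC /\.(?!(?-i:com|net|org|gov|info))(?:com|net|org|gov|info)\b/i

# Quick local check of the generated pattern (re.I stands in for /i):
print(bool(re.search(tld_mc_pattern(TLDS), "http://example.cOm", re.I)))
```

Regenerating the line after editing TLDS keeps both alternations identical, which is the property the rule depends on.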

Common domain name parts or subparts:

   uri  URI_GOOG_MC   /(?!(?-i:google))google/i
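To sanity-check all three patterns against the sample before deploying them, you can run them through Python's re module. This assumes Python 3.6 or later, which accepts scoped inline flags such as (?-i:...). Python's engine is not Perl's, so treat the results as an approximation of what SpamAssassin will do. The RULES dict and the hits() helper are just test scaffolding.

```python
import re

# The three patterns as posted above. In SpamAssassin each rule carries /i,
# so they are applied here with re.IGNORECASE; the (?-i:...) groups switch
# case-insensitivity back off inside the negative lookaheads.
RULES = {
    "URI_PROTO_MC": r"^(?!(?-i:https?:))https?:",
    "URI_TLD_MC": r"\.(?!(?-i:com|net|org|gov|info))(?:com|net|org|gov|info)\b",
    "URI_GOOG_MC": r"(?!(?-i:google))google",
}


def hits(uri):
    """Return the names of the rules whose pattern matches the URI."""
    return [name for name, pat in RULES.items()
            if re.search(pat, uri, re.IGNORECASE)]


for uri in (
    "HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm",  # all three rules hit
    "http://www.google.com",                    # no hits
    "Https://www.google.com",                   # only URI_PROTO_MC
    "HTTP://WWW.GOOGLE.COM",                    # all three: all caps counts as "mixed"
):
    print(uri, hits(uri))
```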

HTH.

How often are you seeing these in real traffic?



I'm seeing a lot of these. They are coming from Yahoo accounts stolen back when Yahoo leaked its database, so they appear to come from friends of mine.

Can you refine it so that the URI has to contain something like at least four uppercase characters, to avoid false positives? For example:

http://WellsFargo.com     OK
HttP://WeLlSfaRgo.cOm     not OK
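One way to express that request is a hypothetical rule, here called URI_HOST_MANY_UC, that requires at least four uppercase letters in the host part of the URI. It is sketched in Python so it can be tested outside SpamAssassin. The rule name, the choice to count only the host (not the scheme or path), and the assumption that SpamAssassin hands uri rules the URI with its original case intact are all assumptions, not something established in this thread. Perl also accepts the scoped (?i:...) group used for the scheme.

```python
import re

# Candidate body for a hypothetical SpamAssassin rule, e.g.:
#   uri  URI_HOST_MANY_UC  /^(?i:https?):\/\/(?:[^\/A-Z]*[A-Z]){4}/
# Note: NO trailing /i, so [A-Z] stays case-sensitive. Only the scheme is
# matched case-insensitively, via the scoped (?i:...) group.
# [^\/A-Z]* skips characters that are neither '/' nor uppercase, so the
# four uppercase letters must all occur before the first '/' after "://".
PATTERN = r"^(?i:https?):\/\/(?:[^\/A-Z]*[A-Z]){4}"
URI_HOST_MANY_UC = re.compile(PATTERN)


def host_has_many_uppercase(uri: str) -> bool:
    """True if the host part has at least four uppercase letters."""
    return URI_HOST_MANY_UC.search(uri) is not None


for uri in (
    "http://WellsFargo.com",                       # 2 uppercase in host
    "HttP://WeLlSfaRgo.cOm",                       # 5 uppercase in host
    "HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm",     # many
    "http://www.example.com/Some/Path/With/Caps",  # caps only in the path
):
    print(uri, host_has_many_uppercase(uri))
```

A threshold of four still flags legitimately CamelCased hosts such as WellsFargoBankOnline.com (W, F, B, O). It may therefore work better combined with the mixed-case protocol or TLD rules than on its own.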


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
