On Sat, 10 Nov 2012, Marc Perkel wrote:

> Just a thought, I changed this:
>
> uri  URI_PROTO_MC  /^(?!(?-i:https?:))https?:/i
>
> into this:
>
> uri  URI_PROTO_MC  /^(?!(?-i:ttps?:))ttps?:/i
>
> Some people capitalize the H - but the rest of it being mixed case should be 100% accurate.

That breaks it. Note that the RE is anchored at the beginning of the URI, so once the leading "h" is dropped it can never match a URI that starts with "http".

This is what you want:

  uri  URI_PROTO_MC  /^(?!(?-i:[Hh]ttps?:))https?:/i

The string inside the negative lookahead (?!...) is what you want to _not_ hit. The (?-i:...) group makes that part case-sensitive, even though the rest of the expression _is_ case-insensitive.
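
If you want to check the three variants outside of SpamAssassin, here is a rough sketch in Python. It assumes Python 3.6 or later, where the re module accepts the same (?!...) and (?-i:...) constructs as Perl for these patterns. The names ORIGINAL, BROKEN, FIXED and hits(), and the sample URIs, are made up for illustration and are not part of any ruleset.

```python
import re

# The three variants from this thread, copied verbatim. re.I stands in
# for the trailing /i on the SpamAssassin rule.
ORIGINAL = re.compile(r"^(?!(?-i:https?:))https?:", re.I)
BROKEN = re.compile(r"^(?!(?-i:ttps?:))ttps?:", re.I)
FIXED = re.compile(r"^(?!(?-i:[Hh]ttps?:))https?:", re.I)


def hits(pattern, uri):
    """Return True if the rule pattern would fire on this URI."""
    return pattern.search(uri) is not None


# Hypothetical sample URIs covering the interesting cases.
SAMPLES = [
    "http://example.com/",   # all lowercase: should never hit
    "Http://example.com/",   # only the H capitalized: tolerated by FIXED
    "hTTp://example.com/",   # mixed case after the first letter
    "HTTPS://example.com/",  # all uppercase
]

if __name__ == "__main__":
    for uri in SAMPLES:
        print(uri, hits(ORIGINAL, uri), hits(BROKEN, uri), hits(FIXED, uri))
```

The BROKEN column stays False for every URI that begins with "h" or "H", because the anchor forces the first character to be a "t".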

Also, for the TLD rule: after a bit of thought I realized a spammer would be very unlikely to do this to a .gov URI, so I substituted .biz:

  uri  __URI_TLD_MC  /\.(?!(?-i:com|net|org|biz|info))(?:com|net|org|biz|info)\b/i
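
The same kind of quick check works for the TLD rule. Again this is only a sketch under the assumption that Python's re module (3.6 or later) treats the pattern the way Perl does. TLD_MC, tld_hits() and the example hostnames are hypothetical.

```python
import re

# TLD rule copied verbatim; re.I stands in for the trailing /i.
TLD_MC = re.compile(
    r"\.(?!(?-i:com|net|org|biz|info))(?:com|net|org|biz|info)\b", re.I
)


def tld_hits(uri):
    """Return True if the URI contains one of the listed TLDs in non-lowercase form."""
    return TLD_MC.search(uri) is not None


if __name__ == "__main__":
    for uri in (
        "http://example.com/path",   # lowercase TLD: no hit
        "http://example.COM/path",   # uppercase TLD: hit
        "http://www.example.Info/",  # mixed-case TLD: hit
        "http://example.COMIC/",     # \b keeps a longer label from matching
    ):
        print(uri, tld_hits(uri))
```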


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  The fetters imposed on liberty at home have ever been forged out
  of the weapons provided for defense against real, pretended, or
  imaginary dangers from abroad.               -- James Madison, 1799
-----------------------------------------------------------------------
 Tomorrow: Veterans Day
