Theo Van Dinter wrote:

On Thu, Jul 22, 2004 at 12:09:14AM +0200, Jesse Houwing wrote:


This is the rule in question:

uri SARE_URI_EQUALS
m{^(?:(?:h|%[46]8)(?:t|%[57]4){2}(?:p|%[57]0)(?:s|%[57]3)?(?::|%3a)?(?:%5c|\\|%2f|/){0,2})[^/\?;]+=(?!(?:..)?$).*}i



Hrm. I have no idea what this is actually trying to
match. The first (?: bit isn't necessary, btw. Looks like a
URL with a = somewhere in the host section? i.e. something like
'http://penistone=2eopoloveok=2ecom/3/' in a quoted-printable part?
(this is the only set of matches I could find with your RE)


No, it looks for any URI with a = in the hostname (and excludes a trailing quoted-printable = or =XX), so:

http://www.iamahost=butthisismyrealname.com/ would match
http://www.butthisismyreal= would not,
neither would http://www.butthisismyreal=20
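To see which side of the negative lookahead each of these falls on, here is a quick check. It's a sketch: Python's `re` module stands in for Perl (the pattern only uses constructs both support, and the trailing `/i` becomes `re.IGNORECASE`), and it tests bare strings rather than going through SpamAssassin's own URI extraction. The two penistone strings are the 2.6-style and 3.0-style forms of the same URI.

```python
import re

# The SARE_URI_EQUALS pattern, split into commented pieces.
# Python's re is used here as a stand-in for Perl's regex engine.
SARE_URI_EQUALS = re.compile(
    r"^(?:(?:h|%[46]8)(?:t|%[57]4){2}(?:p|%[57]0)(?:s|%[57]3)?(?::|%3a)?"
    r"(?:%5c|\\|%2f|/){0,2})"   # scheme, allowing %-escapes and up to 2 slashes
    r"[^/\?;]+="                # host part up to an '=' (no '/', '?' or ';')
    r"(?!(?:..)?$)"             # reject if that '=' is last or followed by exactly 2 chars
    r".*",
    re.IGNORECASE,
)

def hits(uri: str) -> bool:
    """Return True if the rule's pattern matches the given URI string."""
    return SARE_URI_EQUALS.search(uri) is not None

examples = [
    "http://www.iamahost=butthisismyrealname.com/",
    "http://www.butthisismyreal=",
    "http://www.butthisismyreal=20",
    "http://penistone=2eopoloveok=2ecom/3/",  # QP left undecoded (2.6)
    "http://penistone.opoloveok.com/3/",      # QP decoded (3.0)
]

for uri in examples:
    print(f"{hits(uri)!s:5}  {uri}")
```

This prints True, False, False, True, False: the first URI and the undecoded penistone form match, while the trailing `=` and `=20` cases are excluded by the lookahead.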

This is an Internet Explorer parsing bug I'm trying to detect here, and it is abused quite often in spam. Any chars before the = sign are discarded and the hostname after the = is used instead, but to the user the host before the = is shown (nifty).
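As a toy model of the behaviour described in the previous paragraph (built only from that description, not from IE's actual parser), the host shown to the user and the host actually used can be pulled apart like this. `split_equals_host` is a hypothetical helper, and splitting on the first `=` is an assumption.

```python
from typing import Tuple
from urllib.parse import urlsplit

def split_equals_host(url: str) -> Tuple[str, str]:
    """Toy model (hypothetical helper): return (host shown, host used) for a
    URL, following the '=' behaviour as described, not IE's real code."""
    netloc = urlsplit(url).netloc
    shown, sep, used = netloc.partition("=")  # assumption: split on the first '='
    if not sep:
        # No '=' in the host part: the shown host and the used host agree.
        return netloc, netloc
    return shown, used

print(split_equals_host("http://www.iamahost=butthisismyrealname.com/"))
# ('www.iamahost', 'butthisismyrealname.com')
```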

If not, please post an example and I'll be happy to help debug.
(I don't think this is a 3.0 bug though.  See below.)

If so, however: yeah, that'll be different.  In 2.6:

http://penistone=2eopoloveok=2ecom/3/

vs 3.0:

http://penistone.opoloveok.com/3/

which is caused by 2.6 doing a very half-assed attempt at decoding the
quoted-printable part, so you get the QP bits in the URI.  3.0 does the
decoding properly (thanks total MIME parser rewrite!), so you end up
with the URI you're supposed to get, properly decoded.

Specifically, in PerMsgStatus::get_decoded_body_text_array(), which 2.6x
uses to get the uri list from, the un-quoted-printable code is:

   s/\=([0-9A-F]{2})/chr(hex($1))/ge;

which clearly has one flaw: it's looking for case-sensitive A-F!  D'oh!
Therefore, it doesn't match the URI above (uses lowercase).  3.0 does
the right thing here. :)
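The effect of the case-sensitive character class is easy to reproduce. Below is a Python transliteration of the 2.6 substitution next to a case-insensitive variant; it's an illustrative sketch, not SpamAssassin's code, and both function names are made up.

```python
import re

def unqp_26(text: str) -> str:
    # Transliteration of the 2.6 line s/\=([0-9A-F]{2})/chr(hex($1))/ge:
    # only UPPERCASE hex digits after '=' get decoded.
    return re.sub(r"=([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), text)

def unqp_nocase(text: str) -> str:
    # Same substitution, but hex digits are matched case-insensitively,
    # so '=2e' is decoded as well as '=2E'.
    return re.sub(r"=([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), text,
                  flags=re.IGNORECASE)

uri = "http://penistone=2eopoloveok=2ecom/3/"
print(unqp_26(uri))      # http://penistone=2eopoloveok=2ecom/3/
print(unqp_nocase(uri))  # http://penistone.opoloveok.com/3/
```

With the lowercase `=2e` sequences left alone, the still-encoded URI is what reaches the uri rules in 2.6, which is exactly the form SARE_URI_EQUALS matches.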

But it seems to do it too harshly. I'll try to find an example from my corpus that should be tagged but isn't in this case.

Jesse
