http://bugzilla.spamassassin.org/show_bug.cgi?id=3318
------- Additional Comments From [EMAIL PROTECTED] 2004-04-27 16:46 -------
Subject: Re: New: multiply-encoded URIs missed

On Tue, Apr 27, 2004 at 04:08:50PM -0700, [EMAIL PROTECTED] wrote:
>
> http://images.google.ca/imgres?imgurl=gmib.free.fr/viagra.jpg&imgrefurl=http://www.google.com/url?q=http://www.google.com/url?q=%68%74%74%70%3A%2F%2F%77%77%77%2E%65%78%70%61%67%65%2E%63%6F%6D%2F%6D%61%6E%67%65%72%33%32
>
> we currently don't catch it, because of the second layer of encoding.

1) <grrr> I was going to say that the redirect doesn't work, but of course it works fine in IE. What a POS.

2) The reason we don't catch it is that we follow the spec... If ':' or '/' are encoded (%3A and %2F), it's supposed to stay encoded. So the code doesn't catch the encoded version.

The URI works down to:

    http://images.google.ca/imgres?imgurl=gmib.free.fr/viagra.jpg&imgrefurl=http://www.google.com/url?q=http://www.google.com/url?q=http%3A%2F%2Fwww.expage.com%2Fmanger32

which also works in IE, BTW, then we grab the refurl:

    http://www.google.com/url?q=http%3A%2F%2Fwww.expage.com%2Fmanger32

then we stop due to the encoding, which (as above) is supposed to stay encoded.

We can handle it to some degree by putting in a kluge:

    # If we see something that looks like a redirector, deal with it.
    if ($nuri =~ m#^(https?.+?https?)(\%3[aA]|:)((?:\%2[fF]|/){0,2})(.*)$#){
      my($start, $col, $slash, $end) = ($1,$2,$3,$4);
      if ($col=~/\%/ || $slash=~/\%/) {
        push(@uris, "$start://$end");
      }
    }

Which makes the redirect stripper figure out that there's a redirection going on:

    debug: uri found: http://www.google.com/url?q=http%3A%2F%2Fwww.expage.com%2Fmanger32
    debug: uri found: http://www.google.com/url?q=http://www.expage.com%2Fmanger32
    debug: uri found: http://www.expage.com%2Fmanger32

However, that then makes other things screw up since the path %2F is encoded, and there could be a port encoding too (ie: 'www.kluge.net%3A8080'). All of which the spec says needs to stay encoded.

I suppose we could build the code to deal with the encoding up to the first %2F, convert it to a /, then leave everything else. Have to churn on that for a little bit.
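
To make the "decode up to the first %2F" idea concrete, here is a rough sketch. It is written in Python for illustration, not in the Perl of the kluge, and it is not SpamAssassin code: `partially_decode` and the three regexes are hypothetical names. It assumes the redirect target (the part after `q=`) has already been isolated, and that only the scheme separator, an encoded port colon, and the first path slash should be decoded.

```python
import re

# Hypothetical helper names; a sketch of the idea only, not SpamAssassin code.

# Scheme, then ':' or '%3A', then two slashes that may each be '/' or '%2F'.
_SCHEME = re.compile(r'^(https?)(?:%3a|:)(?:%2f|/){2}(.*)$', re.IGNORECASE)
# The first path separator after the host, encoded or not.
_FIRST_SLASH = re.compile(r'%2f|/', re.IGNORECASE)
# An encoded colon inside the host part, e.g. 'www.kluge.net%3A8080'.
_COLON = re.compile(r'%3a', re.IGNORECASE)


def partially_decode(target):
    """Decode the scheme separator, the port colon, and the first path slash.

    Everything after the first path slash is left exactly as it was, so
    later %2F or %3A sequences stay encoded. Returns None if the target
    does not look like an http(s) URI.
    """
    m = _SCHEME.match(target)
    if m is None:
        return None
    scheme, rest = m.group(1), m.group(2)

    # Split host[:port] from the path at the first '/' or '%2F'.
    parts = _FIRST_SLASH.split(rest, maxsplit=1)
    authority = _COLON.sub(':', parts[0])

    if len(parts) == 1:
        # No path at all, just a host (and maybe a port).
        return f"{scheme}://{authority}"
    return f"{scheme}://{authority}/{parts[1]}"


if __name__ == "__main__":
    print(partially_decode("http%3A%2F%2Fwww.expage.com%2Fmanger32"))
    print(partially_decode("http%3A%2F%2Fwww.kluge.net%3A8080%2Fa%2Fb"))
```

With this approach the host and port come out usable for lookups, while anything deeper in the path keeps whatever encoding it arrived with. Whether that is still close enough to the spec is the open question.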
