"Sean M. Burke" <[EMAIL PROTECTED]> writes: > 5) URL-encoding. There's no URLencoding here, but I've seen > plenty of it before. It's pretty much resolvable with URI.pm: > > use URI; > sub obf { local $_ = $_[0]; s/([^\/])/sprintf '%%%2x', ord $1/eg; $_ } > my $obf = 'http://' . obf('www.perl.com/pub/a/2001/08/27/bjornstad.html'); > > print "obf: $obf\n"; > my $x = URI->new($obf); > print "normalized: ", $x, "\n"; > print "canonical: ", $x->canonical, "\n"; > > Output (wrapped for readability): > > obf: http://%77%77%77%2e%70%65%72%6c%2e%63%6f%6d/%70%75%62/%61/%32 > %30%30%31/%30%38/%32%37/%62%6a%6f%72%6e%73%74%61%64%2e%68%74%6d%6c > normalized: http://%77%77%77%2e%70%65%72%6c%2e%63%6f%6d/%70%75%62/%61/%32 > %30%30%31/%30%38/%32%37/%62%6a%6f%72%6e%73%74%61%64%2e%68%74%6d%6c > canonical: http://www.perl.com/pub/a/2%30%301/%308/27/bjornstad.html > > That's with URI.pm 1.11. Hm, odd that "%32%30%30%31" canonizes as > "2%30%301", not "2001". Gisle? Just another case where I'm missing the mythical ?? operator :-) The difference between the two middle digits and the others is that they are false. This patch fixes the problem: Index: URI/http.pm =================================================================== RCS file: /cvsroot/libwww-perl/uri/URI/http.pm,v retrieving revision 1.3 diff -u -p -u -r1.3 http.pm --- URI/http.pm 1998/09/11 09:54:04 1.3 +++ URI/http.pm 2001/09/01 02:16:35 @@ -25,7 +25,8 @@ sub canonical $unreserved_escape{sprintf "%%%02X", ord($_)} = $_; } } - $$other =~ s/(%[0-9A-F]{2})/$unreserved_escape{$1} || $1/ge; + $$other =~ s/(%[0-9A-F]{2})/exists $unreserved_escape{$1} ? + $unreserved_escape{$1} : $1/ge; $other->path("/") if $slash_path; } $other; Regards, Gisle