Ilmari Karonen <[EMAIL PROTECTED]> writes:

> On 20 Aug 2001, Gisle Aas wrote:
> >
> > As it seems unlikely that any _working_ code could be using this
> > function it might be ok to simply change the default.  But this has
>
> Well, they could be encoding ISINDEX queries with it, I suppose.  Not
> escaping reserved chars is ok *if* the receiver isn't treating any of
> them as metachars.  But switching to behavior 2 below would not break
> such usage.
>
> > been this way for such a long time (more than 6 years) so I still
> > hesitate a bit.
>
> There is the point that there would then be both correct and broken
> versions floating around.  But I think that would still be better than
> having only broken ones.
>
> > I see 2 ways of changing the default:
> >
> >   1) Remove % from the current set.  (URI.pm already considers,
> >      % to be part of URIC, although this is a bit internal).
> >   2) Go with your suggestion: [^A-Za-z0-9\-_.!~*'()]
> >
> > It looks like 1) is more likely to not break code, but perhaps 2) is a
> > more useful default.
>
> Actually, I'd expect 2 to be safer, too.

I decided that I want to go with 2.  The attached patch is what I
propose.  If anybody thinks this is a bad idea please speak up.

Regards,
Gisle

Index: URI/Escape.pm
===================================================================
RCS file: /cvsroot/libwww-perl/uri/URI/Escape.pm,v
retrieving revision 3.18
retrieving revision 3.19
diff -u -p -u -r3.18 -r3.19
--- URI/Escape.pm 2001/05/15 03:41:38 3.18
+++ URI/Escape.pm 2001/08/24 17:25:43 3.19
@@ -1,5 +1,5 @@
 #
-# $Id: Escape.pm,v 3.18 2001/05/15 03:41:38 gisle Exp $
+# $Id: Escape.pm,v 3.19 2001/08/24 17:25:43 gisle Exp $
 #

 package URI::Escape;
@@ -59,8 +59,11 @@ character class (between [ ]).  E.g.:
   "^A-Za-z"     # everything not a letter

 The default set of characters to be escaped is all those which are
-I<not> part of the C<uric> character class shown above.
+I<not> part of the C<uric> character class shown above as well as the
+reserved characters.  I.e. the default is:

+    "^A-Za-z0-9\-_.!~*'()"
+
 =item uri_unescape($string,...)

 Returns a string with all %XX sequences replaced with the actual byte
@@ -111,7 +114,7 @@ require Exporter;
 @ISA = qw(Exporter);
 @EXPORT = qw(uri_escape uri_unescape);
 @EXPORT_OK = qw(%escapes);
-$VERSION = sprintf("%d.%02d", q$Revision: 3.18 $ =~ /(\d+)\.(\d+)/);
+$VERSION = sprintf("%d.%02d", q$Revision: 3.19 $ =~ /(\d+)\.(\d+)/);

 use Carp ();

@@ -136,8 +139,8 @@ sub uri_escape
         }
         &{$subst{$patn}}($text);
     } else {
-        # Default unsafe characters. (RFC 2732 ^uric)
-        $text =~ s/([^;\/?:@&=+\$,A-Za-z0-9\-_.!~*'()[\]])/$escapes{$1}/g;
+        # Default unsafe characters.  RFC 2732 ^(uric - reserved)
+        $text =~ s/([^A-Za-z0-9\-_.!~*'()])/$escapes{$1}/g;
     }
     $text;
 }
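
For anyone who wants to see the effect of the change without applying the patch, the following is a minimal standalone sketch that puts the two default substitutions from the diff side by side. It assumes a locally built %escapes table that mirrors the one in URI::Escape, and the helper names escape_old_default and escape_new_default are made up for illustration; neither exists in the module.

```perl
use strict;
use warnings;

# Assumption: a local stand-in for URI::Escape's %escapes table,
# mapping every byte value to its %XX form.
my %escapes;
for my $i (0 .. 255) {
    $escapes{chr($i)} = sprintf("%%%02X", $i);
}

# Hypothetical helper: the 3.18 default from the patch, which leaves
# all uric characters (including the reserved ones) unescaped.
sub escape_old_default {
    my ($text) = @_;
    $text =~ s/([^;\/?:@&=+\$,A-Za-z0-9\-_.!~*'()[\]])/$escapes{$1}/g;
    return $text;
}

# Hypothetical helper: the 3.19 default from the patch, which also
# escapes the reserved characters.
sub escape_new_default {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.!~*'()])/$escapes{$1}/g;
    return $text;
}

# A value that is meant to be embedded as a single query parameter.
my $value = "a&b=c";

print "old: ", escape_old_default($value), "\n";   # old: a&b=c
print "new: ", escape_new_default($value), "\n";   # new: a%26b%3Dc
```

With the old default the "&" and "=" pass through untouched, so the value would be read as two parameters once it is pasted into a query string. With the new default both are escaped and the value survives intact. Callers who relied on the old behavior could presumably still get it by passing the old character class explicitly as the second argument to uri_escape.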