Re: Changing URI::Escape default behavior

Gisle Aas Fri, 24 Aug 2001 10:05:30 -0700
Ilmari Karonen <[EMAIL PROTECTED]> writes:

> On 20 Aug 2001, Gisle Aas wrote:
> > 
> > As it seems unlikely that any _working_ code could be using this
> > function it might be ok to simply change the default.  But this has
> 
> Well, they could be encoding ISINDEX queries with it, I suppose.  Not
> escaping reserved chars is ok *if* the receiver isn't treating any of
> them as metachars.  But switching to behavior 2 below would not break
> such usage.
> 
> > been this way for such a long time (more than 6 years) so I still
> > hesitate a bit.
> 
> There is the point that there would then be both correct and broken
> versions floating around.  But I think that would still be better than
> having only broken ones.
> 
> > I see 2 ways of changing the default:
> > 
> >    1) Remove % from the current set.  (URI.pm already considers
> >       % to be part of URIC, although this is a bit internal.)
> >    2) Go with your suggestion: [^A-Za-z0-9\-_.!~*'()]
> > 
> > It looks like 1) is more likely to not break code, but perhaps 2) is a
> > more useful default.
> 
> Actually, I'd expect 2 to be safer, too.

I decided that I want to go with 2.  The attached patch is what I
propose.  If anybody thinks this is a bad idea please speak up.

Regards,
Gisle


Index: URI/Escape.pm
===================================================================
RCS file: /cvsroot/libwww-perl/uri/URI/Escape.pm,v
retrieving revision 3.18
retrieving revision 3.19
diff -u -p -u -r3.18 -r3.19
--- URI/Escape.pm       2001/05/15 03:41:38     3.18
+++ URI/Escape.pm       2001/08/24 17:25:43     3.19
@@ -1,5 +1,5 @@
 #
-# $Id: Escape.pm,v 3.18 2001/05/15 03:41:38 gisle Exp $
+# $Id: Escape.pm,v 3.19 2001/08/24 17:25:43 gisle Exp $
 #
 
 package URI::Escape;
@@ -59,8 +59,11 @@ character class (between [ ]).  E.g.:
   "^A-Za-z"                     # everything not a letter
 
 The default set of characters to be escaped is all those which are
-I<not> part of the C<uric> character class shown above.
+I<not> part of the C<uric> character class shown above as well as the
+reserved characters.  I.e. the default is:
 
+  "^A-Za-z0-9\-_.!~*'()"
+
 =item uri_unescape($string,...)
 
 Returns a string with all %XX sequences replaced with the actual byte
@@ -111,7 +114,7 @@ require Exporter;
 @ISA = qw(Exporter);
 @EXPORT = qw(uri_escape uri_unescape);
 @EXPORT_OK = qw(%escapes);
-$VERSION = sprintf("%d.%02d", q$Revision: 3.18 $ =~ /(\d+)\.(\d+)/);
+$VERSION = sprintf("%d.%02d", q$Revision: 3.19 $ =~ /(\d+)\.(\d+)/);
 
 use Carp ();
 
@@ -136,8 +139,8 @@ sub uri_escape
        }
        &{$subst{$patn}}($text);
     } else {
-       # Default unsafe characters. (RFC 2732 ^uric)
-       $text =~ s/([^;\/?:@&=+\$,A-Za-z0-9\-_.!~*'()[\]])/$escapes{$1}/g;
+       # Default unsafe characters.  RFC 2732 ^(uric - reserved)
+       $text =~ s/([^A-Za-z0-9\-_.!~*'()])/$escapes{$1}/g;
     }
     $text;
 }
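
To see what the change means in practice, here is a minimal sketch that applies the old and the new default substitutions from the patch to the same string.  It does not load URI::Escape; the %escapes table is rebuilt locally, and the names escape_old and escape_new are made up for this illustration.  The "@" in the old pattern is backslashed here only to keep it unambiguous.

```perl
use strict;
use warnings;

# Local stand-in for URI::Escape's %escapes table: byte => "%XX".
my %escapes = map { chr($_) => sprintf("%%%02X", $_) } 0 .. 255;

# Default in revision 3.18: escape everything outside uric,
# so reserved characters such as & = / ? are left alone.
sub escape_old {
    my ($text) = @_;
    $text =~ s/([^;\/?:\@&=+\$,A-Za-z0-9\-_.!~*'()[\]])/$escapes{$1}/g;
    return $text;
}

# Default in revision 3.19: escape everything except the
# unreserved characters, so reserved characters are escaped too.
sub escape_new {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.!~*'()])/$escapes{$1}/g;
    return $text;
}

my $sample = "a b&c=d/e?f%g";
print escape_old($sample), "\n";   # a%20b&c=d/e?f%25g
print escape_new($sample), "\n";   # a%20b%26c%3Dd%2Fe%3Ff%25g
```

With the old default a value containing "&" or "=" passes through unescaped, which is what breaks query strings built from it.  Code that depends on the old behavior should be able to keep it by passing an explicit character set as the second argument to uri_escape, as described in the module's documentation.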