Phil Endecott wrote:

> There is not much to go on in terms of specifications.  The closest is
> RFC1738, which includes BNF for a file: URI.  However it is ten years
> old, so whether it reflects current practice I do not know.  But it does
> not allow ; in file: URIs.
>
> I conclude from this that wget should be replacing ; with its %3b escape
> sequence.

I think you're confusing what wget is required to do with URLs entered on
the command line and what it chooses to do with the resulting files that it
saves. If the unencoded name of a retrieved resource cannot be stored on the
local file system, wget encodes it to create a valid name.

> Tony Lewis wrote:
> > > I use semicolons in CGI URIs to separate parameters.  (Ampersand
> > > is more often used for this, but semicolon is also allowed and
> > > has the advantage that there is no need to escape it in HTML.)
> >
> > There is no need to escape ampersands either.
>
> Tony, are you suggesting that this is legal HTML?
>
>   <a href="http://foo.foo/foo.cgi?p1=v1&p2=v2">Foo</a>
>
> I'm fairly confident that you need to escape the & to make it valid, i.e.
>
>   <a href="http://foo.foo/foo.cgi?p1=v1&amp;p2=v2">Foo</a>

Just out of curiosity, did you try to implement your theory and see what
happens? If you did, you would see that the first version works and the
second does not.

By the way, the correct URI encoding of ampersand is "%26", not "&amp;". The
latter encoding is used for ampersands in HTML markup.

With regard to whether ampersand needs to be encoded, you're misreading the
RFC:

   Many URL schemes reserve certain characters for a special meaning:
   their appearance in the scheme-specific part of the URL has a
   designated semantics. If the character corresponding to an octet is
   reserved in a scheme, the octet must be encoded.  The characters ";",
   "/", "?", ":", "@", "=" and "&" are the characters which may be
   reserved for special meaning within a scheme. No other characters may
   be reserved within a scheme.

   Usually a URL has the same interpretation when an octet is
   represented by a character and when it encoded. However, this is not
   true for reserved characters: encoding a character reserved for a
   particular scheme may change the semantics of a URL.

The RFC says that you have to escape a reserved character if it appears in
the name of the resource you're trying to retrieve. That is, if you're
trying to retrieve a file named "a&b.txt", you refer to that file as
"a%26b.txt" in the URL because you're using the ampersand for a non-reserved
purpose.
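
For what it's worth, here is a minimal sketch of the two encodings side by
side, using Python's standard library purely as an illustration (nothing here
is wget's own code, and the variable names are mine). It percent-encodes the
file name for use in a URL and, separately, escapes it the way HTML markup
would:

```python
from html import escape
from urllib.parse import quote, unquote

filename = "a&b.txt"  # the file name from the example above

# URI layer: percent-encode the ampersand because it is part of the name,
# not a parameter separator. safe="" means no reserved character is spared.
uri_name = quote(filename, safe="")

# HTML layer: a different escaping scheme, used for text inside markup.
html_name = escape(filename)

print(uri_name)           # a%26b.txt
print(html_name)          # a&amp;b.txt
print(unquote(uri_name))  # a&b.txt
```

The two results are not interchangeable: "%26" belongs to the URL, while
"&amp;" belongs to the HTML document.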

If you're using a reserved character for the purpose that it has been
reserved (in this case, separating parameters), you do NOT want to encode
it. The URL you proposed (after correcting the encoding of the ampersand) is
requesting a resource (probably a file) whose name is "foo.cgi?p1=v1&p2=v2".
It is NOT requesting that the script foo.cgi be executed with argument p1
having a value of v1 and p2 having a value of v2.
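
To make the change in meaning concrete, here is a rough sketch that uses
Python's urllib.parse as a stand-in for whatever parsing the server does.
That is an assumption on my part; a real server or CGI environment may split
things differently, but in either case the encoded ampersand no longer
separates parameters:

```python
from urllib.parse import parse_qs, urlsplit

# Ampersand used for its reserved purpose: separating parameters.
plain = urlsplit("http://foo.foo/foo.cgi?p1=v1&p2=v2")

# Ampersand percent-encoded: it becomes ordinary data.
encoded = urlsplit("http://foo.foo/foo.cgi?p1=v1%26p2=v2")

plain_args = parse_qs(plain.query)
encoded_args = parse_qs(encoded.query)

print(plain_args)    # {'p1': ['v1'], 'p2': ['v2']}
print(encoded_args)  # {'p1': ['v1&p2=v2']}
```

With this particular parser the second URL yields a single parameter p1
whose value contains a literal ampersand, rather than two parameters.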

Hope that helps.

Tony
