Phil Endecott wrote:

> There is not much to go on in terms of specifications. The closest is
> RFC1738, which includes BNF for a file: URI. However it is ten years
> old, so whether it reflects current practice I do not know. But it does
> not allow ; in file: URIs.
>
> I conclude from this that wget should be replacing ; with its %3b escape
> sequence.

I think you're confusing what wget is required to do with URLs entered
on the command line and what it chooses to do with the resulting files
that it saves. If the unencoded name of a retrieved resource cannot be
stored on the local file system, wget encodes it to create a valid name.

> Tony Lewis wrote:
> > > I use semicolons in CGI URIs to separate parameters. (Ampersand
> > > is more often used for this, but semicolon is also allowed and
> > > has the advantage that there is no need to escape it in HTML.)
> >
> > There is no need to escape ampersands either.
>
> Tony, are you suggesting that this is legal HTML?
>
> <a href="http://foo.foo/foo.cgi?p1=v1&p2=v2">Foo</a>
>
> I'm fairly confident that you need to escape the & to make it valid, i.e.
>
> <a href="http://foo.foo/foo.cgi?p1=v1&amp;p2=v2">Foo</a>

Just out of curiosity, did you try to implement your theory and see what
happens? If you did, you would see that the first version works and the
second does not.

By the way, the correct URI encoding of ampersand is "%26", not "&amp;".
The latter encoding is used for ampersands in HTML markup.

With regard to whether ampersand needs to be encoded, you're misreading
the RFC:

   Many URL schemes reserve certain characters for a special meaning:
   their appearance in the scheme-specific part of the URL has a
   designated semantics. If the character corresponding to an octet is
   reserved in a scheme, the octet must be encoded. The characters ";",
   "/", "?", ":", "@", "=" and "&" are the characters which may be
   reserved for special meaning within a scheme. No other characters may
   be reserved within a scheme.

   Usually a URL has the same interpretation when an octet is
   represented by a character and when it encoded. However, this is not
   true for reserved characters: encoding a character reserved for a
   particular scheme may change the semantics of a URL.

The RFC says that you have to escape a reserved character if that
character appears in the name of the resource you're trying to retrieve.
That is, if you're trying to retrieve a file named "a&b.txt", you refer
to that file as "a%26b.txt" in the URL because you're using the ampersand
for a non-reserved purpose. If you're using a reserved character for the
purpose for which it has been reserved (in this case, separating
parameters), you do NOT want to encode it.

The URL you proposed (after correcting the encoding of the ampersand) is
requesting a resource (probably a file) whose name is
"foo.cgi?p1=v1&p2=v2". It is NOT requesting that the script foo.cgi be
executed with argument p1 having a value of v1 and p2 having a value of
v2.

Hope that helps.

Tony
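
P.S. The difference between an ampersand used as a separator and an
ampersand used as data can be sketched in a few lines of Python. This
only illustrates how Python's standard urllib.parse and html modules
treat the characters; it is not how wget or any particular server
handles them, and the variable names are made up for the example.

```python
import html
from urllib.parse import parse_qs, quote, urlsplit

# An ampersand that is part of a resource name is data, so it is
# percent-encoded before it goes into the URL.
encoded_name = quote("a&b.txt", safe="")
print(encoded_name)  # a%26b.txt

# An ampersand used for its reserved purpose (separating parameters)
# is left alone, and the query splits into two parameters.
separated = parse_qs(urlsplit("http://foo.foo/foo.cgi?p1=v1&p2=v2").query)
print(separated)  # {'p1': ['v1'], 'p2': ['v2']}

# Percent-encoding that same ampersand changes the meaning: it is no
# longer a separator, so only one parameter remains.
literal = parse_qs(urlsplit("http://foo.foo/foo.cgi?p1=v1%26p2=v2").query)
print(literal)  # {'p1': ['v1&p2=v2']}

# "&amp;" is an HTML-level escape, not a URI encoding. Decoding the
# HTML entity gives back the plain ampersand.
from_markup = html.unescape("http://foo.foo/foo.cgi?p1=v1&amp;p2=v2")
print(from_markup)  # http://foo.foo/foo.cgi?p1=v1&p2=v2
```

The third case is the one the RFC warns about: encoding a character
that is reserved in the scheme may change the semantics of the URL.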