Senthil Kumaran writes:

 > Not all urls have the 'authority' component after the scheme. (sip
 > based urls for e.g) urlparse differentiates those by maintaining a
 > list of scheme names which will follow the pattern of parsing, and
 > joining for the urls which have a netloc (or authority component).
 > This is in general according to RFC 3986 itself.

This is actually quite at variance with the RFC. The grammar in section 3 doesn't make any reference to schemes as being significant in parsing. Whether an authority component is to be parsed or not depends entirely on the presence or absence of the "//" delimiter following the scheme and its colon delimiter.

AFAICS, if the "//" delimiter is present, an authority component (possibly empty) *must* be present in the parse. Presumably an unparse should then include that empty component in the generated URI (i.e., a "scheme:///..." URI).

Thus, it seems that by the RFC, regardless of any registration, urlparse.urlunsplit(urlparse.urlsplit('git+file:///foo/bar')) should produce 'git+file:///foo/bar' (or perhaps raise an error in "validation" mode). The only question is whether registration of 'git+file' as a uses_netloc scheme should force urlparse.urlunsplit(urlparse.urlsplit('git+file:/foo/bar')) to return 'git+file:///foo/bar', or whether 'git+file:/foo/bar' would be acceptable (or better).

None of what I wrote here or elsewhere takes account of backward compatibility, it is true. I'm only talking about the letter of the RFC.
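
To make the point concrete, here is a rough sketch of a scheme-agnostic split and unsplit built on the regular expression given in Appendix B of RFC 3986. The names rfc_split and rfc_unsplit are made up for illustration and are not part of urlparse; the sketch does no validation and ignores backward compatibility entirely. The one convention it relies on is that an absent component is None while a present-but-empty one is '', which is what lets the empty authority survive a round trip.

```python
import re

# Regular expression from RFC 3986, Appendix B. It never consults the
# scheme name; only the "//" delimiter decides whether an authority exists.
_URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)


def rfc_split(uri):
    """Hypothetical helper: split into (scheme, authority, path, query, fragment).

    An absent component is None; a present but empty one is ''.
    """
    m = _URI_RE.match(uri)  # every group is optional, so this always matches
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def rfc_unsplit(parts):
    """Hypothetical helper: recompose along the lines of RFC 3986, section 5.3."""
    scheme, authority, path, query, fragment = parts
    out = ''
    if scheme is not None:
        out += scheme + ':'
    if authority is not None:
        # Emit "//" even when the authority is empty.
        out += '//' + authority
    out += path
    if query is not None:
        out += '?' + query
    if fragment is not None:
        out += '#' + fragment
    return out


print(rfc_split('git+file:///foo/bar'))   # authority is '' (present, empty)
print(rfc_split('git+file:/foo/bar'))     # authority is None (absent)
print(rfc_unsplit(rfc_split('git+file:///foo/bar')))
```

With this convention both 'git+file:///foo/bar' and 'git+file:/foo/bar' round-trip unchanged, with no scheme registry involved. Whether a registered scheme should instead normalize the second form to the first is the open question above, and this sketch deliberately does not answer it.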