John Arbash Meinel writes:
 > Stephen J. Turnbull wrote:
 > > David Abrahams writes:
 > >  >
 > >  > This is a bug report.  bugs.python.org seems to be down.
 > >  >
 > >  >   >>> from urlparse import *
 > >  >   >>> urlunsplit(urlsplit('git+file:///foo/bar/baz'))
 > >  >   git+file:/foo/bar/baz
 > >  >
 > >  > Note the dropped slashes after the colon.
 > >
 > > That's clearly wrong, but what does "+" have to do with it?  AFAIK,
 > > the only thing special about + in scheme names is that it's not
 > > allowed as the first character.
 >
 > Don't you need to register the "git+file:///" url for urlparse to
 > properly split it?
 >
 > if protocol not in urlparse.uses_netloc:
 >     urlparse.uses_netloc.append(protocol)

I don't know about the urlparse implementation, but from the point of
view of the RFC I think not.  Either BCP 35 or RFC 3986 (or maybe
both) makes it plain that if the scheme name is followed by "://", the
scheme is a hierarchical one.  So that URL should parse with an empty
authority, and be recomposed the same.

I would do this by parsing 'git+file:///foo/bar/baz' to
('git+file', '', '/foo/bar/baz') or something like that, and
'git+file:/foo/bar/baz' to ('git+file', None, '/foo/bar/baz').

I don't see any reason why implementations should abbreviate the empty
authority by removing the double slashes, unless specified in the
scheme definition.  My reading of RFC 3986, though, is that a missing
authority (no "//") *should* be dereferenced in the same way as an
empty one:

    If the URI scheme defines a default for host, then that default
    applies when the host subcomponent is undefined or when the
    registered name is empty (zero length).  (Sec. 3.2.2)

Even so, I don't see why urlparse should try to enforce that by
converting from one to the other.
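
To make the empty-vs-missing distinction concrete, here is a minimal
sketch of the kind of split/unsplit pair I have in mind.  It is not
what urlparse does; split_uri and unsplit_uri are hypothetical names,
and the only borrowed piece is the generic regular expression from
RFC 3986, Appendix B.  Query and fragment are ignored for brevity.

```python
import re

# Generic URI regular expression from RFC 3986, Appendix B.
_URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)


def split_uri(uri):
    """Hypothetical helper: return (scheme, authority, path).

    authority is '' when "//" is present but nothing follows it,
    and None when there is no "//" at all.
    """
    m = _URI_RE.match(uri)  # every group is optional, so this always matches
    return m.group(2), m.group(4), m.group(5)


def unsplit_uri(parts):
    """Hypothetical inverse of split_uri: recompose without normalizing."""
    scheme, authority, path = parts
    out = ''
    if scheme is not None:
        out += scheme + ':'
    if authority is not None:
        # An empty authority still gets its "//" back.
        out += '//' + authority
    return out + path


for uri in ('git+file:///foo/bar/baz', 'git+file:/foo/bar/baz'):
    parts = split_uri(uri)
    print(parts, unsplit_uri(parts) == uri)
```

Because None and '' are kept apart, both spellings survive a round
trip unchanged, and nothing needs to be registered per scheme.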