Martin Panter added the comment:
It is true that 3.5 is meant to follow RFC 3986, which obsoletes RFC 1808 and
specifies slightly different behaviour for abnormal cases. This change is
documented under urljoin(), and also in “What’s New in 3.5”. Pavel’s first case
is one of these differences between the RFCs, and I don’t think it is a bug.
According to <https://tools.ietf.org/html/rfc3986.html#section-5.2.4>,
“The remove_dot_segments algorithm respects [the base’s] hierarchy by removing
extra dot-segments rather than treating them as an error or leaving them to be
misinterpreted by dereference implementations.”
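The check below illustrates that RFC 3986 behaviour using the base URI from the RFC’s own examples (section 5.4), assuming Python 3.5 or later. Under RFC 1808 the same two references resolved to “http://a/../g” and “http://a/../../g” instead.

```python
from urllib.parse import urljoin

# Base URI used by the RFC 3986 section 5.4 examples
BASE = "http://a/b/c/d;p?q"

# RFC 3986 section 5.4.2 ("Abnormal Examples"): surplus ".." segments are
# dropped instead of being kept in the resolved path
print(urljoin(BASE, "../../../g"))     # http://a/g
print(urljoin(BASE, "../../../../g"))  # http://a/g
```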
For Pavel’s second and third cases, RFC 3986 doesn’t cover them directly
because the base URL is relative. The RFC only covers absolute base URLs, which
start with a scheme like “http:”. The documentation doesn’t really bless these
cases either: ‘Construct a full (“absolute”) URL’. However, there is explicit
support for them in the source code ("" in urllib.parse.uses_relative).
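As a rough sketch of why a relative base is accepted at all: as I read the 3.5 source, urljoin() returns the second argument unchanged unless both URLs share a scheme listed in uses_relative. A relative base such as the hypothetical “a/” below has the empty scheme, and that is in the list.

```python
from urllib.parse import urlsplit, uses_relative

base = "a/"  # hypothetical relative base with no scheme
scheme = urlsplit(base).scheme
print(repr(scheme))             # ''
print(scheme in uses_relative)  # True
```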
It looks like 3.5 is strict in following the RFC’s Remove Dot Segments
algorithm. Step 2C says that for “/../” or “/..”, the parent segment is
removed, but the input is always replaced with “/”:
“a/..” → “/”
“a/../..” → “/..” → “/”
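The following is a minimal Python sketch of the RFC 3986 section 5.2.4 loop, written only to reproduce the strict results above. The function name remove_dot_segments is my own; urllib.parse does not expose such a function. Each output segment keeps its leading “/”, so step 2C can drop a segment and its slash together.

```python
def remove_dot_segments(path: str) -> str:
    """Strict RFC 3986 section 5.2.4 remove_dot_segments (illustrative)."""
    output = []  # segments, each with its leading "/" if it had one
    inp = path
    while inp:
        if inp.startswith("../"):        # step 2A
            inp = inp[3:]
        elif inp.startswith("./"):       # step 2A
            inp = inp[2:]
        elif inp.startswith("/./"):      # step 2B
            inp = "/" + inp[3:]
        elif inp == "/.":                # step 2B
            inp = "/"
        elif inp.startswith("/../"):     # step 2C: always leaves a "/"
            inp = "/" + inp[4:]
            if output:
                output.pop()
        elif inp == "/..":               # step 2C
            inp = "/"
            if output:
                output.pop()
        elif inp in (".", ".."):         # step 2D
            inp = ""
        else:                            # step 2E: move one segment across
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            output.append(inp[:end])
            inp = inp[end:]
    return "".join(output)


print(remove_dot_segments("a/.."))     # /
print(remove_dot_segments("a/../.."))  # /
```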
I would prefer a less strict interpretation that keeps to the spirit of the
algorithm: do not introduce a slash into the input if you did not remove one
from the output buffer:
“a/..” → empty URL
“a/../..” → “..” → empty URL
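Here is one possible reading of that rule, as a hypothetical sketch rather than a patch: step 2C puts a “/” back into the input only if the segment it removed from the output carried one. I have also assumed that an absolute path keeps its root “/” when there is nothing left to remove, so that results for absolute base URLs still match the RFC. The name remove_dot_segments_lenient is made up for illustration.

```python
def remove_dot_segments_lenient(path: str) -> str:
    """Hypothetical variant of RFC 3986 remove_dot_segments (illustrative)."""
    absolute = path.startswith("/")
    output = []  # segments, each with its leading "/" if it had one
    inp = path

    def pop_removed_slash() -> bool:
        # Pop the last output segment and report whether a "/" went with it.
        # With nothing to pop, only an absolute path keeps its root "/".
        if output:
            return output.pop().startswith("/")
        return absolute

    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = "/" + inp[3:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../"):
            # Only re-insert "/" if one was actually removed from the output
            inp = ("/" if pop_removed_slash() else "") + inp[4:]
        elif inp == "/..":
            inp = "/" if pop_removed_slash() else ""
        elif inp in (".", ".."):
            inp = ""
        else:
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            output.append(inp[:end])
            inp = inp[end:]
    return "".join(output)


print(repr(remove_dot_segments_lenient("a/..")))     # ''
print(repr(remove_dot_segments_lenient("a/../..")))  # ''
print(remove_dot_segments_lenient("/b/c/../../../g"))  # /g
```

With this reading, “a/../b” resolves to “b” rather than the strict “/b”.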
Python 3.4 and earlier did not behave sensibly as more “..” segments are added
to the relative URL:
>>> urljoin("a/", "..")
''
>>> urljoin("a/", "../..")
'..'
>>> urljoin("a/", "../../..")
''
>>> urljoin("a/", "../../../..")
'../'
Pavel, what behaviour would you expect in these cases? My empty URL
interpretation, or perhaps a more sensible version of the Python 3.4 behaviour?
What is your use case?
I also noticed a related, and more serious (IMO), regression compared to 3.4,
where the path becomes a host name:
>>> urljoin("file:///base", "/dummy/..//host/oops")
'file://host/oops'
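The sketch below shows my guess at the mechanism, and it depends on that guess being right. Remove Dot Segments turns “/dummy/..//host/oops” into “//host/oops”: the “/../” step removes “/dummy” and leaves an empty first segment. If the empty authority of “file:///base” is then dropped on recomposition, the leading “//” is read back as an authority. The variable names are illustrative only.

```python
from urllib.parse import urlsplit

# Assumed result of remove_dot_segments("/dummy/..//host/oops"):
# "/../" removes "/dummy", leaving an empty first segment
resolved_path = "//host/oops"

# Recomposing without the (empty) authority makes "host" look like one
collapsed = "file:" + resolved_path
print(collapsed, urlsplit(collapsed).netloc)  # file://host/oops host

# Keeping an explicit empty authority ("//" + "") preserves the path
preserved = "file://" + "" + resolved_path
print(repr(urlsplit(preserved).netloc), urlsplit(preserved).path)
```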
----------
components: -Interpreter Core
_______________________________________
Python tracker
<http://bugs.python.org/issue25403>
_______________________________________