[EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]:
> According to RFC 2396[1] section 5.2:
>
>    g) If the resulting buffer string still begins with one or more
>       complete path segments of "..", then the reference is
>       considered to be in error.  Implementations may handle this
>       error by retaining these components in the resolved path (i.e.,
>       treating them as part of the final URI), by removing them from
>       the resolved path (i.e., discarding relative levels above the
>       root), or by avoiding traversal of the reference.
>
> If I read this right, it explicitly allows the urlparse.urljoin behavior
> ("handle this error by retaining these components in the resolved path").
>

Yes, the urljoin behaviour is explicitly allowed; however, it is not the
most commonly implemented of the permitted behaviours. Both IE and
Mozilla/Firefox handle this error by stripping the spurious ".." elements
from the front of the path. Apache, and I hope other web servers, work by
the third permitted method, i.e. rejecting requests to these invalid URLs.

The net effect is that on some sites a Python spider (e.g. webchecker.py)
will produce a large number of error messages for links which browsers
actually resolve successfully. (At least, that's when I first noticed this
particular problem.) Depending on your reasons for spidering a site, this
can be either a good thing or an annoyance.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
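
To make the three permitted behaviours concrete, here is a minimal sketch of path resolution with a selectable policy. The function name resolve_path and the policy names "retain", "strip" and "reject" are made up for illustration; this is not how urlparse.urljoin is implemented. It also only handles the path component and does not preserve trailing slashes.

```python
def resolve_path(base_path, ref_path, policy="strip"):
    """Resolve a relative path against an absolute base path.

    Hypothetical helper, for illustration only. `policy` selects one of
    the three behaviours RFC 2396 section 5.2 (step 6g) permits when the
    reference climbs above the root:
      "retain" - keep the excess ".." segments in the result
      "strip"  - silently discard them
      "reject" - treat the reference as an error
    """
    # Directories of the base: drop the leading "" and the last segment.
    stack = base_path.split("/")[1:-1]
    excess = 0  # number of ".." segments that would climb above the root

    for segment in ref_path.split("/"):
        if segment == "..":
            if stack:
                stack.pop()
            else:
                excess += 1
        elif segment not in ("", "."):
            stack.append(segment)

    if excess:
        if policy == "reject":
            raise ValueError("reference climbs above the root: %r" % ref_path)
        if policy == "retain":
            stack = [".."] * excess + stack
        elif policy != "strip":
            raise ValueError("unknown policy: %r" % policy)

    return "/" + "/".join(stack)


if __name__ == "__main__":
    base = "/b/c/d"
    print(resolve_path(base, "../../../g", "retain"))  # /../g
    print(resolve_path(base, "../../../g", "strip"))   # /g
```

A spider that wants to follow the same links a browser would could use the "strip" policy, while one meant to flag broken links might prefer "reject" or "retain".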