"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> Is "urllib" wrong?
>
> I can't see how. HTTP 1.1 says that the parameter to the GET
> request should be an abs_path; RFC 2396 says that
> /../acatalog/shop.html is indeed an abs_path, as .. is a valid
> segment. That RFC also has a section on relative identifiers
> and normalization; it defines what .. means *in a relative path*.
>
> Section 4 is explicit about .. in absolute URIs:
> # The syntax for relative URI is a shortened form of that for absolute
> # URI, where some prefix of the URI is missing and certain path
> # components ("." and "..") have a special meaning when, and only when,
> # interpreting a relative path.
>
> Notice the "and only when": the browsers who modify above
> URL before sending it seem to be in clear violation of
> RFC 2396.
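To make Martin's point concrete: a client that reads section 4 literally takes the absolute URI apart and puts its path on the GET line unchanged. The sketch below assumes Python 3's `urllib.parse.urlsplit`, which does no dot-segment processing; `request_target`, `request_line` and the host `www.example.com` are made-up names for illustration, not part of urllib's API or the site from the original thread.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query) a client would put on the GET line.

    The path is copied verbatim from the absolute URI: ".." is just
    another segment here, since it only has a special meaning when
    interpreting a relative path (RFC 2396, section 4).
    """
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path still needs "/" on the wire
    if parts.query:
        target += "?" + parts.query
    return target


def request_line(url: str) -> str:
    # Hypothetical helper: format an HTTP/1.1 request line for the URL.
    return f"GET {request_target(url)} HTTP/1.1"


print(request_line("http://www.example.com/../acatalog/shop.html"))
# prints: GET /../acatalog/shop.html HTTP/1.1
```

A browser that rewrites this to "GET /acatalog/shop.html" before sending is doing the normalization section 4 reserves for relative references.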
Section 5.2 is also relevant here. In particular:

>    g) If the resulting buffer string still begins with one or more
>       complete path segments of "..", then the reference is
>       considered to be in error. Implementations may handle this
>       error by retaining these components in the resolved path (i.e.,
>       treating them as part of the final URI), by removing them from
>       the resolved path (i.e., discarding relative levels above the
>       root), or by avoiding traversal of the reference.

The common practice seems to be for client-side implementations to handle this using option 2 (removing them) and for servers to use option 3 (avoiding traversal of the reference). urllib uses option 1, which is also correct but not as useful as it might be.

-- 
http://mail.python.org/mailman/listinfo/python-list
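For anyone wanting to compare the three options in 5.2(g) side by side, here is a rough sketch of the step 6 path merge with the error handling made pluggable. It works on a list of segments rather than the RFC's string buffer, assumes the base path starts with "/", and ignores params, query and fragment; `merge_paths` and the policy names "retain", "remove" and "reject" are made up for this example and are not anything urllib provides.

```python
def merge_paths(base_path: str, ref_path: str, policy: str = "retain") -> str:
    """Resolve a relative-path reference against an absolute base path,
    loosely following RFC 2396 section 5.2, step 6.

    policy (hypothetical names) selects how leftover leading ".."
    segments are handled, per step 6(g):
      "retain" - keep them in the resolved path (option 1)
      "remove" - discard them, clamping at the root (option 2)
      "reject" - refuse to resolve the reference (option 3)
    """
    if policy not in ("retain", "remove", "reject"):
        raise ValueError(f"unknown policy: {policy!r}")
    if ref_path.startswith("/"):
        # Absolute-path reference: no dot-segment processing at all.
        return ref_path

    # Steps (a) and (b): base path up to and including its last "/",
    # followed by the reference path.
    buffer = base_path[: base_path.rfind("/") + 1] + ref_path
    segments = buffer.split("/")[1:]  # assumes base_path starts with "/"

    out: list[str] = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == ".":
            # Steps (c) and (d): drop "." but keep the trailing slash it implies.
            if last:
                out.append("")
        elif seg == ".." and out and out[-1] != "..":
            # Steps (e) and (f): "<segment>/.." cancels the preceding segment.
            out.pop()
            if last:
                out.append("")
        else:
            # Ordinary segment, or a ".." with nothing left to cancel.
            out.append(seg)

    # Step (g): count the ".." segments still left at the front.
    leading = 0
    while leading < len(out) and out[leading] == "..":
        leading += 1
    if leading and policy == "remove":
        out = out[leading:]
    elif leading and policy == "reject":
        raise ValueError(f"reference climbs above the root: {ref_path!r}")
    return "/" + "/".join(out)


print(merge_paths("/b/c/d", "../../../g"))            # /../g
print(merge_paths("/b/c/d", "../../../g", "remove"))  # /g
```

With "retain" the sketch gives "/../g" for "../../../g" against base path "/b/c/d", which is what urllib's option 1 behaviour amounts to; "remove" gives the browser-style "/g"; and "reject" is roughly what a server does when it refuses the request.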