Phillip J. Eby wrote:
> At 01:56 AM 11/4/2006 +0100, Andrew Dalke wrote:
>
>>os.path.join assumes the base is a directory
>>name when used in a join: "inserting '/' as needed" while RFC
>>1808 says
>>
>>    The last segment of the base URL's path (anything
>>    following the rightmost slash "/", or the entire path if no
>>    slash is present) is removed
>>
>>Is my intuition wrong in thinking those should be the same?
>
> Yes. :)
>
> Path combining and URL absolutization(?) are inherently different
> operations with only superficial similarities. One reason for this is that
> a trailing / on a URL has an actual meaning, whereas in filesystem paths a
> trailing / is an aberration and likely an actual error.
>
> The path combining operation says, "treat the following as a subpath of the
> base path, unless it is absolute". The URL normalization operation says,
> "treat the following as a subpath of the location the base URL is
> *contained in*".
>
> Because of this, os.path.join assumes a path with a trailing separator is
> equivalent to a path without one, since that is the only reasonable way to
> interpret treating the joined path as a subpath of the base path.
>
> But for a URL join, the path /foo and the path /foo/ are not only
> *different paths* referring to distinct objects, but the operation wants to
> refer to the *container* of the referenced object. /foo might refer to a
> directory, while /foo/ refers to some default content (e.g.
> index.html). This is actually why Apache normally redirects you from /foo
> to /foo/ before it serves up the index.html; relative URLs based on a base
> URL of /foo won't work right.
>
> The URL approach is designed to make peer-to-peer linking in a given
> directory convenient. Instead of referring to './foo.html' (as one would
> have to do with filenames), you can simply refer to 'foo.html'. But the
> cost of saving those characters in every link is that joining always takes
> place on the parent, never the tail-end. Thus directory URLs normally end
> in a trailing /, and most tools tend to automatically redirect when
> somebody leaves it off. (Because otherwise the links would be wrong.)
>

Having said this, Andrew *did* demonstrate quite convincingly that the
current urljoin has some fairly egregious directory traversal glitches. Is
it really right to punt obvious gotchas like

 >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")
 'http://blah.com/../../'
 >>>

to the server?

regards
 Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Skype: holdenweb       http://holdenweb.blogspot.com
Recent Ramblings     http://del.icio.us/steve.holden
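
For reference, the distinction drawn in the quoted text, and the traversal case above, can be sketched roughly as follows. This is only an illustration under some assumptions: it uses Python 3's urllib.parse.urljoin (the later name for urlparse.urljoin) and posixpath.join so the path results do not depend on the platform; example.com is a placeholder host; and climbs_above_root is a hypothetical helper written for this sketch, not part of any library.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# Path combining: the base is treated as a directory whether or not it
# ends in a separator, so both calls give the same result.
path_no_slash = posixpath.join("/foo", "bar")      # '/foo/bar'
path_with_slash = posixpath.join("/foo/", "bar")   # '/foo/bar'

# URL resolution: the last segment of the base path is dropped first,
# so the trailing slash changes the outcome.
url_no_slash = urljoin("http://example.com/foo", "bar")     # 'http://example.com/bar'
url_with_slash = urljoin("http://example.com/foo/", "bar")  # 'http://example.com/foo/bar'


def climbs_above_root(base, ref):
    """Hypothetical helper: report whether a relative reference contains
    more '..' segments than the base URL has directories to pop."""
    if urlsplit(ref).scheme or ref.startswith("/"):
        return False  # absolute references never climb relative to the base
    # Directories in the base path, ignoring its last segment.
    depth = len([s for s in urlsplit(base).path.split("/")[:-1] if s])
    for seg in urlsplit(ref).path.split("/"):
        if seg == "..":
            depth -= 1
            if depth < 0:
                return True
        elif seg and seg != ".":
            depth += 1
    return False
```

A caller could use such a check to reject or log a reference like "../../../../" against "http://blah.com/a/b/c" before joining, rather than leaving the decision to the server. Whether urljoin itself should do this is exactly the open question.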