[this is a continuation of https://bugs.python.org/issue44452]

pathlib.Path() has a concatenation operator "/" that allows the
right-hand-side argument to be an absolute path, which causes the
left-hand-side argument to be ignored:

>>> pathlib.Path('/foo') / '/bar'
PosixPath('/bar')
>>> pathlib.Path('/var/tmp/instroot') / '/some/path' / '/suffix'
PosixPath('/suffix')

This follows the precedent set by os.path.join(), and probably makes
sense in the scenario of simulating a user typing 'cd' commands in a
shell.
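
As a quick check of that precedent, here is a minimal sketch. It uses
posixpath and PurePosixPath (my choice, not something from the bug
report) so that the result does not depend on the host platform:

```python
import pathlib
import posixpath

# os.path.join() on POSIX: an absolute right-hand side discards the left.
joined = posixpath.join('/foo', '/bar')

# The "/" operator on a pure POSIX path behaves the same way.
divided = pathlib.PurePosixPath('/foo') / '/bar'

print(joined)   # /bar
print(divided)  # /bar
```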

But it doesn't work nicely in the case of combining paths from
two different "namespaces", where we never want to go "up".

For example: a web server takes a URL, strips the host, and wants
to look up a file:
https://example.com/some/path → "/some/path" → /src/www/root + /some/path → 
/src/www/root/some/path

or we are constructing a container image and need to refer to a file
in the container:
<container foo> + /etc/shadow → /var/lib/machines/foo + /etc/shadow → 
/var/lib/machines/foo/etc/shadow

To do this kind of operation correctly with pathlib.Path, the user
needs to do two operations: verify that the rhs argument contains
no '..' [*], and strip leading slashes:

>>> lhs = pathlib.Path('/some/namespace/')
>>> rhs = '/some/path/to/add'
>>> if '..' in pathlib.Path(rhs).parts: raise ValueError
>>> path = lhs / rhs.lstrip('/')

Those last two lines are rather verbose and non-obvious. Also, the
.lstrip() call is attached to the right of rhs but strips characters from
its left end, and it has to happen before the "/" is applied, which is
overall not very nice.
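
Today the two steps would probably end up wrapped in a small helper.
The sketch below is only an illustration: join_under is a hypothetical
name, it assumes rhs is a str, and it uses PurePosixPath so that no
filesystem access is involved:

```python
import pathlib

def join_under(lhs, rhs):
    """Hypothetical helper: append rhs beneath lhs, refusing '..'."""
    # Step 1: reject any '..' component on the right-hand side.
    if '..' in pathlib.PurePosixPath(rhs).parts:
        raise ValueError("rhs must not contain '..'")
    # Step 2: strip leading slashes so that "/" does not discard lhs.
    return pathlib.PurePosixPath(lhs) / rhs.lstrip('/')

print(join_under('/some/namespace/', '/some/path/to/add'))
# /some/namespace/some/path/to/add
```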

Proposal: 

Add a "//" operator to pathlib.PosixPath that means "concatenate a rhs path
that is underneath the lhs". It would reject any rhs containing '..', and
treat the rhs as relative to the specified lhs even when it starts with "/":

>>> lhs = pathlib.Path('/some/namespace/')
>>> lhs // "a/b/c"
PosixPath('/some/namespace/a/b/c')
>>> lhs // "/a/b/c"
PosixPath('/some/namespace/a/b/c')
>>> lhs // "a/../b/c"
ValueError: cannot use // with a path with '..' on the right
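
To make the intended semantics concrete, here is a rough sketch as a
subclass. UnderPath is a hypothetical name, and this is not a patch
against pathlib itself; it only mimics the behaviour shown above using
the existing "/" operator:

```python
import os
import pathlib

class UnderPath(pathlib.PurePosixPath):
    """Hypothetical subclass sketching the proposed '//' semantics."""

    def __floordiv__(self, rhs):
        rhs = os.fspath(rhs)  # accept str or os.PathLike objects
        if '..' in pathlib.PurePosixPath(rhs).parts:
            raise ValueError(
                "cannot use // with a path with '..' on the right")
        # Leading slashes are dropped, so rhs always lands beneath self.
        return self / rhs.lstrip('/')

lhs = UnderPath('/some/namespace/')
print(lhs // 'a/b/c')   # /some/namespace/a/b/c
print(lhs // '/a/b/c')  # /some/namespace/a/b/c
```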

This would be useful for operations on containers, combining paths from
namespaces like fs paths and URL components, looking up files
underneath an unpacked archive, etc.

[*] Why completely disallow '..'? Components with '..' cannot be
correctly resolved without access to the filesystem, because a
component may be a symlink, and then "a/b/../." may not be "a/.", but
something completely different. Thus, since the goal is to have a path
underneath lhs, I think it's best to forbid '..'. In principle, '..' at
the beginning can be resolved reliably by simply ignoring it:
'/../../../whatever' is the same as '/whatever'. But it's a tiny
corner case, and I think it's better to disallow that too.
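
The symlink problem can be reproduced with a small experiment. This is
a sketch that assumes a POSIX-like system where os.symlink() is
permitted; the directory names and the function name are made up for
the demonstration:

```python
import os
import tempfile

def lexical_vs_real():
    """Resolve 'a/b/..' textually and via the filesystem."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        os.makedirs(os.path.join(root, 'elsewhere', 'target'))
        os.mkdir(os.path.join(root, 'a'))
        # a/b is a symlink that points at elsewhere/target.
        os.symlink(os.path.join(root, 'elsewhere', 'target'),
                   os.path.join(root, 'a', 'b'))
        path = os.path.join(root, 'a', 'b', '..')
        lexical = os.path.normpath(path)  # drops 'b/..' textually
        actual = os.path.realpath(path)   # follows the symlink first
        return (os.path.relpath(lexical, root),
                os.path.relpath(actual, root))

print(lexical_vs_real())  # ('a', 'elsewhere')
```

The textual answer stays inside 'a', while the filesystem ends up in a
different directory.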

Zbyszek