On Sun, Jun 27, 2021 at 10:13:25PM +0100, Barry Scott wrote:
> 
> > On 27 Jun 2021, at 12:07, Zbigniew Jędrzejewski-Szmek <zbys...@in.waw.pl> 
> > wrote:
> > 
> > [this is a continuation of https://bugs.python.org/issue44452]
> > 
> > pathlib.Path() has a concatenation operator "/" that allows the
> > right-hand-side argument to be an absolute path, which causes the
> > left-hand-side argument to be ignored:
> > 
> >>>> pathlib.Path('/foo') / '/bar'
> > PosixPath('/bar')
> >>>> pathlib.Path('/var/tmp/instroot') / '/some/path' / '/suffix'
> > PosixPath('/suffix')
> > 
> > This follows the precedent set by os.path.join(), and probably makes
> > sense in the scenario of simulating a user typing 'cd' commands in a
> > shell.
> > 
> > But it doesn't work nicely in the case of combining paths from
> > two different "namespaces", where we never want to go "up".
> > 
> > For example: a web server takes an URL, strips the host, and wants
> > to look up a file:
> > https://example.com/some/path → "/some/path" → /src/www/root + /some/path → 
> > /src/www/root/some/path
> > 
> > or we are constructing a container image and need to refer to a file
> > in the container:
> > <container foo> + /etc/shadow → /var/lib/machines/foo + /etc/shadow → 
> > /var/lib/machines/foo/etc/shadow
> > 
> > To do this kind of operation correctly with pathlib.Path, the user
> > needs to do two operations: verify that the rhs argument contains
> > no '..' [*], and strip leading slashes:
> > 
> >>>> lhs = pathlib.Path('/some/namespace/')
> >>>> rhs = '/some/path/to/add'
> >>>> if '..' in pathlib.Path(rhs).parts: raise ValueError
> >>>> path = lhs / rhs.lstrip('/')
> > 
> > Those last two lines are rather verbose, non-obvious. Also the .lstrip()
> > operation attaches on the right side, but operates on the left side, earlier
> > than the "/", which is overall not very nice.
> > 
> > Proposal: 
> > 
> > add "//"-operator to pathlib.PosixPath() that means "concatenate a rhs path
> > that is underneath the lhs". It would disallow paths with '..', and 
> > concatenate
> > paths as relative to the specified lhs:
> > 
> >>>> lhs = pathlib.Path('/some/namespace/')
> >>>> lhs // "a/b/c"
> > PosixPath('/some/namespace/a/b/c')
> >>>> lhs // "/a/b/c"
> > PosixPath('/some/namespace/a/b/c')
> >>>> lhs // "a/../b/c"
> > ValueError: cannot use // with a path with '..' on the right
> > 
> > This would be useful for operations on containers, combining paths from
> > namespaces like fs paths and URL components, looking up files
> > underneath an unpacked archive, etc.
> > 
> > [*] Why completely disallow '..' ? Components with '..' cannot be
> > correctly resolved without access to the filesystem, because a
> > component may be a symlink, and then "a/b/../." may not be "a/.", but
> > something completely different. Thus, since the goal is to have a path
> > underneath lhs, I think it's best to forbid '..'. In principle '..' at
> > the beginning can be resolved reliably, by simply ignoring it,
> > '/../../../whatever' is the same as '/whatever/'. But it's a tiny
> > corner case, and I think it's better to disallow that too.
> 
> There are two ideas here.
> 
> 1. Allow Path() to join a pair of absolute paths.
> 
> 2. Prevent '..' from escaping into the first absolute path.
> 
> For (1) you can do this today:
> 
> >>> root=Path('/var/www')
> >>> root / y.relative_to('/')
> PosixPath('/var/www/a/b')

I don't like it because:

1. 'y' must be a Path object too. The most natural use of such
    concatenation is to have some "root" object which is converted
    to Path() once, and then passed around to functions. So this
    would have to look more like this:

      root / pathlib.Path(y).relative_to('/')

2. It's another non-obvious and verbose workaround

3. It doesn't work in all cases:

   >>> pathlib.Path('///a/b/c').relative_to('/')
   PosixPath('a/b/c')
   >>> pathlib.Path('//a/b/c').relative_to('/')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib64/python3.9/pathlib.py", line 929, in relative_to
       raise ValueError("{!r} is not in the subpath of {!r}"
   ValueError: '//a/b/c' is not in the subpath of '/' OR one path is relative 
and the other is absolute.
   >>> pathlib.Path('/a/b/c').relative_to('/')
   PosixPath('a/b/c')

   This is POSIX, btw, see 
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13.

4. It also doesn't handle '..' in any way:

   >>> pathlib.Path('/../foo').relative_to('/')
   PosixPath('../foo')

> I can think if a number of rules that might apply for (2).
> (a) raise an error is there is a '..' or '.' in any path component.

Let's talk about '.' first. '.' is not a problem at all. It simply
*doesn't matter* for any correctness or security rules that are normally
used. One might want to remove '.' from the string before printing
as a prettification operation, but that's all.

> (b) resolve() '..' and ','  as pathlib already does
> 
> - I'm not sure that use of the filesystem is needed to validate the use of .. 
> is always needed.
[...]
> no escape to root happens:

The main reason is not "not escaping to the root", but resolving the
path correctly, in the sense of resolving it the same as kernel would,
if we were to access the file:

$ ls -l /bin
/bin -> usr/bin/
$ ls -ld /bin/../riscv64-linux-gnu
drwxr-xr-x 4 root root 4096 Jan 26 05:11 /bin/../riscv64-linux-gnu/
$ ls -ld /riscv64-linux-gnu
ls: cannot access '/riscv64-linux-gnu': No such file or directory

As you can see, you cannot just collapse '/bin/../' to '/', because
'/bin/../' is '/usr' and '/bin/../riscv64-linux-gnu' is
'/usr/riscv64-linux-gnu'.

Similarly, '/var/lib/machines/rawhide/bin/../riscv64-linux-gnu/'
is *not* '/var/lib/machines/rawhide/riscv64-linux-gnu/'.

But actually, it also allows "escaping":
>>> pathlib.Path('/root') / pathlib.Path('/../foo').relative_to('/')
PosixPath('/root/../foo')
('/root/foo' is the what we would need instead.)

(The operation of resolving a path with '..'-components relative to
some root comes up quite often when handling containers. It's a royal
pain, because you need to look up one component, then if it is a
symlink, resursively look for any '..'-components in the symlink
target, and at each step normalize paths relative to the temporary
root. Example implementation:
https://github.com/systemd/systemd/blob/main/src/basic/fs-util.c#L771 .)

Zbyszek
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/K4URGRGOFAHUPLJMTDFZYZ5P7WMWMRGP/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to