[Python-ideas] Re: joining paths without worrying about a leading slash

Wes Turner Sun, 27 Jun 2021 18:59:03 -0700

"[Python-ideas] Sanitize filename (path part) 2nd try"
https://mail.python.org/archives/list/python-ideas@python.org/thread/LRIKMG3G4I4YQNK6BTU7MICHT7X67MEF/


"[Python-ideas] Sanitize filename (path part)"
https://mail.python.org/archives/list/python-ideas@python.org/thread/SQH4LPERFLKBLXPDUOVJMV24JBCBUCYO/

```quote
What does sanitizepart do with a leading slash?

assert os.path.join("a", "/b") == "/b"

A new safejoin() or joinsafe() or join(safe=True) could call
sanitizepart() such that:

assert joinsafe("a\n", "/b") == "a\n/b"
```
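
A minimal sketch of what such a join might look like, only to make the two assertions above concrete. The name joinsafe() is hypothetical, nothing like it exists in os.path today, and this version only strips leading slashes; it does not attempt whatever sanitizepart() would do to the individual parts.

```python
import posixpath


def joinsafe(first, *parts):
    """Hypothetical helper: join so that a leading slash never discards earlier parts."""
    # posixpath.join() restarts at any part that begins with "/",
    # so strip leading slashes from every part after the first.
    stripped = [part.lstrip("/") for part in parts]
    return posixpath.join(first, *stripped)


# Current behaviour: the absolute right-hand part wins.
assert posixpath.join("a", "/b") == "/b"

# Sketch behaviour: the left-hand part is kept.
assert joinsafe("a\n", "/b") == "a\n/b"
```

This uses posixpath rather than os.path so the result is the same on every platform.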

On Sun, Jun 27, 2021, 17:15 Barry Scott <ba...@barrys-emacs.org> wrote:

>
> > On 27 Jun 2021, at 12:07, Zbigniew Jędrzejewski-Szmek <zbys...@in.waw.pl> wrote:
> >
> > [this is a continuation of https://bugs.python.org/issue44452]
> >
> > pathlib.Path() has a concatenation operator "/" that allows the
> > right-hand-side argument to be an absolute path, which causes the
> > left-hand-side argument to be ignored:
> >
> > >>> pathlib.Path('/foo') / '/bar'
> > PosixPath('/bar')
> > >>> pathlib.Path('/var/tmp/instroot') / '/some/path' / '/suffix'
> > PosixPath('/suffix')
> >
> > This follows the precedent set by os.path.join(), and probably makes
> > sense in the scenario of simulating a user typing 'cd' commands in a
> > shell.
> >
> > But it doesn't work nicely in the case of combining paths from
> > two different "namespaces", where we never want to go "up".
> >
> > For example: a web server takes a URL, strips the host, and wants
> > to look up a file:
> > https://example.com/some/path → "/some/path" → /src/www/root + /some/path → /src/www/root/some/path
> >
> > or we are constructing a container image and need to refer to a file
> > in the container:
> > <container foo> + /etc/shadow → /var/lib/machines/foo + /etc/shadow → /var/lib/machines/foo/etc/shadow
> >
> > To do this kind of operation correctly with pathlib.Path, the user
> > needs to do two operations: verify that the rhs argument contains
> > no '..' [*], and strip leading slashes:
> >
> > >>> lhs = pathlib.Path('/some/namespace/')
> > >>> rhs = '/some/path/to/add'
> > >>> if '..' in pathlib.Path(rhs).parts: raise ValueError
> > >>> path = lhs / rhs.lstrip('/')
> >
> > Those last two lines are rather verbose and non-obvious. Also, the
> > .lstrip() operation attaches on the right side but operates on the
> > left side, earlier than the "/", which is overall not very nice.
> >
> > Proposal:
> >
> > add a "//" operator to pathlib.PosixPath() that means "concatenate a rhs
> > path that is underneath the lhs". It would disallow paths with '..', and
> > concatenate paths as relative to the specified lhs:
> >
> > >>> lhs = pathlib.Path('/some/namespace/')
> > >>> lhs // "a/b/c"
> > PosixPath('/some/namespace/a/b/c')
> > >>> lhs // "/a/b/c"
> > PosixPath('/some/namespace/a/b/c')
> > >>> lhs // "a/../b/c"
> > ValueError: cannot use // with a path with '..' on the right
> >
> > This would be useful for operations on containers, combining paths from
> > namespaces like fs paths and URL components, looking up files
> > underneath an unpacked archive, etc.
> >
> > [*] Why completely disallow '..' ? Components with '..' cannot be
> > correctly resolved without access to the filesystem, because a
> > component may be a symlink, and then "a/b/../." may not be "a/.", but
> > something completely different. Thus, since the goal is to have a path
> > underneath lhs, I think it's best to forbid '..'. In principle '..' at
> > the beginning can be resolved reliably, by simply ignoring it,
> > '/../../../whatever' is the same as '/whatever/'. But it's a tiny
> > corner case, and I think it's better to disallow that too.
>
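
For reference, the proposed semantics can be approximated today without touching pathlib. This is only a sketch under assumptions: join_under() and NamespacePath are made-up names, pathlib itself defines no "//" operator, and PurePosixPath is used so that nothing touches the filesystem.

```python
from pathlib import PurePosixPath


def join_under(lhs, rhs):
    """Hypothetical helper: join rhs beneath lhs, treating an absolute rhs as relative."""
    rhs_path = PurePosixPath(rhs)
    # Reject any '..' component outright, as the proposal suggests.
    if ".." in rhs_path.parts:
        raise ValueError("cannot use // with a path with '..' on the right")
    parts = rhs_path.parts
    if rhs_path.is_absolute():
        parts = parts[1:]  # drop the root component so lhs is not discarded
    return lhs.joinpath(*parts)


class NamespacePath(PurePosixPath):
    """Hypothetical subclass that spells join_under() as the // operator."""

    def __floordiv__(self, rhs):
        return join_under(self, rhs)


lhs = NamespacePath("/some/namespace/")
print(lhs // "a/b/c")   # /some/namespace/a/b/c
print(lhs // "/a/b/c")  # /some/namespace/a/b/c
```

With this sketch, lhs // "a/../b/c" raises ValueError, matching the third example in the proposal.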
> There are two ideas here.
>
> 1. Allow Path() to join a pair of absolute paths.
>
> 2. Prevent '..' from escaping into the first absolute path.
>
> For (1) you can do this today:
>
> >>> from pathlib import Path
> >>> root = Path('/var/www')
> >>> y = Path('/a/b')
> >>> root / y.relative_to('/')
> PosixPath('/var/www/a/b')
> >>>
>
> I can think of a number of rules that might apply for (2).
> (a) raise an error if there is a '..' or '.' in any path component.
> (b) resolve() '..' and '.' as pathlib already does
>
> - I'm not sure that access to the filesystem is always needed to
> validate the use of '..'.
>
> >>> y=Path('/a/b/../v.html')
> >>> y.relative_to('/')
> PosixPath('a/b/../v.html')
> >>> root / y.relative_to('/')
> PosixPath('/var/www/a/b/../v.html')
> >>> root / y.resolve().relative_to('/')
> PosixPath('/var/www/a/v.html')
>
> and show that no escape to root happens:
>
> >>> y=Path('/../a//v.html')
> >>> root / y.resolve().relative_to('/')
> PosixPath('/var/www/a/v.html')
> >>>
>
> Barry
>
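
One way to combine the resolve() idea with an explicit containment check is sketched below. resolve_under() is a hypothetical name, not an existing pathlib method. Note that this variant does consult the filesystem: Path.resolve() follows symlinks, so it resolves the joined path rather than the untrusted part on its own, and then verifies that the result is still inside root.

```python
from pathlib import Path


def resolve_under(root, untrusted):
    """Hypothetical helper: resolve untrusted beneath root, or raise ValueError."""
    root = Path(root).resolve()
    # Strip leading slashes so the join cannot discard root.
    candidate = (root / str(untrusted).lstrip("/")).resolve()
    # relative_to() raises ValueError when candidate is not inside root,
    # which covers both '..' components and symlinks that point elsewhere.
    candidate.relative_to(root)
    return candidate
```

For example, resolve_under('/var/www', '/a/b/../v.html') would give the equivalent of /var/www/a/v.html when no symlinks are involved, while '/../../etc/passwd' raises ValueError instead of silently being clamped to the root.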
> >
> > Zbyszek
> > _______________________________________________
> > Python-ideas mailing list -- python-ideas@python.org
> > To unsubscribe send an email to python-ideas-le...@python.org
> > https://mail.python.org/mailman3/lists/python-ideas.python.org/
> > Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/IXYPKVINLD57BOV6VHU4U4ZJCQCQPAHT/
> > Code of Conduct: http://python.org/psf/codeofconduct/
>
>

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3YR26TH2TTNMVBIVO26SUQVAZENFOYM5/
Code of Conduct: http://python.org/psf/codeofconduct/
