"[Python-ideas] Sanitize filename (path part) 2nd try" https://mail.python.org/archives/list/python-ideas@python.org/thread/LRIKMG3G4I4YQNK6BTU7MICHT7X67MEF/
"[Python-ideas] Sanitize filename (path part)" https://mail.python.org/archives/list/python-ideas@python.org/thread/SQH4LPERFLKBLXPDUOVJMV24JBCBUCYO/

```quote
What does sanitizepart do with a leading slash?

assert os.path.join("a", "/b") == "/b"

A new safejoin() or joinsafe() or join(safe='True') could call sanitizepart() such that:

assert joinsafe("a\n", "/b") == "a\n/b"
```

On Sun, Jun 27, 2021, 17:15 Barry Scott <ba...@barrys-emacs.org> wrote:

>
> On 27 Jun 2021, at 12:07, Zbigniew Jędrzejewski-Szmek <zbys...@in.waw.pl> wrote:
> >
> > [this is a continuation of https://bugs.python.org/issue44452]
> >
> > pathlib.Path() has a concatenation operator "/" that allows the
> > right-hand-side argument to be an absolute path, which causes the
> > left-hand-side argument to be ignored:
> >
> > >>> pathlib.Path('/foo') / '/bar'
> > PosixPath('/bar')
> > >>> pathlib.Path('/var/tmp/instroot') / '/some/path' / '/suffix'
> > PosixPath('/suffix')
> >
> > This follows the precedent set by os.path.join(), and probably makes
> > sense in the scenario of simulating a user typing 'cd' commands in a
> > shell.
> >
> > But it doesn't work nicely in the case of combining paths from
> > two different "namespaces", where we never want to go "up".
> >
> > For example: a web server takes a URL, strips the host, and wants
> > to look up a file:
> > https://example.com/some/path → "/some/path" → /src/www/root + /some/path → /src/www/root/some/path
> >
> > or we are constructing a container image and need to refer to a file
> > in the container:
> > <container foo> + /etc/shadow → /var/lib/machines/foo + /etc/shadow → /var/lib/machines/foo/etc/shadow
> >
> > To do this kind of operation correctly with pathlib.Path, the user
> > needs to do two operations: verify that the rhs argument contains
> > no '..' [*], and strip leading slashes:
> >
> > >>> lhs = pathlib.Path('/some/namespace/')
> > >>> rhs = '/some/path/to/add'
> > >>> if '..' in pathlib.Path(rhs).parts: raise ValueError
> > >>> path = lhs / rhs.lstrip('/')
> >
> > Those last two lines are rather verbose and non-obvious. Also the
> > .lstrip() operation attaches on the right side, but operates on the
> > left side, earlier than the "/", which is overall not very nice.
> >
> > Proposal:
> >
> > add "//"-operator to pathlib.PosixPath() that means "concatenate a rhs
> > path that is underneath the lhs". It would disallow paths with '..',
> > and concatenate paths as relative to the specified lhs:
> >
> > >>> lhs = pathlib.Path('/some/namespace/')
> > >>> lhs // "a/b/c"
> > PosixPath('/some/namespace/a/b/c')
> > >>> lhs // "/a/b/c"
> > PosixPath('/some/namespace/a/b/c')
> > >>> lhs // "a/../b/c"
> > ValueError: cannot use // with a path with '..' on the right
> >
> > This would be useful for operations on containers, combining paths from
> > namespaces like fs paths and URL components, looking up files
> > underneath an unpacked archive, etc.
> >
> > [*] Why completely disallow '..' ? Components with '..' cannot be
> > correctly resolved without access to the filesystem, because a
> > component may be a symlink, and then "a/b/../." may not be "a/.", but
> > something completely different. Thus, since the goal is to have a path
> > underneath lhs, I think it's best to forbid '..'. In principle '..' at
> > the beginning can be resolved reliably, by simply ignoring it,
> > '/../../../whatever' is the same as '/whatever/'. But it's a tiny
> > corner case, and I think it's better to disallow that too.
>
> There are two ideas here.
>
> 1. Allow Path() to join a pair of absolute paths.
> 2. Prevent '..' from escaping into the first absolute path.
>
> For (1) you can do this today:
>
> >>> root=Path('/var/www')
> >>> y=Path('/a/b')
> >>> root / y.relative_to('/')
> PosixPath('/var/www/a/b')
>
> I can think of a number of rules that might apply for (2).
> (a) raise an error if there is a '..' or '.' in any path component.
> (b) resolve() '..' and '.' as pathlib already does
>
> - I'm not sure that access to the filesystem is always needed to
> validate the use of '..'.
>
> >>> y=Path('/a/b/../v.html')
> >>> y.relative_to('/')
> PosixPath('a/b/../v.html')
> >>> root / y.relative_to('/')
> PosixPath('/var/www/a/b/../v.html')
> >>> root / y.resolve().relative_to('/')
> PosixPath('/var/www/a/v.html')
>
> and show that no escape to root happens:
>
> >>> y=Path('/../a//v.html')
> >>> root / y.resolve().relative_to('/')
> PosixPath('/var/www/a/v.html')
>
> Barry
>
> >
> > Zbyszek
> > _______________________________________________
> > Python-ideas mailing list -- python-ideas@python.org
> > To unsubscribe send an email to python-ideas-le...@python.org
> > https://mail.python.org/mailman3/lists/python-ideas.python.org/
> > Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/IXYPKVINLD57BOV6VHU4U4ZJCQCQPAHT/
> > Code of Conduct: http://python.org/psf/codeofconduct/
>
> Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/6HR4IAUAUIQXK5SJAWKVFVOFZ374C4W3/
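
For concreteness, here is a rough sketch of how the proposed "//" semantics and the joinsafe() idea above might look as a plain function. The name joinsafe and its signature are hypothetical; nothing like it exists in pathlib or os.path today. The sketch assumes POSIX-style paths, treats the rhs purely lexically (no filesystem access), and returns a PurePosixPath rather than a str:

```python
from pathlib import PurePosixPath


def joinsafe(lhs, rhs):
    """Hypothetical helper: join rhs underneath lhs, never above it."""
    rhs_path = PurePosixPath(rhs)
    # Reject '..' outright, as the proposal does: it cannot be resolved
    # reliably without consulting the filesystem (symlinks).
    if ".." in rhs_path.parts:
        raise ValueError("cannot join a path containing '..'")
    # Drop the root component of an absolute rhs so that it is treated
    # as relative to lhs instead of replacing it.
    if rhs_path.is_absolute():
        relative_parts = rhs_path.parts[1:]
    else:
        relative_parts = rhs_path.parts
    return PurePosixPath(lhs).joinpath(*relative_parts)


print(joinsafe("/some/namespace/", "a/b/c"))
print(joinsafe("/some/namespace/", "/a/b/c"))
```

Both calls yield /some/namespace/a/b/c, while joinsafe("/some/namespace/", "a/../b/c") raises ValueError. Forbidding '..' instead of calling resolve() keeps the check independent of what is on disk, which is the trade-off discussed in the [*] footnote above.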
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/3YR26TH2TTNMVBIVO26SUQVAZENFOYM5/