(Is it almost always better to just use a hash of the user-supplied filename instead of the filename string itself, perhaps laid out as a p/a/ir/tree234-style directory tree to avoid whichever filesystem's limit on the number of files in a directory?)

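As a rough sketch of the hashing idea above (not an established recipe): the hypothetical helper hashed_path below maps any untrusted name to a relative path made only of hex digits, using the leading digits of the digest as shard directories. The function name, the choice of SHA-256, and the two-level, two-character sharding are all assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(user_filename: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map an untrusted filename to a relative path built only from hex digits.

    Hypothetical helper: the name, hash algorithm, and shard layout are
    illustrative assumptions, not an existing API.
    """
    # "surrogatepass" lets names containing lone surrogates be hashed
    # instead of raising UnicodeEncodeError.
    raw = user_filename.encode("utf-8", "surrogatepass")
    digest = hashlib.sha256(raw).hexdigest()
    # The leading hex digits become shard directories, which keeps any one
    # directory from accumulating too many entries.
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*shards, digest)


# The user-supplied string never reaches the filesystem, so a traversal
# attempt just hashes to an ordinary three-component relative path.
print(hashed_path("../../etc/passwd"))
```

The original name would have to be kept somewhere else (a database row or sidecar metadata, say) if it ever needs to be shown back to the user, and identical names from different users map to the same path unless something like a user ID is mixed into the hash.
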
On Mon, May 11, 2020 at 4:48 PM Wes Turner <wes.tur...@gmail.com> wrote:

> FWIW, here are some of the CWE codes for related
> vulnerabilities/weaknesses in implementations:
>
> CWE-73: External Control of File Name or Path
> https://cwe.mitre.org/data/definitions/73.html
>
> CWE-707: Improper Neutralization
> https://cwe.mitre.org/data/definitions/707.html
>
> CWE-22: Improper Limitation of a Pathname to a Restricted Directory
> ('Path Traversal')
> https://cwe.mitre.org/data/definitions/22.html
>
> Because this behavior of os.path.join is documented, it's not a vuln in
> Python, it's a vuln in every downstream component that (1) uses
> os.path.join with user supplied input; and that (2) doesn't strip a
> leading '/' from path parts before joining them with os.path.join.
>
> https://docs.python.org/3/library/os.path.html#os.path.join
>
> [...] If a component is an absolute path, all previous components are
> thrown away and joining continues from the absolute path component.
>
> [quoting from "part 2"]
> What does sanitizepart do with a leading slash?
>
> assert os.path.join("a", "/b") == "/b"
>
> A new safejoin() or joinsafe() or join(safe='True') could call
> sanitizepart() such that:
>
> assert joinsafe("a\n", "/b") == "a\\n/b"
>
> On Sun, May 10, 2020 at 5:36 AM Steve Jorgensen <ste...@stevej.name>
> wrote:
>
>> Steve Jorgensen wrote:
>> > Steve Jorgensen wrote:
>> > > Andrew Barnert wrote:
>> > > On May 9, 2020, at 17:35, Steve Jorgensen
>> > > ste...@stevej.name wrote:
>> > > I believe the Python standard library should include a means of
>> > > sanitizing a filesystem entry, and this should not be something
>> > > requiring a 3rd party package.
>> > >
>> > > One of the reasons I think this should be in the standard lib is
>> > > because that provides a common, simple means for code reviewers
>> > > and static analysis services such as Veracode to recognize that a
>> > > value is sanitized in an accepted manner.
>> > >
>> > > This does seem like a good idea. People who do this themselves get
>> > > it wrong all the time, occasionally with disastrous consequences,
>> > > so if Python can solve that, that would be great.
>> > >
>> > > But, at least historically, this has been more complicated than
>> > > what you’re suggesting here. For example, don’t you have to catch
>> > > things like directories named “Con” or files whose 8.3
>> > > representation has “CON” as the 8 part? I don’t think you can hang
>> > > an entire Windows system by abusing those anymore, but you can
>> > > still produce filenames that some APIs, and some tools (possibly
>> > > including Explorer, cmd, powershell, Cygwin, mingw/native shells,
>> > > Python itself…) can’t access (or can only access if the user
>> > > manually specified a .\ absolute path, or whatever).
>> > >
>> > > Yes. I am aware of some of the unsafe names in DOS and older
>> > > Windows. As I mentioned in my other reply, there is a distinction
>> > > between the ones that are merely invalid and those that are
>> > > actually unsafe. In researching existing Linux tools just now, I
>> > > was reminded that a leading dash is frequently unsafe because many
>> > > tools will treat an argument starting with dash as an option
>> > > argument.
>> > >
>> > > Is there an established algorithm/rule that lots of people in the
>> > > industry trust that Python can just reference, instead of having
>> > > to research or invent it? Because otherwise, we run the risk of
>> > > making things worse instead of better.
>> > >
>> > > An excellent point! I just started digging into that and found
>> > > references to detox and Glindra. Neither of those seems to be well
>> > > maintained though. The documentation pages for Glindra no longer
>> > > exist and detox is not in standard package repositories for CentOS
>> > > later than 6 (and only in EPEL for that). Still digging.
>> >
>> > Extremely apropos to the question of what characters might be
>> > problematic and/or unsafe:
>> > https://dwheeler.com/essays/fixing-unix-linux-filenames.html
>>
>> That article links to another by the same author that is specific to
>> vulnerabilities caused by file names.
>> https://dwheeler.com/secure-programs/Secure-Programs-HOWTO/file-names.html
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/FDZOXS2BNZHJ4XAG7WU7BO3AA7KF6WWK/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
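
To make the os.path.join behavior quoted above concrete, here is a minimal sketch of one possible containment check. join_under is a hypothetical name, not an existing or proposed stdlib function; it only strips leading slashes and rejects results that normalize to somewhere outside the base directory. It assumes POSIX-style paths and a non-empty base, does not resolve symlinks, and does not escape characters such as newlines the way the joinsafe() idea above would.

```python
import posixpath  # same module as os.path on POSIX; used directly so the demo is platform-independent


def join_under(base: str, *parts: str) -> str:
    """Join untrusted parts onto base; raise ValueError if the result leaves base.

    Hypothetical helper for illustration only.
    """
    # Strip leading slashes so an absolute part cannot discard `base`.
    relative = [part.lstrip("/") for part in parts]
    candidate = posixpath.normpath(posixpath.join(base, *relative))
    root = posixpath.normpath(base)
    # normpath collapses "..", so a traversal shows up as a path outside root.
    if posixpath.commonpath([root, candidate]) != root:
        raise ValueError(f"path escapes {base!r}: {parts!r}")
    return candidate


# The documented behavior: an absolute component throws away what came before.
assert posixpath.join("a", "/b") == "/b"

# With the leading slash stripped, the part stays under the base directory.
assert join_under("/srv/uploads", "/b") == "/srv/uploads/b"

# A ".." traversal is rejected instead of silently escaping.
try:
    join_under("/srv/uploads", "../../etc/passwd")
except ValueError as exc:
    print("rejected:", exc)
```

A check like this addresses only the path traversal case (CWE-22); reserved names, leading dashes, and control characters discussed earlier in the thread would need separate handling.
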
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/73IJZSICPJKYN2QL6TOUR5S5VEXZOTK5/
Code of Conduct: http://python.org/psf/codeofconduct/