(Isn't it almost always better to store the file under a hash of the
provided filename, perhaps sharded into a p/a/ir/tree234-style directory
tree to stay under the per-directory file limit of whichever filesystem,
instead of using the user-supplied filename string?)
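
A minimal sketch of what I mean, in case it helps. The function name
hashed_relpath, the choice of SHA-256, and the two-level/two-character
sharding are all assumptions made for illustration, not an established
convention:

```python
import hashlib
from pathlib import PurePosixPath


def hashed_relpath(user_filename: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map an untrusted filename to a relative path made only of hex digits.

    Hypothetical helper: the on-disk name is derived from a SHA-256 digest,
    so slashes, "..", control characters, and reserved names in the input
    never reach the filesystem.
    """
    # "surrogatepass" lets names containing lone surrogates be hashed too.
    raw = user_filename.encode("utf-8", "surrogatepass")
    digest = hashlib.sha256(raw).hexdigest()
    # Leading digest characters become shard directories: ab/cd/abcd...
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*shards, digest)


# The hostile input only influences which hex digits come out.
print(hashed_relpath("../../etc/passwd"))
```

The original name would have to be kept somewhere else (a database row or
a sidecar file) if it needs to be shown to the user again, and two uploads
with the same name map to the same path, so hashing the content or a
random ID may fit some applications better.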

On Mon, May 11, 2020 at 4:48 PM Wes Turner <wes.tur...@gmail.com> wrote:

> FWIW, here are some of the CWE codes for related
> vulnerabilities/weaknesses in implementations:
>
> CWE-73: External Control of File Name or Path
> https://cwe.mitre.org/data/definitions/73.html
>
> CWE-707: Improper Neutralization
> https://cwe.mitre.org/data/definitions/707.html
>
> CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path
> Traversal')
> https://cwe.mitre.org/data/definitions/22.html
>
> Because this behavior of os.path.join is documented, it's not a vuln in
> Python; it's a vuln in every downstream component that (1) uses
> os.path.join with user-supplied input and (2) doesn't strip a leading
> '/' from path parts before joining them with os.path.join.
>
> https://docs.python.org/3/library/os.path.html#os.path.join
> > [...] If a component is an absolute path, all previous components are
> > thrown away and joining continues from the absolute path component.
>
> [quoting from "part 2"]
> What does sanitizepart do with a leading slash?
>
> assert os.path.join("a", "/b") == "/b"
>
> A new safejoin() or joinsafe() or join(safe=True) could call
> sanitizepart() such that:
>
> assert joinsafe("a\n", "/b") == "a\\n/b"
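
Inline, on the joinsafe() idea: a rough sketch of how that could behave.
None of these functions exist in the standard library; joinsafe,
sanitizepart, and escape_controls are hypothetical names, and posixpath is
used directly (an assumption) so the result is the same on every platform.
This only covers the leading-slash and control-character cases from the
assertions above. It does not deal with ".." components, reserved Windows
names, or leading dashes.

```python
import posixpath


def escape_controls(part: str) -> str:
    """Replace ASCII control characters with their backslash escapes."""
    return "".join(
        ch.encode("unicode_escape").decode("ascii")
        if ord(ch) < 0x20 or ord(ch) == 0x7F
        else ch
        for ch in part
    )


def sanitizepart(part: str) -> str:
    """Hypothetical: escape control characters, then drop leading slashes."""
    return escape_controls(part).lstrip("/")


def joinsafe(first: str, *parts: str) -> str:
    """Hypothetical: join so that a later absolute part cannot discard `first`."""
    # The first component may legitimately be absolute, so only escape it.
    return posixpath.join(escape_controls(first), *(sanitizepart(p) for p in parts))


# Documented behavior: the absolute component wins.
assert posixpath.join("a", "/b") == "/b"
# Sketch behavior: the leading slash is stripped and the newline is escaped.
assert joinsafe("a\n", "/b") == "a\\n/b"
assert joinsafe("/srv/data", "/etc/passwd") == "/srv/data/etc/passwd"
```
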
>
> On Sun, May 10, 2020 at 5:36 AM Steve Jorgensen <ste...@stevej.name>
> wrote:
>
>> Steve Jorgensen wrote:
>> > Steve Jorgensen wrote:
>> > > Andrew Barnert wrote:
>> > > > On May 9, 2020, at 17:35, Steve Jorgensen ste...@stevej.name wrote:
>> > > > > I believe the Python standard library should include a means of
>> > > > > sanitizing a filesystem entry, and this should not be something
>> > > > > requiring a 3rd party package.
>> > > > > One of the reasons I think this should be in the standard lib is
>> > > > > because that provides a common, simple means for code reviewers
>> > > > > and static analysis services such as Veracode to recognize that
>> > > > > a value is sanitized in an accepted manner.
>> > > > This does seem like a good idea. People who do this themselves get
>> > > > it wrong all the time, occasionally with disastrous consequences,
>> > > > so if Python can solve that, that would be great.
>> > > > But, at least historically, this has been more complicated than
>> > > > what you’re suggesting here. For example, don’t you have to catch
>> > > > things like directories named “Con” or files whose 8.3
>> > > > representation has “CON” as the 8 part? I don’t think you can hang
>> > > > an entire Windows system by abusing those anymore, but you can
>> > > > still produce filenames that some APIs, and some tools (possibly
>> > > > including Explorer, cmd, powershell, Cygwin, mingw/native shells,
>> > > > Python itself…) can’t access (or can only access if the user
>> > > > manually specified a .\ absolute path, or whatever).
>> > > Yes. I am aware of some of the unsafe names in DOS and older Windows.
>> > > As I mentioned in my other reply, there is a distinction between the
>> > > ones that are merely invalid and those that are actually unsafe. In
>> > > researching existing Linux tools just now, I was reminded that a
>> > > leading dash is frequently unsafe because many tools will treat an
>> > > argument starting with a dash as an option argument.
>> > > > Is there an established algorithm/rule that lots of people in the
>> > > > industry trust that Python can just reference, instead of having
>> > > > to research or invent it? Because otherwise, we run the risk of
>> > > > making things worse instead of better.
>> > > An excellent point! I just started digging into that and found
>> > > references to detox and Glindra. Neither of those seems to be well
>> > > maintained though. The documentation pages for Glindra no longer
>> > > exist and detox is not in standard package repositories for CentOS
>> > > later than 6 (and only in EPEL for that). Still digging.
>> > Extremely apropos to the question of what characters might be
>> > problematic and/or unsafe:
>> > https://dwheeler.com/essays/fixing-unix-linux-filenames.html
>>
>> That article links to another by the same author that is specific to
>> vulnerabilities caused by file names.
>> https://dwheeler.com/secure-programs/Secure-Programs-HOWTO/file-names.html
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/FDZOXS2BNZHJ4XAG7WU7BO3AA7KF6WWK/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/73IJZSICPJKYN2QL6TOUR5S5VEXZOTK5/
