Steve Jorgensen wrote:
> Andrew Barnert wrote:
> > On May 9, 2020, at 17:35, Steve Jorgensen ste...@stevej.name wrote:
> > > I believe the Python standard library should include a means of
> > > sanitizing a filesystem entry, and this should not be something
> > > requiring a 3rd party package.
> > > One of the reasons I think this should be in the standard lib is
> > > because that provides a common, simple means for code reviewers and
> > > static analysis services such as Veracode to recognize that a value is
> > > sanitized in an accepted manner.
> >
> > This does seem like a good idea. People who do this themselves get it
> > wrong all the time, occasionally with disastrous consequences, so if
> > Python can solve that, that would be great.
> > But, at least historically, this has been more complicated than what
> > you’re suggesting here. For example, don’t you have to catch things like
> > directories named “Con” or files whose 8.3 representation has “CON” as
> > the 8 part? I don’t think you can hang an entire Windows system by
> > abusing those anymore, but you can still produce filenames that some
> > APIs, and some tools (possibly including Explorer, cmd, powershell,
> > Cygwin, mingw/native shells, Python itself…) can’t access (or can only
> > access if the user manually specified a .\ absolute path, or whatever).
>
> Yes. I am aware of some of the unsafe names in DOS and older Windows. As I
> mentioned in my other reply, there is a distinction between the ones that
> are merely invalid and those that are actually unsafe. In researching
> existing Linux tools just now, I was reminded that a leading dash is
> frequently unsafe because many tools will treat an argument starting with
> a dash as an option argument.
>
> > Is there an established algorithm/rule that lots of people in the
> > industry trust that Python can just reference, instead of having to
> > research or invent it? Because otherwise, we run the risk of making
> > things worse instead of better.
>
> An excellent point! I just started digging into that and found references
> to detox and Glindra. Neither of those seems to be well maintained, though.
> The documentation pages for Glindra no longer exist, and detox is not in
> standard package repositories for CentOS later than 6 (and only in EPEL
> for that). Still digging.
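For illustration only, here is a rough Python sketch of the kinds of checks discussed in this thread. `filename_problems()` is a hypothetical helper, not an existing stdlib or third-party API, and its rule set (reserved DOS device names such as CON, a leading dash, control characters, and characters or trailing dots/spaces that Windows rejects) is my own assumption, not an established industry standard.

```python
from __future__ import annotations

import re

# Reserved DOS/Windows device names. Windows treats these as reserved
# regardless of case and even when followed by an extension ("con.txt").
_WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}
# Characters that Windows does not allow in file names.
_WINDOWS_INVALID_CHARS = set('<>:"/\\|?*')
# ASCII control characters (0x00-0x1F, including NUL, plus DEL).
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def filename_problems(name: str) -> list[str]:
    """Return reasons why a single path component may be invalid or unsafe.

    Hypothetical helper: the checks are a partial, assumed rule set,
    not a complete or authoritative one.
    """
    if name in ("", ".", ".."):
        return ["empty or special directory name"]
    problems = []
    if "/" in name:
        problems.append("contains '/' (invalid in a POSIX file name)")
    if _CONTROL_CHARS.search(name):
        problems.append("contains control characters")
    if name.startswith("-"):
        problems.append("leading dash may be parsed as a command-line option")
    if any(ch in _WINDOWS_INVALID_CHARS for ch in name):
        problems.append("contains characters invalid on Windows")
    if name.endswith((" ", ".")):
        problems.append("trailing space or dot is not supported by the Windows shell")
    # The device-name check applies to the part before the first dot.
    stem = name.split(".", 1)[0].rstrip(" ").upper()
    if stem in _WINDOWS_RESERVED:
        problems.append("reserved DOS/Windows device name")
    return problems


print(filename_problems("con.txt"))
print(filename_problems("-rf"))
```

It reports reasons instead of rewriting the name, which keeps the "merely invalid" versus "actually unsafe" distinction visible to the caller. It does not attempt 8.3 short names, Unicode normalization, or length limits.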
Extremely apropos to the question of what characters might be problematic
and/or unsafe: https://dwheeler.com/essays/fixing-unix-linux-filenames.html

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/EDJQA7SDUWEHJ53GYXIGX2HPTU3JEM6X/