Dan Sommers wrote:
> On Sun, 10 May 2020 00:34:43 -0000
> "Steve Jorgensen" ste...@stevej.name wrote:
> > I believe the Python standard library should include a means of
> > sanitizing a filesystem entry, and this should not be something
> > requiring a 3rd party package.
>
> I'm not disagreeing.
>
> > What I am envisioning is a function (presumably in os.path) with a
> > signature roughly like
> > {{{
> > sanitizepart(name, permissive=False, mode=ESCAPE, system=None)
> > }}}
> > When permissive is False, characters that are generally unsafe are
> > rejected. When permissive is True, only path separator characters
> > are rejected. Generally unsafe characters besides path separators
> > would include things like a leading ".", any non-printing character,
> > any wildcard, piping and redirection characters, etc.
>
> Okay, now I'm disagreeing. ;-)
>
> I know what sanitize means (in English and in the technical sense I
> believe you intend here), but can you provide some context and actual
> use cases?
>
> Sanitize on input so that your application code doesn't "accidentally"
> spit out the contents of /etc/shadow? Sanitize on output so that your
> code doesn't produce syntactically broken links in an HTML document or
> weird results in an xterm? Sanitize in both directions for safe round
> tripping to a database server? All of those use cases potentially
> require separate handling, especially in terms of quoting and escaping.
>
> For another example, suppose I'm writing a command line utility on a
> POSIX system to compute a hash of the contents of a file. There's
> nothing wrong with ".profile" as a file name. Why are you rejecting
> leading "." characters? What about leading "-"s, or embedded "|"s?
> Yes, certain shells and shell commands can make them "difficult" to deal
> with in one way or another, but they're not "generally unsafe."
>
> A very, very, very long time ago, we wrote some software for a customer
> who liked to "edit" our data files to make minor corrections instead
> of using our software. Our solution was to use "illegal" filenames that
> the shell rejected, but that an application could access directly
> anyway. I guess the point is that "sanitize" can mean different things
> to different parts of a system.
>
> Dan
I totally get what you're saying. For the sake of simplicity, I thought there should be just 2 permissiveness options: one that only prevents path traversal, and one that is extremely conservative, omitting characters that are often safe and appropriate but may be unsafe in some cases.

In regard to dot files, those can be safe in some cases but unsafe in others, for instance when writing to configuration files that will be read by shell helpers or editors.
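
To make the two permissiveness levels concrete, here is a rough sketch of how they might behave. It is purely illustrative: the name `check_part`, the exact character sets, and the reject-only behaviour (no escaping mode, no per-system rules) are assumptions made for this example, not part of any existing os.path API or of the proposed signature.

```python
# Hypothetical sketch only; nothing like this exists in os.path today.

# Characters treated as path separators on at least one common platform.
_SEPARATORS = {"/", "\\"}

# Assumed "conservative" set: wildcards, piping, redirection, and
# other characters that many shells treat specially.
_SHELL_SPECIAL = set("*?[]|<>&;$`\"'!")


def check_part(name, permissive=False):
    """Return name unchanged if it is acceptable, else raise ValueError."""
    # Both levels: block anything that could be used for path traversal.
    if name in ("", ".", ".."):
        raise ValueError(f"not a usable entry name: {name!r}")
    if any(ch in _SEPARATORS or ch == "\0" for ch in name):
        raise ValueError(f"path separator or NUL in {name!r}")
    if permissive:
        return name

    # Conservative level: also reject names that are often fine but
    # can be unsafe in some contexts.
    if name.startswith("."):
        raise ValueError(f"leading dot in {name!r}")
    if not name.isprintable():
        raise ValueError(f"non-printing character in {name!r}")
    if any(ch in _SHELL_SPECIAL for ch in name):
        raise ValueError(f"shell-special character in {name!r}")
    return name
```

With this sketch, `check_part(".profile", permissive=True)` returns the name unchanged, while `check_part(".profile")` raises `ValueError`. A name such as `"../etc/shadow"` is rejected at both levels because it contains a separator.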