Dan Sommers wrote:
> On Sun, 10 May 2020 00:34:43 -0000
> "Steve Jorgensen" ste...@stevej.name wrote:
> > I believe the Python standard library should include
> > a means of
> > sanitizing a filesystem entry, and this should not be something
> > requiring a 3rd party package.
> I'm not disagreeing.
> > What I am envisioning is a function (presumably in
> > os.path) with a signature roughly like
> > {{{
> > sanitizepart(name, permissive=False, mode=ESCAPE, system=None)
> > }}}
> > When permissive is False, characters that are generally
> > unsafe are
> > rejected. When permissive is True, only path separator
> > characters
> > are rejected. Generally unsafe characters besides path separators
> > would include things like a leading ".", any non-printing character,
> > any wildcard, piping and redirection characters, etc.
> Okay, now I'm disagreeing.  ;-)
> I know what sanitize means (in English and in the technical sense I
> believe you intend here), but can you provide some context and actual
> use cases?
> Sanitize on input so that your application code doesn't "accidentally"
> spit out the contents of /etc/shadow?  Sanitize on output so that your
> code doesn't produce syntactically broken links in an HTML document or
> weird results in an xterm?  Sanitize in both directions for safe round
> tripping to a database server?  All of those use cases potentially
> require separate handling, especially in terms of quoting and escaping.
> For another example, suppose I'm writing a command line utility on a
> POSIX system to compute a hash of the contents of a file.  There's
> nothing wrong with ".profile" as a file name.  Why are you rejecting
> leading "."  characters?  What about leading "-"s, or embedded "|"s?
> Yes, certain shells and shell commands can make them "difficult" to deal
> with in one way or another, but they're not "generally unsafe."
> A very, very, very long time ago, we wrote some software for a customer
> who liked to "edit" our data files to make minor corrections instead
> of using our software.  Our solution was to use "illegal" filenames that
> the shell rejected, but that an application could access directly
> anyway.  I guess the point is that "sanitize" can mean different things
> to different parts of a system.
> Dan

I totally get what you're saying. For the sake of simplicity, I thought there
should be just 2 permissiveness options: one that only prevents path traversal,
and one that is extremely conservative, excluding characters that are often
safe and appropriate but may be unsafe in some cases.

In regard to dot files, those can be safe in some cases but unsafe in others,
for instance when writing to configuration files that will be read by shell
helpers or editors.
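
To make the two levels concrete, here is a rough sketch. The helper name
rejection_reasons and both character sets are hypothetical and purely
illustrative; nothing like this exists in os.path today. It assumes that both
"/" and "\" count as separators on every platform and that a NUL is always
rejected, and it ignores the mode and system parameters from the signature
above.

```python
# Hypothetical sketch: none of these names exist in os.path.
SEPARATORS = frozenset("/\\")        # assumption: treat both styles as separators
SHELL_SPECIAL = frozenset("*?[]|<>&;")  # illustrative, not an exhaustive list


def rejection_reasons(name, permissive=False):
    """Return the reasons a single path component would be rejected.

    An empty list means the name is accepted at the requested level.
    """
    reasons = []

    # Both levels block path traversal.
    if name in ("", ".", ".."):
        reasons.append("empty or relative-directory component")
    if any(ch in SEPARATORS for ch in name):
        reasons.append("contains a path separator")
    if "\0" in name:
        reasons.append("contains a NUL")

    if permissive:
        return reasons

    # Conservative level: also reject names that are often fine
    # but can be unsafe in some contexts.
    if name.startswith("."):
        reasons.append("leading dot")
    if any(not ch.isprintable() for ch in name):
        reasons.append("non-printing character")
    if any(ch in SHELL_SPECIAL for ch in name):
        reasons.append("wildcard, piping, or redirection character")
    return reasons
```

With this sketch, ".profile" passes when permissive=True and is rejected
(leading dot) by default, while "../etc/shadow" is rejected at both levels.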
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QQ2FO6ARZD4WM45OPYGBXEGXYQO72PRY/
Code of Conduct: http://python.org/psf/codeofconduct/
