On Sun, 10 May 2020 00:34:43 -0000
"Steve Jorgensen" <ste...@stevej.name> wrote:

> I believe the Python standard library should include a means of
> sanitizing a filesystem entry, and this should not be something
> requiring a 3rd party package.

I'm not disagreeing.

> What I am envisioning is a function (presumably in `os.path`) with a
> signature roughly like
> {{{
> sanitizepart(name, permissive=False, mode=ESCAPE, system=None)
> }}}

> When `permissive` is `False`, characters that are generally unsafe are
> rejected. When `permissive` is `True`, only path separator characters
> are rejected. Generally unsafe characters besides path separators
> would include things like a leading ".", any non-printing character,
> any wildcard, piping and redirection characters, etc.

Okay, now I'm disagreeing.  ;-)

I know what sanitize means (in English and in the technical sense I
believe you intend here), but can you provide some context and actual
use cases?

Sanitize on input so that your application code doesn't "accidentally"
spit out the contents of /etc/shadow?  Sanitize on output so that your
code doesn't produce syntactically broken links in an HTML document or
weird results in an xterm?  Sanitize in both directions for safe round
tripping to a database server?  All of those use cases potentially
require separate handling, especially in terms of quoting and escaping.
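
To make the quoting point concrete, here is a minimal sketch that runs
one file name through three different escapers, one per output context.
The sample name is made up purely for illustration; `shlex.quote`,
`html.escape`, and `urllib.parse.quote` are existing standard library
calls.  Nothing here is a proposed API.

```python
import html
import shlex
from urllib.parse import quote

# Hypothetical file name, chosen only to exercise each escaper.
name = 'a&b <c>.txt'

shell_arg = shlex.quote(name)     # one word on a POSIX shell command line
html_text = html.escape(name)     # text or attribute value in an HTML document
url_part = quote(name, safe='')   # a single path segment in a URL

print(shell_arg)   # 'a&b <c>.txt'  (wrapped in single quotes)
print(html_text)   # a&amp;b &lt;c&gt;.txt
print(url_part)    # a%26b%20%3Cc%3E.txt
```

Three contexts, three different "sanitized" strings, and none of them
is a valid substitute for either of the others.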

For another example, suppose I'm writing a command line utility on a
POSIX system to compute a hash of the contents of a file.  There's
nothing wrong with ".profile" as a file name.  Why are you rejecting
leading "." characters?  What about leading "-"s, or embedded "|"s?
Yes, certain shells and shell commands can make them "difficult" to deal
with in one way or another, but they're not "generally unsafe."
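
A rough sketch of that hashing utility, assuming a POSIX file system
(the names, the sample contents, and the `file_sha256` helper are all
made up for illustration).  No shell is involved, so none of these
names needs any special treatment:

```python
import hashlib
import os
import tempfile


def file_sha256(path):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read in fixed-size chunks so large files are not loaded whole.
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Names a strict sanitizer might reject; all are valid on POSIX.
names = ['.profile', '-rf', 'a|b']

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        path = os.path.join(tmp, name)
        with open(path, 'wb') as f:
            f.write(b'hello\n')
        results[name] = file_sha256(path)

for name, hexdigest in results.items():
    print(name, hexdigest)
```

The names only become awkward once they pass through a shell command
line, which is a quoting problem for that layer and not a property of
the names themselves.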

A very, very, very long time ago, we wrote some software for a customer
who liked to "edit" our data files to make minor corrections instead
of using our software.  Our solution was to use "illegal" filenames that
the shell rejected, but that an application could access directly
anyway.  I guess the point is that "sanitize" can mean different things
to different parts of a system.

Dan

-- 
“Atoms are not things.” – Werner Heisenberg
Dan Sommers, http://www.tombstonezero.net/dan
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KOIAQZ4GOZD4TBIBGBE7XKXHVDIFRNZX/
Code of Conduct: http://python.org/psf/codeofconduct/
