I believe the Python standard library should include a means of sanitizing the 
name of a filesystem entry (a single path component), and this should not be 
something requiring a third-party package.

One of the reasons I think this should be in the standard library is that it 
would give code reviewers and static analysis services such as Veracode a 
common, simple way to recognize that a value is sanitized in an accepted manner.

What I am envisioning is a function (presumably in `os.path`) with a signature 
roughly like
{{{
sanitizepart(name, permissive=False, mode=ESCAPE, system=None)
}}}

When `permissive` is `False`, characters that are generally unsafe are 
rejected. When `permissive` is `True`, only path separator characters are 
rejected. Generally unsafe characters besides path separators would include 
things like a leading ".", any non-printing character, any wildcard, piping and 
redirection characters, etc.
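
To make the two levels concrete, here is a rough sketch of the per-character check I have in mind. The helper name `is_unsafe` and the exact set in `SHELL_SPECIAL` are only assumptions for illustration, not part of the proposed API, and the real list of "generally unsafe" characters would need discussion.

```python
# Hypothetical helper; the name and the character set are illustrative only.
SHELL_SPECIAL = set('*?[]|<>&;$`"\'')  # wildcards, piping, redirection, etc.


def is_unsafe(ch, index, permissive=False, separators='/'):
    """Return True if the character ch at position index should be rejected."""
    if ch in separators:
        return True   # path separators are rejected even when permissive
    if permissive:
        return False  # permissive: nothing else is rejected
    if index == 0 and ch == '.':
        return True   # leading dot (hidden files, "." and "..")
    if not ch.isprintable():
        return True   # non-printing characters such as "\x01" or "\n"
    return ch in SHELL_SPECIAL
```

With these assumptions, `is_unsafe('*', 1)` is true but `is_unsafe('*', 1, permissive=True)` is false, while a separator is rejected either way.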

The `mode` argument indicates what to do with unacceptable characters: escape 
them (`ESCAPE`), omit them (`OMIT`), or raise an exception (`RAISE`). It could 
also double as an escape character argument: passing a one-character string 
would mean "escape, using this character". The default escape character should 
probably be "%" (same as URL encoding).
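
As a minimal sketch of how the three modes might behave (again, `apply_mode`, the constant values, and the `is_bad` callback are hypothetical names made up for this example), assuming the escape character itself is doubled and other rejected characters are hex-encoded byte by byte:

```python
# Hypothetical constants and helper; nothing here exists in os.path.
ESCAPE, OMIT, RAISE = 'escape', 'omit', 'raise'


def apply_mode(name, is_bad, mode=ESCAPE):
    """Rewrite name, handling each character for which is_bad(ch) is true."""
    if mode in (OMIT, RAISE):
        esc = None                       # no escaping happens in these modes
    else:
        # Any other string is taken to be the escape character itself.
        esc = '%' if mode == ESCAPE else mode
    out = []
    for ch in name:
        if ch == esc:
            out.append(esc * 2)          # assumption: double the escape char
        elif not is_bad(ch):
            out.append(ch)
        elif mode == RAISE:
            raise ValueError(f'unacceptable character {ch!r} in {name!r}')
        elif mode == OMIT:
            continue                     # drop the character silently
        else:
            # Hex-encode the UTF-8 bytes, in the style of URL encoding.
            out.append(''.join(f'{esc}{b:02x}' for b in ch.encode('utf-8')))
    return ''.join(out)


print(apply_mode('/ABC\\QRS%', lambda c: c in '/\\'))
# => %2fABC%5cQRS%%
```

Whether the escape character should be doubled or written as `%25` is an open design question; doubling is just what this sketch assumes.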

The `system` argument accepts a combination of bit flags indicating which 
operating systems' rules to apply, or `None` meaning to use the rules for the 
current platform. Systems would probably include `SYS_POSIX`, `SYS_WIN`, and 
`SYS_MISC` where miscellaneous means to enforce rules for all commonly used 
systems. One example of a distinction is that on a POSIX system, backslash 
characters are not path separators, but on Windows, both forward and backward 
slashes are path separators.
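
The flags could plausibly be modeled with `enum.IntFlag`. The sketch below is only an assumption about how that might look: the `System` class and `separators` function are hypothetical, and "all commonly used systems" is reduced to just POSIX plus Windows to keep it short.

```python
import enum
import os


class System(enum.IntFlag):
    """Hypothetical stand-in for the proposed SYS_* constants."""
    POSIX = 1
    WIN = 2
    MISC = POSIX | WIN  # assumption: "all common systems" is just these two


def separators(system=None):
    """Return the set of characters treated as path separators."""
    if system is None:
        # None means "use the rules for the current platform".
        system = System.WIN if os.name == 'nt' else System.POSIX
    seps = set()
    if system & System.POSIX:
        seps.add('/')
    if system & System.WIN:
        seps.update('/\\')  # Windows accepts both slashes
    return seps


print(sorted(separators(System.POSIX)))  # => ['/']
print(sorted(separators(System.WIN)))    # => ['/', '\\']
```

Because the flags combine with `|`, a caller could ask for `System.POSIX | System.WIN` to get a name that is safe on both.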

{{{
from os import path

# sanitizepart, SYS_WIN, SYS_POSIX and OMIT are the proposed names;
# none of them exist in os.path today.
print(repr(path.sanitizepart('/ABC\\QRS%', system=path.SYS_WIN)))
# => '%2fABC%5cQRS%%'

print(repr(path.sanitizepart('/ABC\\QRS%', True, mode=path.OMIT,
                             system=path.SYS_POSIX)))
# => 'ABC\\QRS%'

print(repr(path.sanitizepart('../AB&CD*\x01\n', system=path.SYS_POSIX)))
# => '%2e.%2fAB%26CD%2a%01%0a'

print(repr(path.sanitizepart('../AB&CD*\x01\n', True, system=path.SYS_POSIX)))
# => '..%2fAB&CD*\x01\n'
}}}
