Steve Jorgensen wrote:
> Steve Jorgensen wrote:
> > I believe the Python standard library should include a means of
> > sanitizing a filesystem entry, and this should not be something
> > requiring a 3rd party package.
> >
> > One of the reasons I think this should be in the standard lib is that
> > it provides a common, simple means for code reviewers and static
> > analysis services such as Veracode to recognize that a value is
> > sanitized in an accepted manner.
> >
> > What I am envisioning is a function (presumably in os.path) with a
> > signature roughly like
> > {{{
> > sanitizepart(name, permissive=False, mode=ESCAPE, system=None)
> > }}}
> > When permissive is False, characters that are generally unsafe are
> > rejected. When permissive is True, only path separator characters are
> > rejected. Generally unsafe characters besides path separators would
> > include things like a leading ".", any non-printing character, any
> > wildcard, piping and redirection characters, etc.
> >
> > The mode argument indicates what to do with unacceptable characters:
> > escape them (ESCAPE), omit them (OMIT), or raise an exception (RAISE).
> > This could also double as an escape character argument when a string
> > is given. The default escape character should probably be "%" (same as
> > URL encoding).
> >
> > The system argument accepts a combination of bit flags indicating
> > which operating systems' rules to apply, or None meaning to use the
> > rules for the current platform. Systems would probably include
> > SYS_POSIX, SYS_WIN, and SYS_MISC, where miscellaneous means to enforce
> > rules for all commonly used systems. One example of a distinction is
> > that on a POSIX system, backslash characters are not path separators,
> > but on Windows, both forward and backward slashes are path separators.
> > {{{
> > from os import path
> > from os.path import sanitizepart
> >
> > # Results are shown as repr() of the returned string.
> > sanitizepart('/ABC\\QRS%', system=path.SYS_WIN)
> > # => '%2fABC%5cQRS%%'
> > sanitizepart('/ABC\\QRS%', True, mode=path.OMIT, system=path.SYS_POSIX)
> > # => 'ABC\\QRS%'
> > sanitizepart('../AB&CD*\x01\n', system=path.SYS_POSIX)
> > # => '%2e.%2fAB%26CD%2a%01%0a'
> > sanitizepart('../AB&CD*\x01\n', True, system=path.SYS_POSIX)
> > # => '..%2fAB&CD*\x01\n'
> > }}}
> Existing work:
> https://pypi.org/project/pathvalidate/#sanitize-a-filename
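To make the quoted proposal a bit more concrete, here is a rough sketch of how such a function might behave. Nothing like this exists in os.path today, so everything here is an assumption: the constant values, the exact set of "generally unsafe" characters, the lowercase hex escapes, and the choice to write a literal "%" as "%%" in ESCAPE mode (taken from the quoted example rather than from URL encoding). It also ignores things a real implementation would have to consider, such as reserved Windows device names.

```python
import os

# Hypothetical constants; the names follow the quoted proposal, the values are arbitrary.
ESCAPE, OMIT, RAISE = "escape", "omit", "raise"
SYS_POSIX, SYS_WIN = 1, 2
SYS_MISC = SYS_POSIX | SYS_WIN

# Assumed "generally unsafe" set: wildcards, piping/redirection, and a few
# shell metacharacters. The proposal does not pin down an exact list.
_UNSAFE = frozenset("*?[]|<>&;$`\"'")


def _escape(ch):
    """Percent-escape one character via its UTF-8 bytes (lowercase hex)."""
    return "".join(f"%{b:02x}" for b in ch.encode("utf-8", "surrogatepass"))


def sanitizepart(name, permissive=False, mode=ESCAPE, system=None):
    """Sanitize a single path component (sketch of the proposed behavior)."""
    if mode not in (ESCAPE, OMIT, RAISE):
        raise ValueError(f"unknown mode: {mode!r}")
    if system is None:
        # Fall back to the rules of the current platform.
        system = SYS_WIN if os.name == "nt" else SYS_POSIX
    separators = {"/"}
    if system & SYS_WIN:
        separators.add("\\")

    out = []
    for index, ch in enumerate(name):
        bad = ch in separators
        if not permissive:
            bad = (bad or ch in _UNSAFE or not ch.isprintable()
                   or (index == 0 and ch == "."))
        if bad:
            if mode == RAISE:
                raise ValueError(f"unacceptable character {ch!r} at index {index}")
            if mode == ESCAPE:
                out.append(_escape(ch))
            # mode == OMIT: drop the character entirely
        elif ch == "%" and mode == ESCAPE:
            # Double the escape character so escaped output stays unambiguous.
            out.append("%%")
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, sanitizepart('/ABC\\QRS%', system=SYS_WIN) returns '%2fABC%5cQRS%%', and passing mode=RAISE raises ValueError at the first rejected character. Whether RAISE should use ValueError or a dedicated exception type is left open here.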
More existing work:
* https://pypi.org/project/sanitize-filename/
* http://detox.sourceforge.net/
* https://sourceforge.net/p/glindra/news/2005/08/glindra-rename--lower--portable/
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ITEHIWIFNGM5WOMOC5UAHKQVMLVIBR6Z/
Code of Conduct: http://python.org/psf/codeofconduct/