> On 10 May 2020, at 01:34, Steve Jorgensen <ste...@stevej.name> wrote:
>
> I believe the Python standard library should include a means of sanitizing a
> filesystem entry, and this should not be something requiring a 3rd party
> package.
snip
I found that I needed code that could tell me whether a filename is valid
for the OS I'm on.
I'm not sure where sanitising would be useful; if a name is invalid, I ask
the user to fix it, with suitable feedback in my UI.
There is more than one problem to address.
1. Is the string valid as the path to a filename on this OS and a particular
file system?
2. Does this valid path refer to a device and not a file?
3. Does this path meet the security requirements of the application?
(1) is possible to code against the specs for Windows, macOS and POSIX for the
default file system. Knowing the exact file system allows further constraints
to be checked.
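As an illustration of a file-system-specific constraint, here is a minimal
sketch of a name-length check. The function name, the 255-byte default and the
UTF-8 encoding are assumptions for illustration only; the real limit and the
unit it is measured in depend on the file system.

```python
def exceeds_name_limit(filename, max_bytes=255, encoding='utf-8'):
    """Return True if the encoded filename is longer than max_bytes.

    max_bytes=255 is an assumed default, not a universal rule: look up the
    limit (and whether it counts bytes or code units) for the target
    file system.
    """
    return len(filename.encode(encoding)) > max_bytes
```

Note that a name made of non-ASCII characters can exceed a byte limit well
before it reaches that many characters.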
(2) on POSIX is usually a check using stat() for the type of the file. On
Windows this check is complicated by needing to know the names of all the
devices and check for them. There are API calls that allow this list to be
determined at runtime. And the parsing rules mean that "COM1" is an RS232
port, as are "c:\windows\com1" and "com1.txt".
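The POSIX side of (2) could be sketched roughly like this. The function name
refers_to_device is made up for illustration, and treating a missing or
unreadable path as "not a device" is an assumption an application may want to
change.

```python
import os
import stat


def refers_to_device(path):
    """Return True if path exists and is a character or block device."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Assumption: a missing or inaccessible path is not reported as a
        # device; the caller decides separately how to handle that case.
        return False
    return stat.S_ISCHR(mode) or stat.S_ISBLK(mode)
```

This only answers the question for paths that already exist; it does not help
with the Windows reserved-name problem.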
(3) needs a threat model to determine which paths are considered a security
risk.
Implementing (1) and (2) is doable.
(3) might be possible as an API that takes a list of black-listed locations to
check for.
I have code for (1) and a weak version of (2) in SCM Workbench. For (3) I have
relied on file system permissions to prevent harm.
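For what it's worth, the black-list idea for (3) might look something like the
sketch below. This is not code from SCM Workbench; the function and parameter
names are hypothetical, and it assumes that resolving symlinks and ".." with
os.path.realpath() before comparing is good enough for the threat model.

```python
import os


def is_under_blocked_location(path, blocked_locations):
    """Return True if path resolves to somewhere inside a blocked directory."""
    real = os.path.realpath(path)
    for blocked in blocked_locations:
        blocked_real = os.path.realpath(blocked)
        try:
            # commonpath() compares whole path components, so "/etcetera"
            # is not mistaken for something inside "/etc".
            if os.path.commonpath([real, blocked_real]) == blocked_real:
                return True
        except ValueError:
            # Raised on Windows when the two paths are on different drives.
            continue
    return False
```

Comparing whole components avoids the classic string-prefix mistake. The
comparison is case-sensitive, so a case-insensitive file system would need
extra normalisation.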
Windows version (MSDN documents the character set that is allowed):
# Characters that may not appear in a Windows filename
__filename_bad_chars_set = set( '\\:/\000?<>*|"' )
# Device names that Windows reserves, whatever the extension
__filename_reserved_names = set( ['nul', 'con', 'aux', 'prn',
    'com1', 'com2', 'com3', 'com4', 'com5', 'com6', 'com7', 'com8', 'com9',
    'lpt1', 'lpt2', 'lpt3', 'lpt4', 'lpt5', 'lpt6', 'lpt7', 'lpt8', 'lpt9',
    ] )

def isInvalidFilename( filename ):
    name_set = set( filename )
    if len( name_set.intersection( __filename_bad_chars_set ) ) != 0:
        return True

    # "com1.txt" is still the device, so test the part before the first dot
    name = filename.split( '.' )[0]
    if name.lower() in __filename_reserved_names:
        return True

    return False
macOS and Unix version (I only use Unicode input, so I avoid the random-bytes
problem):
__filename_bad_chars_set = set( '/\000' )

def isInvalidFilename( filename ):
    name_set = set( filename )
    if len( name_set.intersection( __filename_bad_chars_set ) ) != 0:
        return True

    return False
Barry