> On 10 May 2020, at 01:34, Steve Jorgensen <ste...@stevej.name> wrote:
> 
> I believe the Python standard library should include a means of sanitizing a 
> filesystem entry, and this should not be something requiring a 3rd party 
> package.

snip

I found that I needed code that could tell me whether a filename is valid 
for the OS I'm on.
I'm not sure where sanitising would be useful; if a name is invalid, I ask 
the user to fix it, with suitable feedback in my UI.

There is more than one problem to address.

1. Is the string valid as a path to a file on this OS and on a particular 
file system?
2. Does this valid path refer to a device and not a file?
3. Does this path meet the security requirements of the application?

(1) can be coded against the specs for Windows, macOS and POSIX for the 
default file system. Knowing the exact file system allows further 
constraints to be checked.

(2) on POSIX is usually a check using stat() for the type of the file. On 
Windows this check is complicated by needing to know the names of all the 
devices and check for them. There are API calls that allow this list to be 
determined at runtime. The parsing rules also mean that "COM1" is an RS232 
port, as are "c:\windows\com1" and "com1.txt".
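
The POSIX half of (2) could be sketched roughly as below. This is only an 
illustration, not code from SCM Workbench: the name is_device is made up 
here, and treating a path that cannot be stat()ed as "not a device" is an 
assumption that a real application may want to handle differently.

```python
import os
import stat


def is_device(path):
    """Return True if path names a character or block device.

    Hypothetical helper for illustration. os.stat() follows symlinks,
    so a symlink to a device is reported as a device as well.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Assumption: a missing or unreadable path is not treated as a device.
        return False

    return stat.S_ISCHR(mode) or stat.S_ISBLK(mode)
```

This does not help on Windows, where the reserved device names have to be 
checked by name instead.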

(3) needs a threat model to determine which paths are considered a security 
risk.

Implementing (1) and (2) is doable.

(3) might be possible as an API that takes a list of blacklisted locations 
to check against.
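
One possible shape for such an API, as a sketch only: the function name 
is_blocked and its parameters are hypothetical, and whether resolving 
symlinks and ".." before comparing is the right policy depends on the 
threat model from (3).

```python
import os


def is_blocked(path, blocked_locations):
    """Return True if path is one of blocked_locations or lies inside one.

    Hypothetical helper for illustration. Both sides are passed through
    os.path.realpath() so that ".." segments and symlinks cannot be used
    to step around the check.
    """
    real_path = os.path.realpath(path)

    for location in blocked_locations:
        root = os.path.realpath(location)
        try:
            if os.path.commonpath([real_path, root]) == root:
                return True
        except ValueError:
            # Raised on Windows when the two paths are on different drives,
            # in which case path cannot be inside this location.
            continue

    return False
```

For example, with "/etc" in the list, both "/etc/passwd" and 
"/tmp/../etc/passwd" are reported as blocked, while "/etcetera/x" is not.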

I have code for (1) and a weak version of (2) in SCM Workbench. For (3) I 
have relied on file system permissions to prevent harm.

Windows version (MSDN documents the character set that is allowed):

__filename_bad_chars_set = set( '\\:/\000?<>*|"' )
__filename_reserved_names = set( ['nul', 'con', 'aux', 'prn',
    'com1', 'com2', 'com3', 'com4', 'com5', 'com6', 'com7', 'com8', 'com9',
    'lpt1', 'lpt2', 'lpt3', 'lpt4', 'lpt5', 'lpt6', 'lpt7', 'lpt8', 'lpt9',
    ] )

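def isInvalidFilename( filename ):
    # filename is a single path component, not a full path:
    # '\\', '/' and ':' are all rejected below.
    name_set = set( filename )

    if len( name_set.intersection( __filename_bad_chars_set ) ) != 0:
        return True

    # device names are reserved with any extension, e.g. 'com1.txt'
    name = filename.split( '.' )[0]
    if name.lower() in __filename_reserved_names:
        return True

    return False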

macOS and Unix version (I only use Unicode input, so I avoid the 
random-bytes problem):

__filename_bad_chars_set = set( '/\000' )
def isInvalidFilename( filename ):
    name_set = set( filename )

    if len( name_set.intersection( __filename_bad_chars_set ) ) != 0:
        return True

    return False
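
If both variants need to live in one module, the rule set could be picked 
from the platform. The sketch below is not how SCM Workbench does it; the 
name is_invalid_filename and the platform parameter are made up for 
illustration, and it assumes that sys.platform == 'win32' is the right 
test for "use the Windows rules".

```python
import sys

# Same rule sets as in the two versions above.
_WINDOWS_BAD_CHARS = set('\\:/\000?<>*|"')
_WINDOWS_RESERVED_NAMES = (
    {'nul', 'con', 'aux', 'prn'}
    | {'com%d' % n for n in range(1, 10)}
    | {'lpt%d' % n for n in range(1, 10)}
)
_POSIX_BAD_CHARS = set('/\000')


def is_invalid_filename(filename, platform=sys.platform):
    """Return True if filename (one path component) breaks the rules.

    Hypothetical wrapper: the platform argument exists only so that both
    rule sets can be exercised on any OS.
    """
    if platform == 'win32':
        if set(filename) & _WINDOWS_BAD_CHARS:
            return True
        # Reserved device names apply whatever the extension is.
        return filename.split('.')[0].lower() in _WINDOWS_RESERVED_NAMES

    return bool(set(filename) & _POSIX_BAD_CHARS)
```

With this, is_invalid_filename('com1.txt', 'win32') is True, while the same 
name is accepted when platform is 'linux'.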


Barry
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CVM3BP6V6STTACXTJOAV5NADWCAONEJV/
Code of Conduct: http://python.org/psf/codeofconduct/
