On 6/7/18 9:17 PM, Steven D'Aprano wrote:
> On Thu, 07 Jun 2018 15:38:39 -0400, Dennis Lee Bieber wrote:
>
>> On Fri, 1 Jun 2018 23:16:32 +0000 (UTC), Steven D'Aprano
>> <steve+comp.lang.pyt...@pearwood.info> declaimed the following:
>>
>>> It should either return False, or raise TypeError. Of the two, since
>>> 3.14159 cannot represent a file on any known OS, TypeError would be more
>>> appropriate.
>>>
>> I wouldn't be so sure of that...
> I would.
>
> There is no existing file system which uses floats instead of byte- or
> character-strings for file names. If you believe different, please name
> the file
>
>
>> Xerox CP/V allowed for embedding
>> non-printable characters into file names
> Just like most modern file systems.
>
> Even FAT-16 supports a range of non-ASCII bytes with the high-bit set
> (although not the control codes with the high-bit cleared). Unix file
> systems typically support any byte except \0 and /. Most modern file
> systems outside of Unix support any Unicode character (or almost any)
> including ASCII control characters.
>
> https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits
>

This does bring up an interesting point. Since Unix file systems really store file names as collections of bytes rather than as strings, while the Python API wants to treat them as strings, we are going to be stuck with problems for some filenames. If we assume they are UTF-8 encoded, then some filenames will fail to decode because they are not valid UTF-8 (for example, a name created on a system that used Latin-1 as its 8-bit character set for file names). On the other hand, if we treat each byte of a file name as an 8-bit character on its own and the system was actually using UTF-8, then we mangle every character outside the basic ASCII set. Basically, we hit the old problem of confusing bytes and strings.
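To make the two failure modes concrete, here is a small sketch using only Python's built-in codecs. The filename "café.txt" is an arbitrary example, and decode_strict_utf8 is a hypothetical helper for illustration, not part of any library.

```python
# Two encodings of the same (hypothetical) name, as raw bytes on disk.
latin1_name = "café.txt".encode("latin-1")  # b'caf\xe9.txt'
utf8_name = "café.txt".encode("utf-8")      # b'caf\xc3\xa9.txt'


def decode_strict_utf8(raw):
    """Decode raw filename bytes as UTF-8; return None if they are invalid."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Assuming UTF-8: the name written by the Latin-1 system cannot be decoded.
print(decode_strict_utf8(latin1_name))  # None

# Assuming one byte per character: the UTF-8 name decodes, but to the
# wrong text, because each byte of the two-byte "é" becomes its own char.
print(utf8_name.decode("latin-1"))      # cafÃ©.txt
```

Neither assumption is safe for every name on disk: one raises, the other silently produces the wrong string.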
Ultimately, there is a fundamental limitation in trying to abstract away the format of filenames in the API. Either we need a back door that lets us define what encoding to use for filenames (and detect when it doesn't work for a given file, so we can change it on the fly and try again), or we need an alternate API that lets us pass raw bytes as file names, in which case the program has to know how to handle raw filenames for that particular file system.
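For what it's worth, Python 3 already offers something along the lines of both options, at least on Unix: the "surrogateescape" error handler from PEP 383 lets undecodable bytes survive a round trip through str, and passing a bytes path to functions such as os.listdir returns raw bytes names. The sketch below assumes Python 3 on a Unix-like system; raw_name and list_raw_names are illustrative names only.

```python
import os
import tempfile

# Hypothetical name written by a Latin-1 system: not valid UTF-8.
raw_name = b"caf\xe9.txt"

# surrogateescape (PEP 383) maps each undecodable byte to a lone
# surrogate (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
as_text = raw_name.decode("utf-8", "surrogateescape")
print(repr(as_text))  # 'caf\udce9.txt'

# Encoding with the same handler restores the original bytes exactly.
print(as_text.encode("utf-8", "surrogateescape") == raw_name)  # True


def list_raw_names(directory):
    """List a directory's entries as raw bytes, with no decoding at all."""
    # A bytes path makes os.listdir return bytes names.
    return os.listdir(os.fsencode(directory))


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "plain.txt"), "w").close()
    print(list_raw_names(tmp))  # [b'plain.txt']
```

This doesn't remove the underlying problem: the surrogate-escaped str can be handed back to open() and friends, but printing it or writing it to a strict UTF-8 text stream will typically fail, so the program still has to decide what a "name" means to it.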
-- 
Richard Damon

-- 
https://mail.python.org/mailman/listinfo/python-list