David Hopwood wrote:
> On Windows, file system pathnames can contain arbitrary Unicode characters
> (well, almost). Despite the existence of "ANSI" filesystem APIs, and
> regardless of what 'sys.getfilesystemencoding()' returns, the underlying
> file system encoding for NTFS and FAT filesystems is UTF-16LE.
>
> Thus, either:
>  - the fact that sys.getfilesystemencoding() returns a non-Unicode encoding
>    on Windows is a bug, or
>  - any program that relies on sys.getfilesystemencoding() being able to
>    encode arbitrary Windows pathnames has a bug.
>
> We need to decide which of these is the case.

There is a third option:
 - the operating system has a bug

It is actually this option that rules out the other two.
sys.getfilesystemencoding() returns "mbcs" on Windows, which means
CP_ACP, the system's ANSI code page. The file system encoding is an
encoding that converts a file name into a byte string. Unfortunately,
on Windows, there are file names which cannot be converted into a byte
string in a standard manner. This is an operating system bug (or
mis-design; they should have chosen UTF-8 as the byte encoding of file
names, instead of making it depend on the system locale, but they of
course did so for backwards compatibility with Windows 3.1 and 9x).

As a side note: every encoding in Python is a Unicode encoding (a codec
maps between Unicode strings and byte strings), so there aren't any
"non-Unicode encodings".

Programs that rely on sys.getfilesystemencoding() being able to
represent arbitrary file names on Windows might have a bug;
programs that rely on sys.getfilesystemencoding() being able
to encode all elements of sys.path do not (at least not for
Python 2.5 and earlier).

Regards,
Martin
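
P.S. A minimal sketch of the "cannot be converted into a byte string"
case, for anyone who wants to see it concretely. It is written for a
current Python 3, where str is Unicode, and it makes two assumptions
that are not part of the discussion above: cp1252 stands in for CP_ACP
as it would be on a Western European system (the "mbcs" codec itself
exists only on Windows), and the helper name is_encodable is made up
for illustration.

```python
def is_encodable(name, encoding):
    """Return True if the file name can be encoded losslessly."""
    try:
        name.encode(encoding)  # strict error handling by default
        return True
    except UnicodeEncodeError:
        return False


# Assumption: cp1252 plays the role of CP_ACP on a Western European system.
ansi_code_page = "cp1252"

western_name = "caf\u00e9.txt"             # e-acute is present in cp1252
japanese_name = "\u65e5\u672c\u8a9e.txt"   # CJK characters are not

print(is_encodable(western_name, ansi_code_page))
print(is_encodable(japanese_name, ansi_code_page))
print(is_encodable(japanese_name, "utf-8"))
```

Under those assumptions the second name is a legal file name on NTFS
but has no byte-string form in the locale's code page, while UTF-8
can represent both names.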