[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames
Martin v. Löwis mar...@v.loewis.de added the comment: What do you gain with this patch? (i.e. what is its advantage?) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9820 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames
STINNER Victor victor.stin...@haypocalc.com added the comment:

> What do you gain with this patch? (i.e. what is its advantage?)

You know immediately that os.listdir(bytes) is unable to encode the filename, instead of manipulating an invalid filename (b'?') and getting the error later (when you use the filename to open, copy, delete, ... the file). It's the same idea as str+bytes raising an error in Python 3: get the error early instead of storing invalid data and getting the error too late.

Anyway, on Windows, it's not a good idea to manipulate bytes filenames. So it's also a way to encourage people to migrate their applications to unicode on Windows.
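The str+bytes comparison can be seen in a short sketch. The helper name `concat_or_error` is made up for illustration; the only behavior relied on is that Python 3 refuses to mix str and bytes in a concatenation.

```python
def concat_or_error(text, raw):
    """Return text + raw, or the name of the exception the operation raises."""
    try:
        return text + raw
    except TypeError as exc:
        # Python 3 fails here, at the point where the types are mixed,
        # instead of producing data that breaks later.
        return type(exc).__name__


result = concat_or_error("name-", b"\xe9")
print(result)
```

The patch under discussion aims for the same kind of early failure in os.listdir(bytes).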
[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames
Martin v. Löwis mar...@v.loewis.de added the comment:

> You know immediately that os.listdir(bytes) is unable to encode the
> filename, instead of manipulating an invalid filename (b'?') and getting
> the error later (when you use the filename to open, copy, delete, ...
> the file).

Ok. Then I'm -1 on the patch: you can't know whether the application actually wants to open the file. Perhaps it only wants to display the file names, or perhaps it only wants to open some of the files, or only traverse into subdirectories.

For backwards compatibility, I recommend leaving things as they are. FindFirst/NextFileA will also do some other interesting conversions, such as the best-fit conversion (which the mbcs code doesn't do (anymore?)). Windows has explicit A and W versions, and Python has explicit A and W types, so it's IMO best to pair them in the natural way (even if that means code duplication).

> Anyway, on Windows, it's not a good idea to manipulate bytes filenames.
> So it's also a way to encourage people to migrate their applications to
> unicode on Windows.

Only if people run into the issue (which few people will). People who *do* run into the issue will likely get an error either way, which will teach them their lesson :-)

--
title: Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames -> Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames
[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames
New submission from STINNER Victor victor.stin...@haypocalc.com:

In Python 3.2, the mbcs encoding (the default filesystem encoding on Windows) is now strict: it raises an error on unencodable characters and undecodable bytes. But os.listdir(b'.') encodes unencodable characters as b'?'.

Example:

>>> import os
>>> os.mkdir('listdir')
>>> open('listdir\\xxx-\u0363', 'w').close()
>>> filename = os.listdir(b'listdir')[0]
>>> filename
b'xxx-?'
>>> open(filename, 'r').close()
IOError: [Errno 22] Invalid argument: 'xxx-?'

os.listdir(b'listdir') should raise an error (rather than ignoring the filename or replacing unencodable characters with b'?').

I think that we should list the directory using the wide character API (FindFirstFileW) but encode the filename using PyUnicode_EncodeFSDefault() if the directory name type is bytes, instead of using the ANSI API (FindFirstFileA).

--
components: Library (Lib), Unicode, Windows
messages: 115995
nosy: haypo, loewis
priority: normal
severity: normal
status: open
title: Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames
versions: Python 3.2
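The difference between the two behaviors can be approximated in pure Python. This is only a sketch under stated assumptions: `encode_names` is a hypothetical helper, cp1252 stands in for the Windows ANSI code page (which "mbcs" maps to on a real system), and the "replace" error handler only imitates the b'?' substitution seen with FindFirstFileA, which may apply other conversions as well.

```python
def encode_names(names, encoding="cp1252", errors="strict"):
    """Encode a list of unicode filenames to bytes with the given error handler."""
    return [name.encode(encoding, errors) for name in names]


names = ["plain.txt", "xxx-\u0363"]

# Lossy behavior: the unencodable character silently becomes b'?'.
lossy = encode_names(names, errors="replace")

# Strict behavior: the failure is reported at listing time.
try:
    encode_names(names)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(lossy, strict_failed)
```

With the lossy result, the name b'xxx-?' no longer identifies the file on disk, which is why the later open() call in the report fails.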
[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames
STINNER Victor victor.stin...@haypocalc.com added the comment:

I found this bug while trying to find an unencodable filename for #9819 (TESTFN_UNDECODABLE).

Anyway, the bytes API should be avoided on Windows, since the native filename type on Windows is unicode.
[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames
STINNER Victor victor.stin...@haypocalc.com added the comment:

> os.listdir(b'listdir') should raise an error (rather than ignoring the
> filename or replacing unencodable characters with b'?').

To avoid the error, one solution is to support PEP 383 on Windows (for the mbcs encoding). I opened a separate issue for that: #9821.

But supporting PEP 383 will not fix this issue, because the current implementation of listdir(b'.') doesn't use the Python codec: it uses raw bytes filenames (through the ANSI API).
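For readers unfamiliar with PEP 383, the sketch below shows its "surrogateescape" error handler with the utf-8 codec. The choice of utf-8 is an assumption made so the example runs anywhere; it says nothing about how mbcs would behave if #9821 were implemented.

```python
raw = b"xxx-\xff"  # \xff is not valid UTF-8

# The undecodable byte is smuggled into the str as a lone surrogate.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", "surrogateescape")

print(ascii(text), roundtrip)
```

The round trip only works when both directions go through the Python codec, which is the point made above about listdir(b'.') bypassing it.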
[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames
STINNER Victor victor.stin...@haypocalc.com added the comment:

Patch:
- Remove the bytes version of listdir(): reuse the unicode version, but convert the filename to bytes using PyUnicode_EncodeFSDefault() if the directory name is not unicode
- use Py_XDECREF(d) instead of Py_DECREF(d) at the end (because d=NULL on error)
- use Py_CLEAR(d) instead of Py_DECREF(d); d=NULL;
- remove the char namebuf[MAX_PATH+5] buffer (uses less stack memory)

--
keywords: +patch
Added file: http://bugs.python.org/file18836/listdir_windows_bytes.patch
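The first item of the patch can be sketched at the Python level. This is not the patch itself, which is C code in the os module: `listdir_via_unicode` is a hypothetical name, and os.fsdecode()/os.fsencode() are used here as rough Python-level stand-ins for the C filesystem-encoding helpers. Their error handlers differ between platforms and Python versions.

```python
import os


def listdir_via_unicode(path):
    """List a directory through the unicode code path only.

    If the caller passed bytes, decode the directory name, list it as str,
    and encode each resulting name back to bytes.
    """
    if isinstance(path, bytes):
        names = os.listdir(os.fsdecode(path))
        return [os.fsencode(name) for name in names]
    return os.listdir(path)
```

With this structure there is a single listing code path, and any encoding failure comes from the Python codec rather than from the ANSI API.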