[issue9820] Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames
STINNER Victor victor.stin...@haypocalc.com added the comment: I fail to see why removing incorrect file names from the result list is any better than keeping them. The result list will be incorrect either way. It depends if you focus on displaying the content of the directory, or on processing files and directories. If you focus on display, yes, missing files an be seen as a bug. But if you walk into directories (use cases: os.walk(), replace a text pattern in all files (~os.glob), ...), and the function raises an error (because a directory or a file name is invalid) is worse. I mean the user have to rename all unencodable names, or the devfeloper have to patch its application to catch IOError and ignore the specific IOError(22). If listdir() ignores unencodable names, os.walk() doesn't fail, but it misses some subdirectories and files. -- Another (worse?) idea: deny bytes path for os.listdir(), but I suppose that we will not like the idea ;-) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9820 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9820] Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames
STINNER Victor victor.stin...@haypocalc.com added the comment: I think trying to emulate, in Python, what the *A functions do is futile. My problem is that some functions will use mbcs in strict mode (functions using PyUnicode_EncodeFSDefault): raise UnicodeEncodeError, and other will use mbcs in replace mode (functions using Windows functions in ANSI mode): raise IOError (or other error depending on the function). It's inconsistent. We should try to keep the same behaviour for all functions. Examples of functions using (indirectly) PyUnicode_EncodeFSDefault to encode unicode filenames: bz2.BZ2File() and _ssl.SSLContext.load_cert_chain(). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9820 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9820] Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames
R. David Murray rdmur...@bitdance.com added the comment: After the decision to ignore undecodable file names in os.listdir but before PEP 383 there was a long discussion on python-dev (in which I was a participant) about how horrible just ignoring the undecodable filenames was. This applies *especially* to the os.walk case, where some files would be mysteriously skipped and it wouldn't be obvious why. Or even obvious that they'd been skipped, in some cases. The biggest issue was that the developer would likely never see the problem since the bulk of developers don't encounter encoding issues, so it would be the poor end user who would be confronted with the mystery, with no clues as to the cause or solution. The conclusion of that particular thread was that Guido approved adding warning messages for filenames that were undecodable, but otherwise leaving os.listdir unchanged. Fortunately Martin came up with PEP 383, which solved the underlying problem in a better way. So, I don't think that skipping the undecodable names is good, unless you generate a warning. In that thread I started out advocating raising an error, but in this case as Martin points out that would be a backward compatibility issue. Returning the munged filenames and having the error show up when the broken filename is used seems OK to me, even if imperfect. When the user sees the problem, they report it to the developer as a bug, who hopefully changes his code to use strings. Adding warning messages would probably be useless at best and annoying at worst on Windows. Maybe we could add a pseudo deprecation warning (ie: aimed at developers, silent by default) that says don't use listdir with bytes on windows? -- nosy: +r.david.murray ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9820 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9820] Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames
R. David Murray rdmur...@bitdance.com added the comment: But in the case of BZ2File and ssl.SSLContext.load_cert_chain(), isn't it the case that they are trying to open the files? So producing an early error about the decoding problem makes sense. Are there any functions other than listdir where the decoded filenames are not necessarily immediately used to manipulate the files? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9820 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9820] Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames
STINNER Victor victor.stin...@haypocalc.com added the comment: FindFirst/NextFileA will also do some other interesting conversions, such as the best-fit conversion (which the mbcs code doesn't do (anymore?)). About mbcs, mbcs codec of Python 3.1 is like .encode('mbcs', 'replace') and .decode('mbcs', 'ignore') of Python 3.2 (see issue #850997). By default (strict error handler), it now raises errors on undecodable byte sequence and unencodable character, whereas Python 3.1 just ignores the error handler. PyUnicode_EncodeFSDefault / PyUnicode_DecodeFSDefault uses the strict error handler. I just added a note about mbcs in Doc/whatsnew/3.2.rst: r84750. -- title: Windows : os.listdir(b'.') doesn't raise an error forunencodable filenames - Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9820 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9820] Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames
STINNER Victor victor.stin...@haypocalc.com added the comment: It remembers me the discussion of the issue #3187. About unencodable filenames, Guido proposed to ignore them or to use errors=replace, and wrote Failing the entire os.listdir() call is not acceptable. (... long discussion ...) And finally, os.listdir() ignored undecodable filenames on UNIX/BSD. Then you introduced the genious PEP 383 (utf8b then renamed surrogateescape) and os.listdir() now raises an error if the PyUnicode_FromEncodedObject(v, Py_FileSystemDefaultEncoding, surrogateescape) fails... which doesn't occur because of undecodable byte sequence, but for other reasons like a memory error. About Windows, os.listdir(str) never fails, but my question is about os.listdir(bytes). Should os.listdir(bytes) returns invalid filenames (encoded with mbcs+replace, filenames not usable to open, rename or delete the file) or just ignore them? Ok. Then I'm -1 on the patch: you can't know whether the application actually wants to open the file. Perhaps it only wants to display the file names, or perhaps it only wants to open some of the files, or only traverse into subdirectories. For backwards compatibility, I recommend to leave things as they are. FindFirst/NextFileA will also do some other interesting conversions, such as the best-fit conversion (which the mbcs code doesn't do (anymore?)). it only wants to open some of the files is the typical reason for which I hate Python2 and its implicit conversion between bytes and characters: it works in most cases, but it fails sometimes. The problem is to define (and explain) sometimes. The typical use case of listing a directory is a file chooser. On Windows using the bytes API, it works in most cases, but it fails if the user picks the wrong file (name with ?). That's the problem I would like to address. -- Ignore unencodable filenames solution is compatible with the traverse into subdirectories case. And it does also keep backward compatibility (except that unencodable files are hidden, which is a least problem I think). -- I proposed to raise an error on unencodable filename. I changed my mind after reading your answer and the discussion on #3187. My patch breaks compatibility and users don't bother to unencodable filenames. Eg. glob(*.mp3) should not fail if the directory contains a temporary unencodable filename (xxx.tmp). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9820 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9820] Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames
STINNER Victor victor.stin...@haypocalc.com added the comment: FindFirst/NextFileA will also do some other interesting conversions, such as the best-fit conversion (which the mbcs code doesn't do (anymore?)). If we choose to keep this behaviour, I will have to revert my commit on mbcs codec to be consistent with os.listdir(). Or at least patch PyUnicode_EncodeFSDefault and os.fsencode() (use replace error handler) and PyUnicode_DecodeFSDefault and os.fsdecode() (use igrore error handler). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9820 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9820] Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames
Martin v. Löwis mar...@v.loewis.de added the comment: About Windows, os.listdir(str) never fails, but my question is about os.listdir(bytes). Should os.listdir(bytes) returns invalid filenames (encoded with mbcs+replace, filenames not usable to open, rename or delete the file) or just ignore them? I see nothing wrong with returning incorrect file names. it only wants to open some of the files is the typical reason for which I hate Python2 and its implicit conversion between bytes and characters: it works in most cases, but it fails sometimes. The problem is to define (and explain) sometimes. Notice that this doesn't change with the patch. It will *still* work sometimes, and fail sometimes. In fact, for most users and most applications, it will never fail - *even with your patch applied*. Ignore unencodable filenames solution is compatible with the traverse into subdirectories case. And it does also keep backward compatibility (except that unencodable files are hidden, which is a least problem I think). I fail to see why removing incorrect file names from the result list is any better than keeping them. The result list will be incorrect either way. In one case (files skipped), the user will not see the file in the selection dialog, even though he knows its there and explorer shows it just fine. So he thinks there must be a bug. In the other case, it displays a non-sensical file name. Again, the user thinks there is a bug - plus if you click on the file, you get some error message (hopefully, the application will catch the exception - the directory may also have changed in-between, so a missing file error must be recovered from). So it's a user-visible bug in either case, but if the incorrect file name is included, it's slightly more obvious that something is wrong. -- title: Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames - Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9820 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9820] Windows : os.listdir(b'.') doesn't raise an errorfor unencodablefilenames
Martin v. Löwis mar...@v.loewis.de added the comment: If we choose to keep this behaviour, I will have to revert my commit on mbcs codec to be consistent with os.listdir(). Or at least patch PyUnicode_EncodeFSDefault and os.fsencode() (use replace error handler) and PyUnicode_DecodeFSDefault and os.fsdecode() (use igrore error handler). I think trying to emulate, in Python, what the *A functions do is futile. IIUC, disables WC_NO_BEST_FIT_CHARS, and may do other stuff which apparently is undocumented. However, I fail to see the relationship to this issue. Having the MBCS codec support strict mode is a good thing. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9820 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com