[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-13 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> I fail to see why removing incorrect file names from the result
> list is any better than keeping them. The result list will
> be incorrect either way.

It depends on whether you focus on displaying the content of the directory or on 
processing files and directories. If you focus on display, yes, missing files 
can be seen as a bug. But if you walk into directories (use cases: os.walk(), 
replacing a text pattern in all files (~os.glob), ...), having the function raise 
an error (because a directory or a file name is invalid) is worse. I mean the 
user has to rename all unencodable names, or the developer has to patch their 
application to catch IOError and ignore the specific IOError(22).

If listdir() ignores unencodable names, os.walk() doesn't fail, but it misses 
some subdirectories and files.
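
The trade-off between failing and skipping can be sketched in a few lines. This is only an illustration, not code from the patch: `collect_files` is a hypothetical helper, and it uses the `onerror` callback of os.walk() to record directories that could not be listed instead of aborting the whole traversal.

```python
import os

def collect_files(top):
    """Walk `top` and return (files, errors) instead of raising.

    Hypothetical helper: directories that cannot be listed are recorded
    in `errors`, so the caller can report them rather than lose them
    silently.
    """
    files, errors = [], []

    def remember(exc):
        # os.walk() passes the OSError raised while listing a directory.
        errors.append(exc)

    for dirpath, dirnames, filenames in os.walk(top, onerror=remember):
        for name in filenames:
            files.append(os.path.join(dirpath, name))
    return files, errors
```

With this shape the application decides what to do with the failures, which is a middle ground between the "raise" and "ignore" behaviours discussed here.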

--

Another (worse?) idea: deny bytes path for os.listdir(), but I suppose that we 
will not like the idea ;-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9820
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-13 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> I think trying to emulate, in Python, what the *A functions
> do is futile.

My problem is that some functions will use mbcs in strict mode (functions using 
PyUnicode_EncodeFSDefault) and raise UnicodeEncodeError, while others will use mbcs 
in replace mode (functions using Windows functions in ANSI mode) and raise IOError 
(or another error depending on the function). It's inconsistent. We should try to 
keep the same behaviour for all functions.

Examples of functions using (indirectly) PyUnicode_EncodeFSDefault to encode 
unicode filenames: bz2.BZ2File() and _ssl.SSLContext.load_cert_chain().
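
A small sketch of the two encoding modes, for readers who want to see the difference. The "mbcs" codec only exists on Windows, so this example assumes cp1252 as a stand-in for the ANSI code page; the helper names are hypothetical.

```python
# Assumption: cp1252 stands in for the Windows ANSI code page ("mbcs").
ANSI = "cp1252"
name = "caf\u00e9_\u4e2d.txt"   # U+4E2D has no cp1252 representation

def encode_strict(text):
    """Strict mode: an unencodable character is reported as an error."""
    try:
        return text.encode(ANSI, "strict")
    except UnicodeEncodeError:
        return None

def encode_replace(text):
    """Replace mode: an unencodable character silently becomes b'?'."""
    return text.encode(ANSI, "replace")

strict_result = encode_strict(name)     # None: encoding failed
replace_result = encode_replace(name)   # b'caf\xe9_?.txt'
```

In replace mode the error only shows up later, when the munged name is passed to a function that tries to open the file.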

--




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-13 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

After the decision to ignore undecodable file names in os.listdir but before 
PEP 383 there was a long discussion on python-dev (in which I was a 
participant) about how horrible just ignoring the undecodable filenames was.  
This applies *especially* to the os.walk case, where some files would be 
mysteriously skipped and it wouldn't be obvious why.
Or even obvious that they'd been skipped, in some cases.  The biggest issue was 
that the developer would likely never see the problem since the bulk of 
developers don't encounter encoding issues, so it would be the poor end user 
who would be confronted with the mystery, with no clues as to the cause or 
solution.

The conclusion of that particular thread was that Guido approved adding warning 
messages for filenames that were undecodable, but otherwise leaving os.listdir 
unchanged.  Fortunately Martin came up with PEP 383, which solved the 
underlying problem in a better way.
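
For reference, the PEP 383 mechanism can be seen with the "surrogateescape" error handler directly. This is a minimal sketch with hypothetical helper names; it assumes UTF-8 as the filesystem encoding.

```python
def decode_name(raw, encoding="utf-8"):
    """Decode a filename, mapping undecodable bytes to lone surrogates."""
    return raw.decode(encoding, "surrogateescape")

def encode_name(text, encoding="utf-8"):
    """Reverse decode_name(): lone surrogates become the original bytes."""
    return text.encode(encoding, "surrogateescape")

raw = b"caf\xe9.txt"             # Latin-1 bytes, not valid UTF-8
text = decode_name(raw)          # 'caf\udce9.txt'
assert encode_name(text) == raw  # the original bytes are recovered
```

Nothing is skipped and nothing is lost: the name is ugly when displayed, but it still identifies the file.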

So, I don't think that skipping the undecodable names is good, unless you 
generate a warning.  In that thread I started out advocating raising an error, 
but in this case as Martin points out that would be a backward compatibility 
issue.  Returning the munged filenames and having the error show up when the 
broken filename is used seems OK to me, even if imperfect.  When the user sees 
the problem, they report it to the developer as a bug, who hopefully changes 
his code to use strings.

Adding warning messages would probably be useless at best and annoying at worst 
on Windows.  Maybe we could add a pseudo deprecation warning (i.e. aimed at 
developers, silent by default) that says "don't use listdir with bytes on 
Windows"?
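
What such a developer-facing warning could look like, as a rough sketch only. `listdir_warn_on_bytes` is a hypothetical wrapper, not a proposed patch; it relies on DeprecationWarning being hidden by the default warning filters in most contexts.

```python
import os
import warnings

def listdir_warn_on_bytes(path):
    """Hypothetical wrapper around os.listdir().

    Emits a DeprecationWarning when called with a bytes path, then
    returns whatever os.listdir() returns.
    """
    if isinstance(path, bytes):
        warnings.warn(
            "listing a directory with a bytes path may return unusable names",
            DeprecationWarning,
            stacklevel=2,  # point at the caller, not at this wrapper
        )
    return os.listdir(path)
```

A developer can surface the message with `python -W default` or `warnings.simplefilter("default")`, while end users normally never see it.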

--
nosy: +r.david.murray




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-13 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

But in the case of BZ2File and ssl.SSLContext.load_cert_chain(), isn't it the 
case that they are trying to open the files?  So producing an early error about 
the decoding problem makes sense.  Are there any functions other than listdir 
where the decoded filenames are not necessarily immediately used to manipulate 
the files?

--




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-12 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the mbcs code doesn't do
> (anymore?)).

About mbcs: the mbcs codec of Python 3.1 behaves like .encode('mbcs', 'replace') and 
.decode('mbcs', 'ignore') in Python 3.2 (see issue #850997). By default 
(strict error handler), it now raises errors on undecodable byte sequences and 
unencodable characters, whereas Python 3.1 just ignores the error handler.

PyUnicode_EncodeFSDefault / PyUnicode_DecodeFSDefault use the strict error 
handler.
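
The decoding side of that change can be illustrated the same way. As "mbcs" is Windows-only, the sketch below assumes cp1252 as a stand-in; `decode_or_none` is a hypothetical helper.

```python
# Assumption: cp1252 stands in for the Windows ANSI code page ("mbcs").
ANSI = "cp1252"
raw = b"data\x81.bin"    # byte 0x81 is unassigned in Python's cp1252 codec

def decode_or_none(data, errors):
    """Decode with the given error handler; return None on failure."""
    try:
        return data.decode(ANSI, errors)
    except UnicodeDecodeError:
        return None

strict_result = decode_or_none(raw, "strict")   # None: decoding fails
ignore_result = decode_or_none(raw, "ignore")   # 'data.bin': byte dropped
```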

I just added a note about mbcs in Doc/whatsnew/3.2.rst: r84750.

--




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-12 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

It reminds me of the discussion in issue #3187. About unencodable filenames, 
Guido proposed to ignore them or to use errors=replace, and wrote "Failing 
the entire os.listdir() call is not acceptable." (... long discussion ...) And 
finally, os.listdir() ignored undecodable filenames on UNIX/BSD.

Then you introduced the ingenious PEP 383 (utf8b, then renamed surrogateescape) 
and os.listdir() now raises an error if PyUnicode_FromEncodedObject(v, 
Py_FileSystemDefaultEncoding, "surrogateescape") fails... which doesn't occur 
because of an undecodable byte sequence, but for other reasons like a memory 
error.

About Windows, os.listdir(str) never fails, but my question is about 
os.listdir(bytes). Should os.listdir(bytes) return invalid filenames (encoded 
with mbcs+replace, filenames not usable to open, rename or delete the file) 
or just ignore them?

> Ok. Then I'm -1 on the patch: you can't know whether the application
> actually wants to open the file. Perhaps it only wants to display the
> file names, or perhaps it only wants to open some of the files, or
> only traverse into subdirectories.

> For backwards compatibility, I recommend to leave things as they are.
> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the mbcs code doesn't do
> (anymore?)).

"it only wants to open some of the files" is the typical reason why I 
hate Python2 and its implicit conversion between bytes and characters: it 
works in most cases, but it fails sometimes. The problem is to define (and 
explain) "sometimes".

The typical use case of listing a directory is a file chooser. On Windows using 
the bytes API, it works in most cases, but it fails if the user picks the 
wrong file (name with "?"). That's the problem I would like to address.
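
The file chooser problem comes from names that do not survive the trip through the ANSI encoding. A hedged sketch of how an application could detect them, assuming cp1252 as a stand-in for "mbcs" and a hypothetical helper name:

```python
# Assumption: cp1252 stands in for the Windows ANSI code page ("mbcs").
ANSI = "cp1252"

def unusable_names(names):
    """Return the names whose bytes form no longer identifies the file.

    A name is unusable when encoding it in replace mode and decoding it
    back gives a different string (for example '??.txt').
    """
    bad = []
    for name in names:
        as_bytes = name.encode(ANSI, "replace")
        if as_bytes.decode(ANSI) != name:
            bad.append(name)
    return bad

listing = ["report.txt", "\u4e2d\u6587.txt"]
broken = unusable_names(listing)   # ['\u4e2d\u6587.txt']
```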

--

The "ignore unencodable filenames" solution is compatible with the "traverse into 
subdirectories" case. And it also keeps backward compatibility (except 
that unencodable files are hidden, which is a lesser problem I think).

--

I proposed to raise an error on unencodable filenames. I changed my mind after 
reading your answer and the discussion on #3187. My patch breaks compatibility, 
and users don't care about unencodable filenames. E.g. glob("*.mp3") should not 
fail if the directory contains a temporary unencodable filename (xxx.tmp).
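
The glob case can be sketched as follows. This only models the "ignore" behaviour on a list of names; `glob_bytes` and `encodable` are hypothetical helpers, and cp1252 is again an assumed stand-in for "mbcs".

```python
import fnmatch

# Assumption: cp1252 stands in for the Windows ANSI code page ("mbcs").
ANSI = "cp1252"

def encodable(name):
    """True if the name can be encoded without loss."""
    try:
        name.encode(ANSI)
        return True
    except UnicodeEncodeError:
        return False

def glob_bytes(names, pattern):
    """Match `pattern` against `names`, skipping unencodable names."""
    return [
        name.encode(ANSI)
        for name in names
        if encodable(name) and fnmatch.fnmatchcase(name, pattern)
    ]

matches = glob_bytes(["a.mp3", "\u4e2d.tmp", "b.txt"], "*.mp3")   # [b'a.mp3']
```

The unencodable temporary file never reaches the pattern match, so the search for MP3 files succeeds.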

--




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-12 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the mbcs code doesn't do
> (anymore?)).

If we choose to keep this behaviour, I will have to revert my commit on mbcs 
codec to be consistent with os.listdir(). Or at least patch 
PyUnicode_EncodeFSDefault and os.fsencode() (use replace error handler) and 
PyUnicode_DecodeFSDefault and os.fsdecode() (use ignore error handler).
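
For readers unfamiliar with the two Python-level functions named here, a minimal usage sketch. The error handler they apply depends on the platform and the Python version, so the example sticks to an ASCII name that round-trips everywhere.

```python
import os

name = "data.txt"

as_bytes = os.fsencode(name)      # str -> bytes using the filesystem encoding
as_text = os.fsdecode(as_bytes)   # bytes -> str using the filesystem encoding

# Values that already have the target type are returned unchanged.
same_bytes = os.fsencode(b"data.txt")
same_text = os.fsdecode("data.txt")
```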

--




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-12 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> About Windows, os.listdir(str) never fails, but my question is about
> os.listdir(bytes). Should os.listdir(bytes) return invalid filenames
> (encoded with mbcs+replace, filenames not usable to open, rename or delete
> the file) or just ignore them?

I see nothing wrong with returning incorrect file names.

> "it only wants to open some of the files" is the typical reason why I
> hate Python2 and its implicit conversion between bytes and characters: it
> works in most cases, but it fails sometimes. The problem is to define (and
> explain) "sometimes".

Notice that this doesn't change with the patch. It will *still* work
sometimes, and fail sometimes. In fact, for most users and most
applications, it will never fail - *even with your patch applied*.

> The "ignore unencodable filenames" solution is compatible with the "traverse into
> subdirectories" case. And it also keeps backward compatibility (except
> that unencodable files are hidden, which is a lesser problem I think).

I fail to see why removing incorrect file names from the result list is
any better than keeping them. The result list will be incorrect either way.

In one case (files skipped), the user will not see the file in the
selection dialog, even though he knows it's there and Explorer shows it
just fine. So he thinks there must be a bug.

In the other case, it displays a nonsensical file name. Again, the user
thinks there is a bug - plus if you click on the file, you get some
error message (hopefully, the application will catch the exception -
the directory may also have changed in-between, so a missing file
error must be recovered from).

So it's a user-visible bug in either case, but if the incorrect file
name is included, it's slightly more obvious that something is wrong.

--




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-12 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> If we choose to keep this behaviour, I will have to revert my commit on mbcs
> codec to be consistent with os.listdir(). Or at least patch
> PyUnicode_EncodeFSDefault and os.fsencode() (use replace error handler) and
> PyUnicode_DecodeFSDefault and os.fsdecode() (use ignore error handler).

I think trying to emulate, in Python, what the *A functions do is
futile. IIUC, they disable WC_NO_BEST_FIT_CHARS, and may do other stuff
which apparently is undocumented.

However, I fail to see the relationship to this issue. Having the MBCS
codec support strict mode is a good thing.

--
