[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-12 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

What do you gain with this patch? (i.e. what is its advantage?)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9820
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-12 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> What do you gain with this patch? (i.e. what is its advantage?)

You know immediately that os.listdir(bytes) is unable to encode the filename, 
instead of manipulating an invalid filename (b'?') and getting the error later 
(when you use the filename to open, copy, delete, ... the file).

It's the same idea as str+bytes raising an error in Python 3: get the error 
early instead of storing invalid data and getting the error too late.
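
A minimal sketch of that analogy, assuming Python 3. The helper name concat_or_error is hypothetical and exists only for illustration; it shows that mixing str and bytes fails at the point of concatenation rather than producing a value that breaks later.

```python
def concat_or_error(text, data):
    """Concatenate two values; return the error message if the types clash."""
    try:
        return text + data
    except TypeError as exc:
        # Python 3 refuses to mix str and bytes implicitly.
        return "TypeError: {}".format(exc)


# str + bytes fails immediately instead of yielding invalid data.
print(concat_or_error("abc", b"def"))

# str + str works as usual.
print(concat_or_error("abc", "def"))
```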

Anyway, on Windows, it's not a good idea to manipulate bytes filenames. So it's 
also a way to encourage people to migrate their applications to Unicode on 
Windows.

--




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-12 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> You know directly that os.listdir(bytes) is unable to encode the
> filename, instead of manipulating an invalid filename (b'?') and getting
> the error later (when you use the filename: open, copy, delete, ...
> the file).

Ok. Then I'm -1 on the patch: you can't know whether the application
actually wants to open the file. Perhaps it only wants to display the
file names, or perhaps it only wants to open some of the files, or
only traverse into subdirectories.

For backwards compatibility, I recommend leaving things as they are.
FindFirst/NextFileA will also do some other interesting conversions,
such as the best-fit conversion (which the mbcs code doesn't do
(anymore?)).

Windows has explicit A and W versions, and Python has explicit A
and W types, so it's IMO best to pair them in the natural way
(even if that means code duplication).

> Anyway, on Windows, it's not a good idea to manipulate bytes
> filenames. So it's also a way to encourage people to migrate their
> applications to unicode on Windows.

Only if people run into the issue (which few people will). People
who *do* run into the issue will likely get an error either
way, which will teach them their lesson :-)

--
title: Windows : os.listdir(b'.') doesn't raise an error for unencodable 
filenames - Windows : os.listdir(b'.') doesn't raise an error for 
unencodable filenames




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-10 Thread STINNER Victor

New submission from STINNER Victor victor.stin...@haypocalc.com:

In Python 3.2, the mbcs encoding (the default filesystem encoding on Windows) is 
now strict: it raises an error on unencodable characters and undecodable bytes. 
But os.listdir(b'.') still encodes unencodable characters as b'?'.

Example:

>>> import os
>>> os.mkdir('listdir')
>>> open('listdir\\xxx-\u0363', 'w').close()
>>> filename = os.listdir(b'listdir')[0]
>>> filename
b'xxx-?'
>>> open(filename, 'r').close()
IOError: [Errno 22] Invalid argument: 'xxx-?'

os.listdir(b'listdir') should raise an error (and not ignore the filename or 
replace unencodable characters with b'?').

I think that we should list the directory using the wide character API 
(FindFirstFileW) but encode the filename using PyUnicode_EncodeFSDefault() if 
the directory name type is bytes, instead of using the ANSI API 
(FindFirstFileA).
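
The difference between the two behaviours can be sketched in pure Python. This is only an illustration under stated assumptions, not the patch itself (which is C code calling PyUnicode_EncodeFSDefault()): the helper encode_names is hypothetical, the names come from a plain list rather than a real directory, and cp1252 stands in for mbcs because the mbcs codec only exists on Windows.

```python
def encode_names(names, encoding="cp1252", errors="strict"):
    """Encode each filename; with errors='strict' an unencodable name raises."""
    return [name.encode(encoding, errors) for name in names]


# Stand-in for the names a wide-character directory listing would return.
names = ["readme.txt", "xxx-\u0363"]

# Lossy behaviour, comparable to the b'xxx-?' result shown in the report.
print(encode_names(names, errors="replace"))

# Strict behaviour: the failure surfaces while listing, not when opening.
try:
    encode_names(names)
except UnicodeEncodeError as exc:
    print("unencodable name:", ascii(exc.object))
```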

--
components: Library (Lib), Unicode, Windows
messages: 115995
nosy: haypo, loewis
priority: normal
severity: normal
status: open
title: Windows : os.listdir(b'.') doesn't raise an error for unencodable 
filenames
versions: Python 3.2




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-10 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

I found this bug while trying to find an unencodable filename for #9819 
(TESTFN_UNDECODABLE).

Anyway, the bytes API should be avoided on Windows, since the native Windows 
filename type is Unicode.

--




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-10 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> os.listdir(b'listdir') should raise an error (and not ignore 
> the filename or replace unencodable characters with b'?').

To avoid the error, one solution is to support PEP 383 on Windows (for the 
mbcs encoding). I opened a separate issue for that: #9821.

But supporting PEP 383 will not fix this issue, because the current 
implementation of listdir(b'.') doesn't use the Python codec: it uses raw bytes 
filenames (via the ANSI API).
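
For readers unfamiliar with PEP 383, here is a minimal sketch of the surrogateescape error handler it defines. It uses UTF-8 as a stand-in codec, since mbcs is only available on Windows, so it shows the mechanism only and says nothing about how mbcs would behave.

```python
raw = b"xxx-\xff"  # \xff is not valid UTF-8

# Decoding maps the undecodable byte to a lone surrogate (U+DCFF).
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original byte exactly.
restored = name.encode("utf-8", "surrogateescape")
print(restored == raw)
```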

--




[issue9820] Windows : os.listdir(b'.') doesn't raise an error for unencodable filenames

2010-09-10 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Patch:
 - Remove the bytes version of listdir(): reuse the unicode version but 
convert the filename to bytes using PyUnicode_EncodeFSDefault() if the 
directory name is not unicode
 - use Py_XDECREF(d) instead of Py_DECREF(d) at the end (because d=NULL on 
error)
 - use Py_CLEAR(d) instead of Py_DECREF(d); d=NULL;
 - remove char namebuf[MAX_PATH+5] buffer (use less stack memory)

--
keywords: +patch
Added file: http://bugs.python.org/file18836/listdir_windows_bytes.patch
