Re: [Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

Terry Reedy Tue, 25 Oct 2011 17:51:56 -0700

On 10/25/2011 4:31 AM, Victor Stinner wrote:

On Tuesday 25 October 2011 09:09:56, you wrote:

I propose to raise Unicode errors if a filename cannot be decoded on
Windows, instead of creating bogus filenames with question marks.
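The difference between the two behaviours can be shown with Python's codec error handlers. This is a minimal sketch. It uses cp1252 as a stand-in for the Windows ANSI code page because the real mbcs codec only exists on Windows, and the filename is made up for illustration.

```python
ANSI = "cp1252"  # assumption: stand-in for the Windows ANSI code page (mbcs)
name = "\u65e5\u672c.txt"  # hypothetical filename containing CJK characters

# Current-style behaviour: unencodable characters silently become '?'.
lossy = name.encode(ANSI, errors="replace")
print(lossy)  # b'??.txt'

# Proposed behaviour: a strict codec raises instead of producing a bogus name.
try:
    name.encode(ANSI, errors="strict")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print("cannot encode:", exc.reason)
```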


Can you please elaborate on which APIs you are talking about exactly?


Basically, all functions processing filenames, so most functions of
posixmodule.c. Some examples:

This seems way too broad. From your previous posts, I presumed that you only propose to change behavior when the user asks for the bytes version of a unicode name that cannot be properly converted to a bytes version.

- os.listdir():


os.listdir(unicode) works fine and should not be changed.
os.listdir(bytes) is what the OP of the issue wants changed.
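A small sketch makes the type-preserving contract concrete: a str directory passed to os.listdir() yields str names, and a bytes directory yields bytes names. The temporary directory and the ASCII filename a.txt are illustrative assumptions. The sketch avoids non-ASCII names on purpose, because the encoding Windows applies to the bytes form of such names is exactly what this thread is debating.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # One ASCII-named file, so the result is the same on every platform.
    open(os.path.join(tmp, "a.txt"), "w").close()

    as_str = os.listdir(tmp)                  # str in  -> list of str
    as_bytes = os.listdir(os.fsencode(tmp))   # bytes in -> list of bytes

print(as_str)    # ['a.txt']
print(as_bytes)  # [b'a.txt']
```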

FindFirstFileA, FindNextFileA, FindCloseA


These are not Python names. Are they Windows API names?

- os.lstat(): CreateFileA


This does not create a path and should not be changed as far as I can see.

- os.getcwdb():


This you might change.

- os.getcwd():

This should not be changed, as no bytes are involved.

- os.mkdir(): CreateDirectoryA
- os.chmod(): SetFileAttributesA

Like os.lstat, these only accept a path and should do what they are supposed to do.

If it's the byte APIs (i.e. using bytes as file names), then I'm -1 on
this proposal. People that explicitly use bytes for file names deserve
to get whatever exact platform semantics the platform has to offer. This
is true on Unix, and it is also true on Windows.


My proposal is a fix for an issue reported by a user:
http://bugs.python.org/issue13247

I want to keep the bytes API for backward compatibility, and it will still
work for non-ASCII characters, but only for non-ASCII characters encodable to
the ANSI code page.
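Whether a given name falls into the "still works" category depends on the active ANSI code page, which differs between Windows installations. The helper below is a hypothetical sketch (ansi_encodable is not a Python API). It does a strict encode against an assumed code page, using cp1252 (Western European) and cp1251 (Cyrillic) as examples.

```python
def ansi_encodable(name: str, codepage: str) -> bool:
    """Hypothetical helper: True if name encodes strictly in codepage."""
    try:
        name.encode(codepage, errors="strict")
    except UnicodeEncodeError:
        return False
    return True

# Non-ASCII but representable in cp1252, so the bytes API keeps working.
print(ansi_encodable("caf\u00e9.txt", "cp1252"))  # True
# Cyrillic is not in cp1252, but it is in cp1251.
print(ansi_encodable("\u0436.txt", "cp1252"))     # False
print(ansi_encodable("\u0436.txt", "cp1251"))     # True
```

The set of names that would raise under the proposal is therefore not fixed. The same filename can work on one machine and fail on another.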

In practice, characters not encodable to the ANSI code page are very rare. For
example, it's difficult to type such characters directly on the keyboard. I
bet that very few people will notice the change.

Actually, Windows makes switching keyboard setups rather easy once you enable the feature. It might be that people who routinely use non-'ansi' characters in file and directory names do not routinely ask for bytes versions thereof.

The doc says "All functions accepting path or file names accept both bytes and string objects, and result in an object of the same type, if a path or file name is returned." It does that now, though it says nothing about the encoding assumed for input bytes or used for output bytes. It does not mention raising exceptions, so doing so is a feature change that would likely break code. Currently, exceptional situations are signalled with "'?' in returned_path" rather than with an exception object. It ('?') is a bad choice of signal though, given the other uses of '?' in paths.
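The weakness of '?' as an error signal shows up in a naive check. looks_lossy below is a hypothetical heuristic, not part of the os module. The second path uses the Win32 extended-length prefix \\?\, which is one legitimate use of '?' in Windows paths. '?' is also a wildcard in FindFirstFile search patterns.

```python
def looks_lossy(path: bytes) -> bool:
    """Hypothetical heuristic: treat any '?' as a sign of lossy conversion."""
    return b"?" in path

# A genuinely lossy result, e.g. "日本.txt" encoded with errors="replace".
print(looks_lossy(b"C:\\data\\??.txt"))             # True
# A valid extended-length path (\\?\C:\data\report.txt) also matches.
print(looks_lossy(b"\\\\?\\C:\\data\\report.txt"))  # True: false positive
```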


--
Terry Jan Reedy


_______________________________________________
Python-Dev mailing list
http://mail.python.org/mailman/listinfo/python-dev
