On 10/25/2011 4:31 AM, Victor Stinner wrote:
On Tuesday 25 October 2011 09:09:56 you wrote:
I propose to raise Unicode errors if a filename cannot be decoded on
Windows, instead of creating bogus filenames with question marks.

Can you please elaborate what APIs you are talking about exactly?

Basically, all functions processing filenames, so most functions of
posixmodule.c. Some examples:

This seems way too broad. From your previous posts, I presumed that you were only proposing to change behavior when the user asks for the bytes version of a unicode name that cannot be properly converted to bytes.

- os.listdir():

os.listdir(unicode) works fine and should not be changed.
os.listdir(bytes) is what the OP of the issue wants changed.
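
To make the two call forms concrete, here is a minimal sketch. The helper name list_both_ways is hypothetical, and the file name is plain ASCII, so both forms succeed on any platform; it only shows that the result type follows the argument type.

```python
import os
import tempfile

def list_both_ways(directory):
    """Return (str names, bytes names) for the same directory.

    Hypothetical helper, for illustration only.
    """
    names_str = os.listdir(directory)                  # str in, str out
    names_bytes = os.listdir(os.fsencode(directory))   # bytes in, bytes out
    return names_str, names_bytes

with tempfile.TemporaryDirectory() as tmp:
    # Create one ASCII-named file so both listings are unambiguous.
    open(os.path.join(tmp, "plain.txt"), "w").close()
    names_str, names_bytes = list_both_ways(tmp)
    print(names_str)
    print(names_bytes)
```

The disputed case is the second call when a name in the directory has no representation in the ANSI code page; this sketch deliberately avoids that case.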

FindFirstFileA, FindNextFileA, FindCloseA

These are not Python names. Are they Windows API names?

- os.lstat(): CreateFileA

This does not create a path and should not be changed as far as I can see.

- os.getcwdb():

This you might change.

> getcwd()

This should not be, as no bytes are involved.

- os.mkdir(): CreateDirectoryA
- os.chmod(): SetFileAttributesA

Like os.lstat, these only accept a path and should do what they are supposed to do.

If it's the byte APIs (i.e. using bytes as file names), then I'm -1 on
this proposal. People that explicitly use bytes for file names deserve
to get whatever exact platform semantics the platform has to offer. This
is true on Unix, and it is also true on Windows.

My proposal is a fix for a bug reported by a user:
http://bugs.python.org/issue13247

I want to keep the bytes API for backward compatibility, and it will still
work for non-ASCII characters, but only for non-ASCII characters encodable to
the ANSI code page.

In practice, characters not encodable to the ANSI code page are very rare. For
example: it's difficult to write such characters directly with the keyboard. I
bet that very few people will notice the change.

Actually, Windows makes switching keyboard setups rather easy once you enable the feature. It might be that people who routinely use non-'ansi' characters in file and directory names do not routinely ask for bytes versions thereof.

The doc says "All functions accepting path or file names accept both bytes and string objects, and result in an object of the same type, if a path or file name is returned." It does that now, though it says nothing about the encoding assumed for input bytes or used for output bytes. It does not mention raising exceptions, so doing so is a feature-change that would likely break code. Currently, exceptional situations are signalled with "'?' in returned_path" rather than with an exception object. It ('?') is a bad choice of signal though, given the other uses of '?' in paths.
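
A rough sketch of the difference between the two signalling styles, in plain Python. Assumptions: cp1252 stands in for the ANSI code page, and Python's own codec error handlers stand in for what Windows does internally (the real conversion may differ, for example through best-fit mapping). The helper names are hypothetical.

```python
# Assumption: cp1252 plays the role of the Windows ANSI code page here.
ANSI_CODEC = "cp1252"

def to_bytes_lossy(name):
    """Current-style signal: unencodable characters become b'?'."""
    return name.encode(ANSI_CODEC, errors="replace")

def to_bytes_strict(name):
    """Proposed-style signal: unencodable characters raise UnicodeEncodeError."""
    return name.encode(ANSI_CODEC, errors="strict")

def looks_lossy(path):
    """Naive test for the '?' signal in a bytes path."""
    return b"?" in path

name = "report_\u65e5\u672c.txt"   # two characters outside cp1252
lossy = to_bytes_lossy(name)
print(lossy)
print(looks_lossy(lossy))

try:
    to_bytes_strict(name)
except UnicodeEncodeError as exc:
    print("strict encoding failed:", exc.reason)

# '?' also occurs in paths that lost nothing, e.g. the \\?\ prefix.
print(looks_lossy(rb"\\?\C:\temp\plain.txt"))
```

The last line is the problem with '?' as a signal: the naive check also fires on a path that was converted without loss, whereas an exception cannot be confused with valid data.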

--
Terry Jan Reedy


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev