On 07/13/2013 10:00 PM, Harry Bock wrote:
Hi all,
My name is Harry Bock. I'm interested in helping out with porting
Twisted to Python 3, and I've popped into IRC a few times to introduce
myself
and ask a few questions. A few developers agreed that working on trial
dependencies would be a big help.
In doing some porting work on trial, I stumbled upon a previous
porting effort (possibly by Itamar?) for twisted.python.filepath and
related modules. It seemed like the porting effort included forcing
all pathname inputs to be byte strings instead of native strings.
You imply that this was a change, somehow, but it wasn't. The API was
*always* bytes and it continues to be bytes on Python 3.
It's a common Python 3 porting mistake to change everything from bytes
to unicode just because. E.g. the Python standard library does this in
many places for no good reason, resulting in bugs that are still being
fixed (http://bugs.python.org/issue12411) or APIs that are less useful
(the zipfile docs explicitly state that there is no standard encoding in
zip files, but the Python 3 zipfile module only supports one specific
encoding because they switched to Unicode and didn't bother reading the
module's own docs). Our goals in porting were correctness and backwards
compatibility with Python 2 code, so that porters don't have to change
everything. In this particular case we also wanted to get something
working in a minimal amount of time. *Adding* Unicode support is useful
and should be done.
After some investigation, I believe this is the wrong approach, but I
wanted to start a discussion here first. Some thoughts:
(a) As of Python 3.3, use of the ANSI API in Windows is deprecated[1],
so many functions in os and os.path raise DeprecationWarning when
given byte strings as input. Although win32 is not an initial target
of the porting effort, we'll have to support it eventually and the API
should be supported before then.
(b) Misunderstandings at the application level about the underlying
filesystem's path encoding are not the problem of the Twisted API.
Correct me if I'm wrong, but that's the responsibility of the system
administrator or individual user (at least on UNIX) to set the LANG
environment variable, or for the application to call setlocale(3) to
explicitly override it.
Given operating systems that don't really know about encodings on the
filesystem level, forcing everything to be unicode doesn't make sense.
I'm pretty sure you can end up with files in multiple different Unicode
encodings on the same filesystem on Linux, for example.
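To illustrate the point about mixed encodings: on Linux a file name is
just a sequence of bytes, so two names in the same directory can use
different encodings. A minimal sketch follows; the names are made up for
illustration and nothing here touches the disk.

```python
# Two hypothetical file names, as raw bytes, that could coexist in one
# directory on Linux. Both are legal names; only one is valid UTF-8.
utf8_name = "caf\u00e9.txt".encode("utf-8")      # b'caf\xc3\xa9.txt'
latin1_name = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt'


def decodes_as_utf8(name):
    """Return True if the byte string is valid UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The surrogateescape error handler (PEP 383) lets undecodable bytes
# survive a round trip through str, at the cost of a str that is not
# really meaningful text.
roundtrip = latin1_name.decode("utf-8", "surrogateescape").encode(
    "utf-8", "surrogateescape"
)
```

A bytes API can represent both names directly, while a text-only API
has to guess an encoding or rely on something like surrogateescape.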
(c) If we do not allow unicode strings, we will be forcing the
application developer to decide how to encode paths when using the
FilePath API. Per (b) above, the user will have to call
sys.getfilesystemencoding()[2] to divine what encoding to use before
using the API at all, which to me is terribly annoying and would just
add str.encode calls everywhere.
It is indeed a problem that we only support bytes in FilePath on Python
3. As I mentioned above, Unicode support is missing only due to lack of
time in the initial port.
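The burden described in (c) would look roughly like the following for
application code that holds text paths but calls a bytes-only API. This
is only a sketch; to_bytes_path is a hypothetical helper, not part of
Twisted.

```python
import sys


def to_bytes_path(path):
    """Hypothetical helper: encode a text path for a bytes-only API.

    Bytes are passed through unchanged; text is encoded with the
    encoding Python reports for the filesystem.
    """
    if isinstance(path, bytes):
        return path
    return path.encode(sys.getfilesystemencoding())
```

On Python 3.2 and later the standard library's os.fsencode() does a
similar conversion, so the helper mostly shows what every caller would
otherwise have to remember to do.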
Thus, my vote is that on Python 2.x, Twisted should accept either the
native str or unicode types for path names, and on Python 3.x, only
accept the str type to prevent deprecation issues with system calls.
I have a patch set that will make this happen including unittest
modifications; if there's a consensus I'm happy to open a ticket and
submit the patches.
The ideal situation would be to support bytes and Unicode on Python 2
*and* Python 3, for maximum compatibility. Even if deprecated on
Windows, filesystem operations on Python 3 still do accept bytes (and
they're not deprecated elsewhere). Given existing code that already
takes bytes, switching to only doing Unicode on Python 3 would not be
backwards compatible, so we can't really do that without a bunch of
deprecation warnings and a few releases. Instead we should just do what
Python does: if you start with a bytes path you always get back bytes,
and if you start with a Unicode path you always get back Unicode.
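For reference, this is the standard library behaviour being referred
to, shown with os.listdir() and os.path.join() on Python 3. It is a
small demonstration using a temporary directory, not a proposal for how
FilePath should be implemented.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the directory listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # A str argument yields str names; a bytes argument yields bytes names.
    text_names = os.listdir(tmp)
    bytes_names = os.listdir(os.fsencode(tmp))

# os.path.join follows the same rule and refuses to mix the two types.
joined_text = os.path.join("a", "b")
joined_bytes = os.path.join(b"a", b"b")
try:
    os.path.join("a", b"b")
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```

The result type always follows the argument type, and mixing str and
bytes in one call raises TypeError rather than guessing an encoding.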
_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python