Re: PEP 393 vs UTF-8 Everywhere

MRAB Fri, 20 Jan 2017 18:13:57 -0800

On 2017-01-21 00:51, Pete Forman wrote:

> MRAB <[email protected]> writes:
>
>> As someone who has written an extension, I can tell you that I much
>> prefer dealing with a fixed number of bytes per codepoint than a
>> variable number of bytes per codepoint, especially as I'm also
>> supporting earlier versions of Python where that was the case.
>
> At the risk of sounding harsh, if supporting variable bytes per
> codepoint is a pain you should roll with it for the greater good of
> supporting users.

Or I could decide not to bother and leave it to someone else to continue
the project. After all, it's not like I'm getting paid for the work;
it's purely voluntary.

> PEP 393 / Python 3.3 required extension writers to revisit their access
> to strings. My explicit question was about why PEP 393 was adopted to
> replace the deficient old implementations rather than another approach.
> The implicit question is whether a UTF-8 internal representation should
> replace that of PEP 393.

I already had to handle 1-byte bytestrings and 2/4-byte (narrow/wide)
Unicode strings, so switching to 1/2/4 strings wasn't too bad. Switching
to a completely different, variable-width system would've been a lot
more work.
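
To make the difference concrete, here is a rough Python sketch (not code
from any real extension). The helper names pep393_width and utf8_index
are made up for illustration. The first picks the 1/2/4-byte width that
PEP 393 would choose for a string, after which every codepoint sits at a
fixed offset. The second shows that reaching the i-th codepoint in UTF-8
data means scanning from the start, because each codepoint takes 1 to 4
bytes.

```python
def pep393_width(s: str) -> int:
    """Bytes per codepoint a PEP 393 string would use (hypothetical helper)."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1      # Latin-1 range
    if top < 0x10000:
        return 2      # Basic Multilingual Plane
    return 4          # anything above U+FFFF


def utf8_index(data: bytes, i: int) -> str:
    """Return the i-th codepoint of valid UTF-8 data by scanning (hypothetical helper)."""
    if i < 0:
        raise IndexError(i)
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:   # not a continuation byte, so a codepoint starts here
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(i)
    return data[start:].decode("utf-8")


text = "a\u20ac\U0001F600z"       # 'a', EURO SIGN, an emoji, 'z'
width = pep393_width(text)        # fixed width: item i lives at offset i * width
third = utf8_index(text.encode("utf-8"), 2)
```

With the fixed-width form, indexing is a multiplication. With UTF-8 it is
a loop, unless you build and maintain some extra index alongside the data.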


--
https://mail.python.org/mailman/listinfo/python-list
