On Tue, Feb 2, 2021 at 11:47 PM Inada Naoki <songofaca...@gmail.com> wrote:
> So if we add UTF-16 support to ucs2_utf8_encoder(), it means
> we need to add and maintain code only for PyUnicode_EncodeUTF8
> (encoding from wchar_t* into char*).
>
> I don't think it is a good deal. As described in the PEP, encoder APIs
> are used very rarely.
> We must not add any maintenance costs for them.

Before PEP 393 (compact strings), I fixed tons of bugs in the Python 2.7
and Python 3 codecs to properly handle 16-bit wchar_t, that is, to
properly handle surrogate characters. The implementation was complex
and slow. I would prefer not to move backwards to that :-(

If you are curious, look at the PyUnicode_FromWideChar() implementation
and search for find_maxchar_surrogates() to get an idea of the cost of
handling UTF-16 surrogate pairs. A full codec is far more complex, and
painful to write and to maintain. I'm happy that we were able to remove
that thanks to PEP 393!
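
To give a rough idea of the extra work, here is a minimal Python sketch of
the kind of pre-scan a 16-bit wchar_t buffer needs before it can be
converted. It is not the CPython code: scan_utf16_units() and to_units()
are hypothetical names used only for illustration, and the real C
implementation differs in detail.

```python
import struct


def scan_utf16_units(units):
    """Return (max_code_point, surrogate_pair_count) for 16-bit code units.

    Hypothetical helper, loosely modelled on the idea of a pre-scan pass;
    it is not the CPython implementation.
    """
    max_cp = 0
    pairs = 0
    i = 0
    n = len(units)
    while i < n:
        cp = units[i]
        # A high surrogate followed by a low surrogate encodes a single
        # code point above U+FFFF, so two units must be consumed at once.
        if (0xD800 <= cp <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            pairs += 1
            i += 2
        else:
            # BMP character, or a lone surrogate that is kept as-is.
            i += 1
        if cp > max_cp:
            max_cp = cp
    return max_cp, pairs


def to_units(text):
    """Encode text as a list of UTF-16 code units (hypothetical helper)."""
    data = text.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


units = to_units("a\u00e9\U0001F600")
print(len(units))               # 4 code units for 3 characters
print(scan_utf16_units(units))  # (128512, 1), i.e. U+1F600 and one pair
```

Every consumer of such a buffer has to repeat this pairing logic and decide
what to do with lone surrogates; a full encoder or decoder adds error
handlers and output sizing on top of it.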

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OAPVKJAU6QZCMEWRQSYEDTGO6VAO5ZAN/