On Tue, Feb 2, 2021 at 11:47 PM Inada Naoki <songofaca...@gmail.com> wrote:
> So if we support add UTF-16 support to ucs2_utf8_encoder(), it means
> we need to add code and maintain only for PyUnicode_EncodeUTF8 (encode
> from wchar_t* into char*).
>
> I don't think it is a good deal. As described in the PEP, encoder APIs
> are used very rarely.
> We must not add any maintainece costs for them.
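To make the cost of UTF-16 support concrete, here is a minimal sketch of an encoder from 16-bit code units to UTF-8 that handles surrogate pairs. It is not CPython's ucs2_utf8_encoder() or PyUnicode_EncodeUTF8(). The function name utf16_to_utf8() is hypothetical, and it takes uint16_t rather than wchar_t so it behaves the same on every platform. It also simply fails on lone surrogates instead of supporting error handlers such as "surrogateescape" or "surrogatepass", which a real codec must do. Even in this reduced form, the surrogate branch adds look-ahead, range checks and an extra error path that a plain UCS-2 encoder does not need.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython API): encode a UTF-16 buffer of
   `len` code units into `out` as UTF-8. Returns the number of bytes
   written, or -1 on a lone surrogate or if `out` is too small. */
static ptrdiff_t
utf16_to_utf8(const uint16_t *in, size_t len,
              unsigned char *out, size_t outsize)
{
    size_t i = 0, o = 0;
    while (i < len) {
        uint32_t ch = in[i++];
        if (ch >= 0xD800 && ch <= 0xDBFF) {
            /* High surrogate: it must be followed by a low surrogate. */
            if (i >= len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return -1;
            /* Combine the pair into one code point in U+10000..U+10FFFF. */
            ch = 0x10000 + ((ch - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        }
        else if (ch >= 0xDC00 && ch <= 0xDFFF) {
            return -1;  /* lone low surrogate */
        }

        if (ch < 0x80) {
            if (o + 1 > outsize) return -1;
            out[o++] = (unsigned char)ch;
        }
        else if (ch < 0x800) {
            if (o + 2 > outsize) return -1;
            out[o++] = (unsigned char)(0xC0 | (ch >> 6));
            out[o++] = (unsigned char)(0x80 | (ch & 0x3F));
        }
        else if (ch < 0x10000) {
            if (o + 3 > outsize) return -1;
            out[o++] = (unsigned char)(0xE0 | (ch >> 12));
            out[o++] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (ch & 0x3F));
        }
        else {
            /* Only reachable via a surrogate pair. */
            if (o + 4 > outsize) return -1;
            out[o++] = (unsigned char)(0xF0 | (ch >> 18));
            out[o++] = (unsigned char)(0x80 | ((ch >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (ch & 0x3F));
        }
    }
    return (ptrdiff_t)o;
}
```

For example, the pair 0xD83D 0xDE00 (U+1F600) encodes to the four bytes F0 9F 98 80, while a lone 0xD800 is rejected.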
Before PEP 393 (compact strings), I fixed tons of bugs in the Python 2.7 and Python 3 codecs related to handling 16-bit wchar_t properly, that is, to handling surrogate characters properly. The implementation was complex and slow. I would prefer not to move backwards to that :-(

If you are curious, look at the PyUnicode_FromWideChar() implementation and search for find_maxchar_surrogates() to get an idea of the cost of handling UTF-16 surrogate pairs. For a full codec, it's way more complex, and painful to write and to maintain.

I'm happy that we were able to remove that thanks to PEP 393!

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OAPVKJAU6QZCMEWRQSYEDTGO6VAO5ZAN/
Code of Conduct: http://python.org/psf/codeofconduct/