On Tue, Feb 2, 2021 at 8:40 PM Inada Naoki <songofaca...@gmail.com> wrote:
>
> On Tue, Feb 2, 2021 at 7:37 PM M.-A. Lemburg <m...@egenix.com> wrote:
> >
> > BTW: I don't understand this comment:
> > "They are inefficient on platforms wchar_t* is UTF-16. It is because
> > built-in codecs supports only UCS-1, UCS-2, and UCS-4 input."
> >
> > Windows is one such platform. Java (indirectly) is another. They both
> > store UTF-16LE in those arrays and Python's codecs handle this just
> > fine.
> >
>
> I'm sorry the section is not clear.
>
> For example, if wchar_t* is UCS-4, ucs4_utf8_encoder() can encode
> wchar_t* into UTF-8 directly.
>
> But when wchar_t* is UTF-16, ucs2_utf8_encoder() cannot handle
> surrogate escape.
> We need to use a temporary Unicode object. That is what "inefficient" means.
>
> I will update the section to be more elaborate.
>
I updated the "Alternative Ideas" section of the PEP.
https://www.python.org/dev/peps/pep-0624/#alternative-ideas

The alternatives replace `Py_UNICODE*` with `PyObject*`, `Py_UCS4*`, or `wchar_t*`.

I explicitly noted that some codecs can bypass temporary Unicode objects:

"""
UTF-8, UTF-16, and UTF-32 encoders support Py_UCS4 internally.
So PyUnicode_EncodeUTF8(), PyUnicode_EncodeUTF16(), and
PyUnicode_EncodeUTF32() can avoid creating a temporary Unicode object.
"""

--
Inada Naoki  <songofaca...@gmail.com>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/AD7YKV33JAQXIXDTGUMH7UDSMQUEKVMG/
Code of Conduct: http://python.org/psf/codeofconduct/