On 5/26/22, Christopher Barker <python...@gmail.com> wrote:
> IIRC, there were two builds- 16 and 32 bit Unicode. But it wasn’t UTF16, it
> was UCS-2.

In the old implementation prior to 3.3, narrow and wide builds were
supported regardless of the size of wchar_t. For a narrow build, if
wchar_t was 32-bit, then PyUnicode_FromWideChar() would encode non-BMP
ordinals as UTF-16 surrogate pairs, and PyUnicode_AsWideChar()
implemented the reverse, from UTF-16 back to UTF-32. There were several
similar cases, such as PyUnicode_FromOrdinal().

The header called this "limited" UTF-16 support, primarily, I suppose,
because string length and indexing failed to account for surrogate
pairs. For example, on a narrow build:

    >>> s = '\U00010000'
    >>> len(s)
    2
    >>> s[0]
    '\ud800'
    >>> s[1]
    '\udc00'

Here's a link to the old implementation:

https://github.com/python/cpython/blob/v3.2.6/Objects/unicodeobject.c
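
To make the surrogate-pair arithmetic concrete, here is a rough sketch in pure Python of the standard UTF-16 conversion that a narrow build had to perform for non-BMP ordinals. The names to_surrogates(), from_surrogates() and narrow_len() are made up for illustration; they are not CPython functions, and this is not a transcription of the old C code.

```python
def to_surrogates(cp):
    """Split a non-BMP code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000                      # 20-bit value to split in half
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def narrow_len(s):
    """Count UTF-16 code units, which is what len() counted on a narrow build.

    Assumes s contains no lone surrogates, since those cannot be encoded.
    """
    return len(s.encode("utf-16-le")) // 2


high, low = to_surrogates(0x10000)
print(hex(high), hex(low))                # 0xd800 0xdc00
print(hex(from_surrogates(high, low)))    # 0x10000
print(narrow_len("\U00010000"))           # 2
```

On 3.3 and later, len('\U00010000') is 1, so counting code units of the UTF-16 encoding is one way to reproduce the old narrow-build length.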