On 5/26/22, Christopher Barker <python...@gmail.com> wrote:
> IIRC, there were two builds- 16 and 32 bit Unicode. But it wasn’t UTF16, it
> was UCS-2.

In the old implementation prior to 3.3, narrow and wide builds were
supported regardless of the size of wchar_t. For a narrow build, if
wchar_t was 32-bit, then PyUnicode_FromWideChar() would encode non-BMP
ordinals as UTF-16 surrogate pairs, and PyUnicode_AsWideChar()
implemented the reverse, from UTF-16 back to UTF-32. There were
several similar cases, such as PyUnicode_FromOrdinal().
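
As a rough sketch of that conversion (this is not the C code from
unicodeobject.c; the helper names to_utf16_units and from_utf16_units
are made up for illustration), the surrogate-pair arithmetic looks
something like this in Python:

```python
def to_utf16_units(ordinals):
    """Encode code points as 16-bit code units (hypothetical helper)."""
    units = []
    for cp in ordinals:
        if cp > 0xFFFF:
            # Non-BMP: split the 20-bit offset into a high/low surrogate pair.
            offset = cp - 0x10000
            units.append(0xD800 | (offset >> 10))
            units.append(0xDC00 | (offset & 0x3FF))
        else:
            units.append(cp)
    return units


def from_utf16_units(units):
    """Decode 16-bit code units back to code points (hypothetical helper)."""
    ordinals = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Valid pair: recombine into a single non-BMP code point.
            low = units[i + 1]
            ordinals.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP code unit or a lone surrogate: pass through unchanged.
            ordinals.append(u)
            i += 1
    return ordinals
```

Lone surrogates are passed through unchanged here, which is an
assumption of this sketch rather than a statement about what the old
C functions did.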

The header called this "limited" UTF-16 support, primarily I suppose
because len() and indexing counted 16-bit code units and so failed to
account for surrogate pairs. For example, on a narrow build:

    >>> s = '\U00010000'
    >>> len(s)
    2
    >>> s[0]
    '\ud800'
    >>> s[1]
    '\udc00'
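
For comparison, the same numbers can be reproduced on a current
interpreter by looking at the UTF-16 code units explicitly. This is
only a sketch: narrow_units is a hypothetical helper, and it assumes
Python 3.4 or later so that the 'surrogatepass' handler works with
the UTF-16 codecs.

```python
import struct


def narrow_units(s):
    """Return the 16-bit code units a narrow build would have stored.

    Hypothetical helper for illustration only.
    """
    data = s.encode('utf-16-le', 'surrogatepass')
    return list(struct.unpack('<%dH' % (len(data) // 2), data))


s = '\U00010000'
units = narrow_units(s)
print(len(s))                   # 1 on 3.3+, counted in code points
print(len(units))               # 2, counted in UTF-16 code units
print([hex(u) for u in units])  # ['0xd800', '0xdc00']
```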

Here's a link to the old implementation:

https://github.com/python/cpython/blob/v3.2.6/Objects/unicodeobject.c