On 7/2/20 10:19 AM, Victor Stinner wrote:
> Do you mean UTF-16 and UTF-32? UTF-16 supports the whole Unicode
> character set but uses the annoying surrogate pairs for characters
> outside the BMP.

Minor quibble: UTF-16 handles all of the CURRENTLY defined Unicode set,
and there is currently a promise not to extend Unicode past that, but
at some point that promise may need to be broken.

UTF-8, as previously defined (and as it could be again), easily handles
U+00000000 to U+7FFFFFFF.
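
To make that range concrete, here is a rough sketch of the older 1-to-6-byte
UTF-8 layout (the one that allowed 31-bit values). The function name
encode_utf8_original is mine, purely for illustration; it is not part of
Python's codecs, and it does not reject surrogates or apply the modern
U+10FFFF cap.

```python
def encode_utf8_original(cp: int) -> bytes:
    """Encode a code point with the older 1-to-6-byte UTF-8 layout.

    Hypothetical helper for illustration only; modern UTF-8 stops at
    four bytes (U+10FFFF) and also forbids surrogate code points.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    if cp < 0x80:
        return bytes([cp])  # plain ASCII, one byte
    # (exclusive upper limit, total byte count, lead-byte marker bits)
    for limit, nbytes, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if cp < limit:
            break
    out = []
    # Peel off 6 bits at a time for the continuation bytes (10xxxxxx).
    for _ in range(nbytes - 1):
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever bits remain go into the lead byte.
    out.append(lead | cp)
    return bytes(reversed(out))
```

For code points up to U+10FFFF (surrogates aside) this should agree with
Python's built-in "utf-8" codec; above that it produces the 5- and 6-byte
forms that current UTF-8 no longer permits.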

UTF-16 can handle U+00000000 to U+0010FFFF via surrogate pairs, and
stops there. Extending past that would require some form of heroics,
which is why U+0010FFFF is currently defined as the highest possible
code point: allowing a higher value would break UTF-16, and there
currently isn't a desire to do so. At some point in the distant future,
we may run out of 'valid' code points, and this promise will need to be
broken.
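
The U+0010FFFF ceiling falls straight out of the surrogate arithmetic: a
pair carries 10 + 10 = 20 bits on top of the 0x10000 offset. A small sketch
of that calculation follows; the function name to_surrogates is just
something I made up for this example.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not representable as a surrogate pair")
    offset = cp - 0x10000             # fits in 20 bits
    high = 0xD800 + (offset >> 10)    # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> DC00..DFFF
    return high, low


# The largest value a pair can express is 0x10000 plus 20 bits of ones.
MAX_VIA_SURROGATES = 0x10000 + (1 << 20) - 1
```

Anything above that has no pair left to represent it, which is the
"heroics" problem: there are no spare surrogate values to extend the scheme.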

UTF-16 grew out of a need to fix what has become UCS-2, which was the
encoding used by earlier versions of the Unicode standard, before the
need for code points above U+0000FFFF (the range now called the BMP)
was seen.

-- 
Richard Damon
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HJ7R5Q25EVCSBS7CZFZ5CNYITXOLWWFG/