On 7/2/20 10:19 AM, Victor Stinner wrote:
> Do you mean UTF-16 and UTF-32? UTF-16 supports the whole Unicode
> character set but uses the annoying surrogate pairs for characters
> outside the BMP.
Minor quibble: UTF-16 handles all of the CURRENTLY defined Unicode code space, and there is currently a promise not to extend Unicode past that, though that promise may not hold forever.

UTF-8, as originally defined (and as it could be again), easily handles U+00000000 to U+7FFFFFFF. UTF-16, via surrogate pairs, can handle U+00000000 to U+0010FFFF and stops there; extending it past that would require some form of heroics. That is why U+0010FFFF is currently defined as the highest possible code point: allowing a higher value would break UTF-16, and there currently isn't a desire to do so. At some point in the distant future we may run out of 'valid' code points, and the promise will have to be broken.

UTF-16 grew out of the need to fix what is now called UCS-2, the encoding used by earlier Unicode standards, before anyone saw a need for code points beyond the 16-bit range U+00000000 to U+0000FFFF (now the BMP).

-- 
Richard Damon
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HJ7R5Q25EVCSBS7CZFZ5CNYITXOLWWFG/
Code of Conduct: http://python.org/psf/codeofconduct/
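The arithmetic behind the U+0010FFFF ceiling, and the contrast with the original UTF-8 range, can be sketched in Python. This is a minimal illustration, not code from the thread: the helper names (to_surrogate_pair, from_surrogate_pair, legacy_utf8_encode, MAX_UTF16) are hypothetical, and legacy_utf8_encode assumes the original 1-to-6-byte UTF-8 layout of RFC 2279, which RFC 3629 later restricted to U+10FFFF. Python's own "utf-8" codec and chr() follow the restricted range.

```python
from __future__ import annotations

# Hypothetical helpers for illustration only; not from the thread or any library.


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} cannot be written as a surrogate pair")
    v = cp - 0x10000            # 20-bit offset, 0x00000..0xFFFFF
    high = 0xD800 + (v >> 10)   # top 10 bits    -> 0xD800..0xDBFF
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# 1024 high x 1024 low surrogates give 0x100000 code points above the
# 0x10000 BMP values, so the last reachable code point is 0x10FFFF.
MAX_UTF16 = 0xFFFF + 1024 * 1024


def legacy_utf8_encode(cp: int) -> bytes:
    """Encode cp with the original 1-to-6-byte UTF-8 layout (assumed per RFC 2279)."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    if cp < 0x80:
        return bytes([cp])      # 1 byte: plain ASCII
    # (exclusive upper bound, total bytes, lead-byte marker)
    forms = ((0x800, 2, 0xC0), (0x10000, 3, 0xE0), (0x200000, 4, 0xF0),
             (0x4000000, 5, 0xF8), (0x80000000, 6, 0xFC))
    for limit, n, lead in forms:
        if cp < limit:
            # Each continuation byte carries 6 bits, most significant first.
            tail = [0x80 | ((cp >> (6 * i)) & 0x3F) for i in range(n - 2, -1, -1)]
            return bytes([lead | (cp >> (6 * (n - 1)))] + tail)
    raise ValueError("the original UTF-8 layout stops at U+7FFFFFFF")


if __name__ == "__main__":
    print(hex(MAX_UTF16))                                # 0x10ffff
    print([hex(u) for u in to_surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
    print(chr(0x1F600).encode("utf-16-be").hex())        # d83dde00
    print(legacy_utf8_encode(0x7FFFFFFF).hex())          # fdbfbfbfbfbf
```

The surrogate scheme only has 20 payload bits to work with once 0x10000 is subtracted, which is what pins the ceiling; the 6-byte UTF-8 layout, by contrast, carries 31 bits.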