Re: Micro Python -- a lean and efficient implementation of Python 3

Terry Reedy Wed, 04 Jun 2014 00:02:26 -0700

On 6/4/2014 1:55 AM, Ian Kelly wrote:

On Jun 3, 2014 11:27 PM, "Steven D'Aprano" <[email protected]> wrote:
 > For technical reasons which I don't fully understand, Unicode only
 > uses 21 of those 32 bits, giving a total of 1114112 available code
 > points.

I think mainly it's to accommodate UTF-16. The surrogate pair scheme is
sufficient to encode up to 16 supplementary planes, so if Unicode were
allowed to grow any larger than that, UTF-16 would no longer be able to
encode all codepoints.
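
A rough sketch of the arithmetic behind that, assuming the standard UTF-16
surrogate ranges (0xD800-0xDBFF high, 0xDC00-0xDFFF low). The helper name
to_surrogate_pair and the constants are made up for illustration; they are
not from any library:

```python
# Hypothetical names, used only for this illustration.
HIGH_START = 0xD800          # first high (lead) surrogate
LOW_START = 0xDC00           # first low (trail) surrogate
SURROGATES_PER_HALF = 0x400  # 1024 high and 1024 low surrogates


def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000  # fits in 20 bits
    high = HIGH_START + (offset >> 10)   # top 10 bits
    low = LOW_START + (offset & 0x3FF)   # bottom 10 bits
    return high, low


# 1024 * 1024 pairs cover exactly 16 planes of 65536 code points each.
supplementary = SURROGATES_PER_HALF ** 2
# Add the Basic Multilingual Plane to get the full code space.
total = 0x10000 + supplementary
```

Here total comes out to 1114112, the figure quoted above, and the largest
pair (0xDBFF, 0xDFFF) maps to U+10FFFF, so there is no pair left over for
a 17th supplementary plane.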

I believe the original UTF-8 used up to 6 bytes per char to encode 2**31
potential chars. Just 4 bytes limits to 2**21 and, for whatever reason
(easier decoding?), UTF-8 was revised down (unusual ;-).
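
For reference, a small sketch of how many payload bits each sequence
length carries, assuming the original 1-to-6-byte layout (lead byte with n
one-bits and a zero, then 10xxxxxx continuation bytes). The function name
payload_bits is invented for this example:

```python
def payload_bits(n_bytes):
    """Data bits carried by an n-byte sequence in the 1-6 byte UTF-8 layout."""
    if not 1 <= n_bytes <= 6:
        raise ValueError("sequence length must be 1..6")
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    # The lead byte spends n one-bits plus a zero, leaving 7 - n data bits;
    # every continuation byte (10xxxxxx) contributes 6 more.
    return (7 - n_bytes) + 6 * (n_bytes - 1)


# Map each sequence length to its payload size in bits.
bits_by_length = {n: payload_bits(n) for n in range(1, 7)}

# The highest Unicode code point still fits in a 4-byte sequence.
longest = len(chr(0x10FFFF).encode("utf-8"))
```

By this count 4 bytes give 21 bits and 6 bytes give 31 bits, and
U+10FFFF encodes to 4 bytes, so the 5- and 6-byte forms are never needed
for the current code space.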


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
