Re: [Python-3000] string C API

Martin v. Löwis Sat, 16 Sep 2006 06:55:57 -0700

Marcin 'Qrczak' Kowalczyk wrote:
>> You could play tricks with ob_size to save this field:
>>
>> - ob_size < 0: 8-bit data; length is abs(ob_size)
>> - ob_size > 0, (ob_size & 1)==0: 16-bit data, length is ob_size/2
>> - ob_size > 0, (ob_size & 1)==1: 32-bit data, length is ob_size/2
> 
> I wonder whether strings with characters outside ISO-8859-1 are common
> enough that having a 16-bit representation is worth the trouble.
> 
> CLISP does have it. My language doesn't.
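
For reference, the ob_size trick quoted above can be modelled roughly as
follows. This is only a sketch in Python of what would be C-level field
arithmetic; the function names are made up for illustration, and treating
ob_size == 0 as an empty 8-bit string is an assumption, since the quoted
scheme does not say how the empty string is stored.

```python
def pack_ob_size(length, width):
    """Fold a character count and a width in bits into one signed integer."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return 0  # assumption: the quoted scheme leaves the empty string open
    if width == 8:
        return -length          # negative: 8-bit data
    if width == 16:
        return 2 * length       # positive and even: 16-bit data
    if width == 32:
        return 2 * length + 1   # positive and odd: 32-bit data
    raise ValueError("width must be 8, 16 or 32")


def unpack_ob_size(ob_size):
    """Recover (length, width in bits) from the packed field."""
    if ob_size == 0:
        return 0, 8  # assumption, see pack_ob_size
    if ob_size < 0:
        return -ob_size, 8
    if (ob_size & 1) == 0:
        return ob_size // 2, 16
    return ob_size // 2, 32
```

Note that the two positive cases halve the usable length range, which is
the price of not adding a separate width field.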


Unicode is designed so that all "living" scripts are encoded in the
BMP. Characters outside the BMP, which need four bytes, would therefore
be extremely rare, and one may argue that encoding them in UTF-16 (as
surrogate pairs) is good enough.
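
To make the UTF-16 point concrete, here is a small sketch of how a single
character maps to 16-bit code units. The helper name is hypothetical, and
the input is assumed to be one code point that is not itself a lone
surrogate.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units for a single character."""
    cp = ord(ch)
    if cp < 0x10000:
        # BMP character: one 16-bit unit is enough.
        return [cp]
    # Outside the BMP: split the 20-bit offset into a surrogate pair.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]
```

So a two-byte representation stores BMP text at one unit per character
and pays two units only for the rare character beyond it, at the cost of
code unit indices no longer matching character indices in that case.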

So if there is flexibility in the internal representation of strings,
I think a two-byte representation should definitely be one of the
options; I'd rather debate the necessity of the one-byte and
four-byte representations.
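
As an illustration of what such flexibility might look like, the sketch
below picks the narrowest of the three widths discussed in this thread
for a given string. The function name and the policy are assumptions for
illustration, not a proposal for the actual implementation.

```python
def narrowest_width(text):
    """Return 8, 16 or 32: the smallest width in bits that fits every character."""
    top = max(map(ord, text), default=0)
    if top < 0x100:
        return 8    # fits in ISO-8859-1
    if top < 0x10000:
        return 16   # fits in the BMP
    return 32       # needs a character outside the BMP
```

Dropping the one-byte or the four-byte option would simply remove the
corresponding branch, with BMP-only text unaffected either way.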

Regards,
Martin

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
