Marcin 'Qrczak' Kowalczyk wrote:
>> You could play tricks with ob_size to save this field:
>>
>> - ob_size < 0: 8-bit data; length is abs(ob_size)
>> - ob_size > 0, (ob_size & 1)==0: 16-bit data, length is ob_size/2
>> - ob_size > 0, (ob_size & 1)==1: 32-bit data, length is ob_size/2
>
> I wonder whether strings with characters outside ISO-8859-1 are common
> enough that having a 16-bit representation is worth the trouble.
>
> CLISP does have it. My language doesn't.

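For illustration, the quoted ob_size trick could be sketched in Python roughly as follows. The helper names pack_ob_size and unpack_ob_size are hypothetical, "ob_size/2" is read as integer division, and empty strings are rejected because the quoted scheme does not say what ob_size == 0 would mean.

```python
from typing import Tuple


def pack_ob_size(length: int, width: int) -> int:
    """Fold a character width (1, 2 or 4 bytes) into one signed size field.

    Hypothetical helper; the mapping follows the three quoted cases.
    """
    if length < 1:
        # Assumption: the quoted scheme leaves the empty string unspecified.
        raise ValueError("length must be at least 1")
    if width == 1:
        return -length          # negative: 8-bit data
    if width == 2:
        return 2 * length       # positive and even: 16-bit data
    if width == 4:
        return 2 * length + 1   # positive and odd: 32-bit data
    raise ValueError("width must be 1, 2 or 4")


def unpack_ob_size(ob_size: int) -> Tuple[int, int]:
    """Return (length, width in bytes) recovered from the size field."""
    if ob_size < 0:
        return -ob_size, 1
    if ob_size > 0:
        # Floor division drops the low tag bit in the odd (32-bit) case.
        return ob_size // 2, 4 if ob_size & 1 else 2
    raise ValueError("ob_size == 0 is not covered by the quoted scheme")
```

Under these assumptions a five-character string is stored with ob_size -5, 10 or 11 for the 8-bit, 16-bit and 32-bit forms, and unpack_ob_size recovers (5, 1), (5, 2) and (5, 4) respectively.
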
Unicode is designed so that all "living" scripts are encoded in the BMP.
Four-byte characters would therefore be extremely rare, and one may
argue that encoding them with UTF-16 (as surrogate pairs) is good enough.

So if there is flexibility in the internal representation of strings, I
think a two-byte representation should definitely be one of the options;
I'd rather debate the necessity of the one-byte and four-byte
representations.

Regards,
Martin