Re: [Python-3000] string C API

Martin v. Löwis Sat, 16 Sep 2006 06:49:36 -0700

Nick Coghlan wrote:
> The choice of latin-1 is deliberate and non-arbitrary. The reason for the 
> choice is that the ordinals 0-255 in latin-1 map to the Unicode code points 
> 0-255:


That's true, but it doesn't follow that this makes latin-1 a good
choice for a special case. What would make it a good choice is how
frequently the special case actually occurs.
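
For reference, the mapping Nick describes is easy to check. This is
only a small illustration using the standard latin-1 codec; it says
nothing about how often such data occurs in practice, which is the
point in question.

```python
# Every byte value 0-255, decoded as latin-1, yields the Unicode
# code point with the same ordinal.
data = bytes(range(256))
text = data.decode("latin-1")

# Compare each decoded character's code point with the source byte.
assert [ord(ch) for ch in text] == list(data)
```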

> In effect, when creating the string, you would be doing something like this:
> 
>    if encoding == 'latin-1':
>        bytes_per_char = 1
>        code_points = 8_bit_data
>    else:
>        code_points, max_code_point = decode_to_UCS4(8_bit_data, encoding)
>        if max_code_point < 256:
>            bytes_per_char = 1
>        elif max_code_point < 65536:
>            bytes_per_char = 2
>        else:
>            bytes_per_char = 4

Hardly. Instead, the codec would have to create the string of the right
width itself; a codec written in C would make two passes over the input
(one to find the required width, one to fill in the string), rather
than temporarily allocating memory to actually represent the UCS-4
codes.
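
A rough sketch of the two-pass idea, written in Python for brevity
rather than C. It assumes well-formed UTF-8 input and does no error
checking; the names iter_utf8_code_points and decode_two_pass are made
up for this illustration and are not part of any proposed API. The
first pass only tracks the count and the largest code point, so no
intermediate UCS-4 buffer is ever allocated.

```python
import sys


def iter_utf8_code_points(data):
    """Yield code points one at a time from well-formed UTF-8 bytes.

    Hypothetical helper: performs no validation of the input.
    """
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:          # 1-byte sequence (ASCII)
            cp, extra = lead, 0
        elif lead < 0xE0:        # 2-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead < 0xF0:        # 3-byte sequence
            cp, extra = lead & 0x0F, 2
        else:                    # 4-byte sequence
            cp, extra = lead & 0x07, 3
        for j in range(1, extra + 1):
            cp = (cp << 6) | (data[i + j] & 0x3F)
        i += extra + 1
        yield cp


def decode_two_pass(data):
    """Return (bytes_per_char, buffer) for UTF-8 input (illustrative)."""
    # Pass 1: count the characters and find the largest code point,
    # without storing any decoded values.
    count = 0
    max_code_point = 0
    for cp in iter_utf8_code_points(data):
        count += 1
        if cp > max_code_point:
            max_code_point = cp

    if max_code_point < 256:
        bytes_per_char = 1
    elif max_code_point < 65536:
        bytes_per_char = 2
    else:
        bytes_per_char = 4

    # Pass 2: allocate the final buffer once, at the right width,
    # and decode directly into it.
    buf = bytearray(count * bytes_per_char)
    for k, cp in enumerate(iter_utf8_code_points(data)):
        start = k * bytes_per_char
        buf[start:start + bytes_per_char] = cp.to_bytes(
            bytes_per_char, sys.byteorder)
    return bytes_per_char, buf
```

The cost is decoding the input twice; the gain is that the only
allocation is the final string storage.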

Regards,
Martin