Nick Coghlan wrote:
> The choice of latin-1 is deliberate and non-arbitrary. The reason for the
> choice is that the ordinals 0-255 in latin-1 map to the Unicode code points
> 0-255:
That's true, but it doesn't follow that this makes latin-1 a good choice for
a special case. Instead, it is the frequency of occurrence of the special
case that would make it a good choice.

> In effect, when creating the string, you would be doing something like this:
>
>   if encoding == 'latin-1':
>       bytes_per_char = 1
>       code_points = 8_bit_data
>   else:
>       code_points, max_code_point = decode_to_UCS4(8_bit_data, encoding)
>       if max_code_point < 256:
>           bytes_per_char = 1
>       elif max_code_point < 65536:
>           bytes_per_char = 2
>       else:
>           bytes_per_char = 4

Hardly. Instead, the codec would have to create the string of the right
width; a codec written in C would make two passes, rather than temporarily
allocating memory to actually represent the UCS-4 codes.

Regards,
Martin
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
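For illustration, here is a rough Python sketch of the two-pass idea described above. It is only a sketch under assumptions: the names `scan`, `width_for`, `decode_two_pass` and `_decoded_chunks` are hypothetical, the chunked incremental decoder stands in for the scanning loop a C codec would use, and the result is returned as a little-endian byte buffer plus its width rather than as a real string object. The point it shows is that the first pass only records the character count and the largest code point, so no full UCS-4 copy of the text is ever held.

```python
import codecs


def _decoded_chunks(data, encoding, chunk_size=4096):
    """Yield decoded text piece by piece, so the whole result is never held at once."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for start in range(0, len(data), chunk_size):
        final = start + chunk_size >= len(data)
        text = decoder.decode(data[start:start + chunk_size], final)
        if text:
            yield text


def scan(data, encoding):
    """Pass 1: count the characters and find the largest code point."""
    count, max_cp = 0, 0
    for text in _decoded_chunks(data, encoding):
        count += len(text)
        max_cp = max(max_cp, max(map(ord, text)))
    return count, max_cp


def width_for(max_cp):
    """Bytes per character, using the same thresholds as the quoted pseudocode."""
    if max_cp < 256:
        return 1
    if max_cp < 65536:
        return 2
    return 4


def decode_two_pass(data, encoding):
    """Pass 2: allocate the buffer once at its final width, then fill it."""
    count, max_cp = scan(data, encoding)
    width = width_for(max_cp)
    buf = bytearray(count * width)  # single allocation, already the right size
    pos = 0
    for text in _decoded_chunks(data, encoding):
        for ch in text:
            buf[pos:pos + width] = ord(ch).to_bytes(width, "little")
            pos += width
    return width, bytes(buf)
```

With this sketch, `decode_two_pass(b"caf\xe9", "latin-1")` yields a width of 1 and a buffer identical to the input bytes, which is the latin-1 property Nick refers to; UTF-8 input containing a code point above 255 yields a width of 2 or 4 instead. The cost is that the input is decoded twice.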