On 9/14/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
>
> "Bob Ippolito" <[EMAIL PROTECTED]> wrote:
> > The argument for UTF-8 is probably interop efficiency. Lots of C
> > libraries, file formats, and wire protocols use UTF-8 for interchange.
> > Verifying the validity of UTF-8 during string creation isn't that big
> > of a deal.
>
> Indeed, UTF-8 validation/creation isn't a big deal.  But that wasn't my
> concern.  My concern was Python-only operation efficiency, for which a
> fixed-length-per-character encoding generally wins (at least for
> operations involving two strings with the same internal encoding).

If you often need the number of characters, you can compute it once,
when the string's contents are validated, and store it. Slice ops may
become slower, though, since finding the Nth character in UTF-8 means
scanning the bytes. But versus UCS-4, the memory and memory-bandwidth
savings might actually be a net performance win overall for many
applications.
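
A rough sketch of the trade-off, in plain Python rather than anything
resembling the interpreter's real internals. The helper names
(utf8_char_count, utf8_index_to_offset) are made up for illustration.
It assumes the string is held as raw UTF-8 bytes: the count falls out
of one pass over the data, while locating a character by index needs a
linear scan.

```python
def utf8_char_count(data: bytes) -> int:
    """Validate UTF-8 and return the number of code points."""
    # decode() raises UnicodeDecodeError if the bytes are not valid UTF-8.
    data.decode("utf-8")
    # Each code point has exactly one lead byte; continuation bytes
    # have the bit pattern 10xxxxxx, so count everything else.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_index_to_offset(data: bytes, index: int) -> int:
    """Return the byte offset of the index-th code point (linear scan)."""
    seen = 0
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # lead byte: a new code point starts here
            if seen == index:
                return offset
            seen += 1
    if seen == index:  # one past the end, as a slice stop would allow
        return len(data)
    raise IndexError("character index out of range")


sample = "naïve café"
as_utf8 = sample.encode("utf-8")
as_ucs4 = sample.encode("utf-32-le")  # 4 bytes per code point, no BOM

print(utf8_char_count(as_utf8))          # number of code points
print(len(as_utf8), len(as_ucs4))        # storage in bytes for each encoding
print(utf8_index_to_offset(as_utf8, 3))  # where the 4th character starts
```

For mostly-ASCII text the UTF-8 form is close to a quarter the size of
the UCS-4 form, which is where the memory bandwidth savings would come
from. The cost is the scan in utf8_index_to_offset.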

-bob
