Re: [Python-3000] How will unicode get used?

Josiah Carlson Sun, 24 Sep 2006 14:42:32 -0700

"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
> > For me, having recently remembered what was in a unicode string, and
> > verifying it by checking the source, the question in my mind is whether
> > we want to stick with the same 2-representation implementation (default
> > encoding and UTF-16 or UCS-4 depending on build), or go with more or
> > fewer representations.
> 
> I would personally like to see a Python API that operates on code
> points, with support for 17 planes. I also think that efficient indexing
> is important.


Fully-featured unicode would be nice.
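For reference, the 17 planes span code points U+0000 through U+10FFFF. Any code point above U+FFFF needs a surrogate pair in a two-byte (UTF-16) representation. The minimal sketch below shows that mapping. The helper names `plane` and `to_utf16_units` are hypothetical and exist only for illustration.

```python
from typing import List

# Hypothetical helpers illustrating the 17-plane code point space
# and how UTF-16 represents code points above the BMP (plane 0).

MAX_CODE_POINT = 0x10FFFF  # 17 planes * 0x10000 code points, minus one


def plane(cp: int) -> int:
    """Return the Unicode plane (0..16) that a code point belongs to."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return cp >> 16


def to_utf16_units(cp: int) -> List[int]:
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    if plane(cp) == 0:
        return [cp]                   # BMP: a single 16-bit unit
    v = cp - 0x10000                  # 20 bits remain
    high = 0xD800 | (v >> 10)         # high (lead) surrogate: top 10 bits
    low = 0xDC00 | (v & 0x3FF)        # low (trail) surrogate: bottom 10 bits
    return [high, low]


print(plane(0x41), to_utf16_units(0x41))                          # 0 [65]
print(plane(0x1F600), [hex(u) for u in to_utf16_units(0x1F600)])  # 1 ['0xd83d', '0xde00']
```

Every code point outside plane 0 costs two units, which is why indexing by code point over two-byte storage is not trivially constant-time.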


> There are trade-offs, of course. I personally think the best trade-off
> would be to have a two-byte representation, along with a flag telling
> whether there are any surrogate pairs in the string. Indexing and
> length would be constant-time if there are no surrogates, and linear
> time if there are.
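The trade-off described above can be sketched in a few lines: store two-byte code units plus a flag, and take the constant-time path only when the flag says there are no surrogates. This is a minimal illustration, not CPython's implementation; the class name `UTF16String` is hypothetical. It assumes the input comes from a Python `str`, so the surrogate pairs are always well-formed. Encoding a `str` with lone surrogates would raise `UnicodeEncodeError`.

```python
import sys
from array import array


class UTF16String:
    """Hypothetical sketch: two-byte storage plus a 'has surrogates' flag.

    len() and indexing are O(1) when the flag is False and fall back to
    an O(n) scan over the code units when it is True.
    """

    def __init__(self, text: str) -> None:
        # Pick the codec matching native byte order so array("H") reads it correctly.
        codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
        self.units = array("H")
        self.units.frombytes(text.encode(codec))
        self.has_surrogates = any(0xD800 <= u <= 0xDFFF for u in self.units)

    def __len__(self) -> int:
        if not self.has_surrogates:
            return len(self.units)  # O(1): one unit per code point
        # O(n): a surrogate pair counts once, so skip the low (trail) halves.
        return sum(1 for u in self.units if not 0xDC00 <= u <= 0xDFFF)

    def __getitem__(self, i: int) -> str:
        n = len(self)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError("string index out of range")
        if not self.has_surrogates:
            return chr(self.units[i])  # O(1) direct lookup
        # O(n): walk code points from the start, stepping 2 units over each pair.
        pos = 0
        for _ in range(i):
            pos += 2 if 0xD800 <= self.units[pos] <= 0xDBFF else 1
        u = self.units[pos]
        if 0xD800 <= u <= 0xDBFF:
            low = self.units[pos + 1]
            return chr(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
        return chr(u)


s = UTF16String("a\U0001F600b")
print(len(s), len(s.units), s.has_surrogates)  # 3 4 True
```

The flag costs one bit per string, and the common BMP-only case never pays for the scan.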

What about a tree structure over the top of the string, as I described in
another post?  If there are no surrogate pairs, the pointer to the tree
is null.  If there are surrogate pairs, we could either use the
structure as I described it, or modify it to get better memory
utilization and performance (choosing tree nodes based on where the
surrogate pairs are, up to some limit).
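The tree from the other post is not reproduced here. As a stand-in with the same shape, the sketch below keeps a sorted list of the code-point positions that need surrogate pairs and searches it with `bisect`. A sorted array searched this way behaves like a flattened balanced search tree, giving O(log k) indexing for k pairs. The index is `None` when there are no pairs, mirroring the null-pointer idea. The class name `IndexedUTF16String` and its `pair_starts` attribute are hypothetical.

```python
import sys
from array import array
from bisect import bisect_left
from typing import List, Optional


class IndexedUTF16String:
    """Hypothetical sketch: UTF-16 storage plus an optional index of pair positions.

    pair_starts holds, in sorted order, the code-point indices of characters
    stored as surrogate pairs; it is None when the string has no pairs.
    """

    def __init__(self, text: str) -> None:
        codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
        self.units = array("H")
        self.units.frombytes(text.encode(codec))
        starts = [i for i, ch in enumerate(text) if ord(ch) > 0xFFFF]
        self.pair_starts: Optional[List[int]] = starts or None

    def __len__(self) -> int:
        # Each surrogate pair occupies two units but counts as one code point.
        extra = len(self.pair_starts) if self.pair_starts else 0
        return len(self.units) - extra

    def _unit_offset(self, i: int) -> int:
        """Map code-point index i to its offset in self.units."""
        if self.pair_starts is None:
            return i  # O(1) when the index is absent
        # Every pair before position i shifts the offset by one extra unit: O(log k).
        return i + bisect_left(self.pair_starts, i)

    def __getitem__(self, i: int) -> str:
        n = len(self)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError("string index out of range")
        pos = self._unit_offset(i)
        u = self.units[pos]
        if 0xD800 <= u <= 0xDBFF:
            low = self.units[pos + 1]
            return chr(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
        return chr(u)


s = IndexedUTF16String("a\U0001F600b\U00010000c")
print(len(s), s.pair_starts)  # 5 [1, 3]
```

A real implementation would build the index from the code units rather than from a Python `str`, and could switch back to a linear scan past "some limit" of pairs, as suggested above.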

 - Josiah

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: 
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
