Re: [Python-3000] characters data type

Fredrik Lundh Tue, 02 May 2006 23:50:12 -0700

Guido van Rossum wrote:

> Note that UTF-8 would make the implementation of Python's typical
> string API painful; we currently assume (because it's true ;-) that
> random access to elements and slices (__getitem__ and __getslice__) is
> O(1). With UTF-8 these operations would be slow -- the simplest
> implementation would require counting characters from the start; one
> can speed this up with some kind of cache or tree but IMO the
> array-of-fixed-width-characters approach is much simpler. (I had a bad
> experience in my youth with strings implemented as trees, so I'm
> biased against complicated string implementations.
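
To make the cost concrete, here is a minimal sketch of the "count characters from the start" approach described in the quote. The function name utf8_char_at is made up for illustration and is not part of any proposed API; it assumes well-formed UTF-8 input and only handles non-negative indexes.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the character at code-point position `index` in UTF-8 `data`.

    Hypothetical helper: it walks the buffer from the start, so the cost
    grows with `index` instead of being constant.
    """
    if index < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    pos = 0      # byte offset into data
    count = 0    # number of characters skipped so far
    while pos < len(data):
        lead = data[pos]
        # The lead byte tells us how many bytes this character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        if count == index:
            return data[pos:pos + width].decode("utf-8")
        pos += width
        count += 1
    raise IndexError("character index out of range")
```

With a fixed-width array the same lookup is a single offset computation, which is the simplicity argument being made above.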


I'm still thinking that it might be a good idea to (optionally) delay
decoding of strings until you're actually doing something that needs access
to the individual characters, though.  (UTF-8 to UTF-8 shuffling is an
increasingly common use case.)

(frankly, I wouldn't rule out using an "internally polymorphic" representation
for the new str type, partially motivated by my experiences from
cElementTree).
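
A rough sketch of what delayed decoding could look like, assuming a wrapper that keeps the original UTF-8 bytes and decodes only when character-level access is requested. The class LazyText and its methods are hypothetical names for illustration; nothing here reflects an actual str implementation.

```python
class LazyText:
    """Hypothetical string wrapper: stores UTF-8 bytes, decodes on demand."""

    def __init__(self, raw: bytes):
        self._raw = raw
        self._decoded = None  # filled in on first character-level access

    def to_utf8(self) -> bytes:
        # UTF-8 in, UTF-8 out: the bytes pass through without decoding.
        return self._raw

    def _text(self) -> str:
        # Switch to the decoded (fixed-width) form the first time it is needed.
        if self._decoded is None:
            self._decoded = self._raw.decode("utf-8")
        return self._decoded

    def __getitem__(self, key):
        return self._text()[key]

    def __len__(self):
        return len(self._text())

    @property
    def is_decoded(self) -> bool:
        return self._decoded is not None
```

Data that is only shuffled from one UTF-8 stream to another never pays for decoding; the first indexing or len() call triggers it once.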

> This also explains why I'm no fan of the oft-proposed idea that slices
> should avoid making physical copies even if they make logical copies --
> the complexity of that approach horrifies me.)

that could also be an optional mechanism for advanced users, but I agree
that it needs a simple implementation.
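
For comparison, today's Python exposes a no-copy slice for byte buffers through memoryview. The sketch below only shows the difference between a shared view and a physical copy; it is not a proposal for how str slices should behave, and the variable names are illustrative.

```python
data = bytearray(b"hello world")

view = memoryview(data)[6:11]   # shares the underlying buffer, no bytes copied
copy = bytes(data[6:11])        # independent physical copy

data[6] = ord("W")              # mutate the original buffer in place

view_bytes = bytes(view)        # the view sees the change
```

After the mutation view_bytes is b"World" while copy is still b"world". A slice that is a logical copy but not a physical one would have to hide that difference, which is where the extra machinery comes from.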

I think some experimentation is required here (and hope to find some time
for that in a not very distant future).

</F>


