Chris Angelico <[email protected]> writes:
> And of course, taking the *entire* rest of the string isn't the only
> thing you do. What if you want to take the next six characters after
> that index? That would be constant time with a fixed-width storage
> format.
How often is this an issue in practice?
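To make the quoted point concrete, here is a rough sketch, not a
description of how any real implementation stores strings. It assumes
the index is a code point index and the input is valid UTF-8; the
function names utf8_slice and utf32_slice are made up for illustration.
With UTF-8 you have to walk the bytes to find the slice, while with a
fixed-width format the slice is plain offset arithmetic.

```python
def utf8_slice(data: bytes, start: int, length: int) -> bytes:
    """Return `length` code points beginning at code point index `start`.

    Assumes `data` is valid UTF-8. Cost grows with `start + length`,
    because every earlier code point has to be stepped over.
    """
    def advance(pos: int, count: int) -> int:
        # Step over `count` code points starting at byte offset `pos`.
        while count > 0 and pos < len(data):
            pos += 1  # lead byte of the current code point
            # Continuation bytes have the bit pattern 10xxxxxx.
            while pos < len(data) and (data[pos] & 0xC0) == 0x80:
                pos += 1
            count -= 1
        return pos

    begin = advance(0, start)
    end = advance(begin, length)
    return data[begin:end]


def utf32_slice(data: bytes, start: int, length: int) -> bytes:
    """Same slice on a fixed-width (4 bytes per code point) buffer.

    The byte offsets are computed directly, with no scanning.
    """
    return data[4 * start:4 * (start + length)]


text = "naïve café, 日本語 text"
six_utf8 = utf8_slice(text.encode("utf-8"), 3, 6).decode("utf-8")
six_utf32 = utf32_slice(text.encode("utf-32-le"), 3, 6).decode("utf-32-le")
```

Both calls give the same six characters as text[3:9]; the difference is
only in how much work it takes to find them.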
I wonder how other languages deal with this. The examples I can think
of are poor role models:
1. C/C++ - Unicode-impaired, other than the wchar_t type
2. Java - bogus UCS-2-like(?) representation for historical reasons.
   Also has some modified UTF-8, for reasons that made no sense and
   that I don't remember.
3. Haskell - basic string type is a linked list of code points.
   "hello" is five list nodes. New Data.Text library (much more
   efficient) uses something like ropes, I think, with UTF-16 underneath.
4. Erlang - I think like Haskell. Efficiently handles byte blocks.
5. Perl 6 - ???
6. Ruby - ??? (but probably quite slow like the rest of Ruby)
7. Objective-C - ???
8, 9 ... (any other important ones?)
--
http://mail.python.org/mailman/listinfo/python-list