"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> David Hopwood wrote:
[snip]
> > Should we nevertheless try to avoid making the use of Unicode strings
> > unnecessarily difficult for people who have minimal knowledge of Unicode?
> > Absolutely, but not at the expense of making basic operations on strings
> > asymptotically less efficient. O(1) indexing and slicing is a basic
> > requirement, even if it has to be done using code units.
>
> It's not possible to implement slicing in constant time, unless string
> views are introduced. Currently, slicing takes time linear with the
> length of the result string.

I believe he was referring to discovering the memory address where
slicing should begin. In the case of Latin-1, UCS-2, or UCS-4, given a
starting address and some position i, it is trivial to compute the
memory position of character i. In the case of UTF-8, given a starting
address and some position i, one needs to parse the UTF-8
representation up to that point to discover the memory position of
character i.

For me, having recently reminded myself what is in a unicode string and
verified it by checking the source, the question is whether we want to
stick with the same 2-representation implementation (default encoding
and UTF-16 or UCS-4 depending on build), or go with more or fewer
representations.

We can reduce memory consumption by using a single representation,
whether it be constant or variable based on content, though in some
cases (UTF-16, UCS-4) we would lose the 'native' single-segment char
(C char) buffer interface. Using multiple representations, and choosing
those representations carefully based on platform (always keep UTF-8 as
one of the representations on Linux, always keep UTF-16 as one of the
representations on Windows), we may be able to increase platform API
calling speed, if such is desirable.

After re-reading the source and thinking a bit more, my only real
concern is the memory use of Python 3.x. The current implementation
works, so I'm +1 on keeping it "as is", but I'm also +0 on some
implementation that would reduce memory use (with limited, if any,
slowdown) on as many platforms as possible; not any higher, because
changing the underlying implementation would be a PITA.

 - Josiah

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
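
As a rough illustration of the indexing point above, here is a minimal
Python sketch of the two cases. The helper names utf8_offset and
fixed_width_offset are hypothetical, written only for this example;
they are not CPython internals, and the sketch assumes the UTF-8 input
is well formed.

```python
def utf8_offset(buf: bytes, index: int) -> int:
    """Byte offset of code point `index` in well-formed UTF-8 (linear scan)."""
    if index < 0:
        raise IndexError(index)
    seen = 0
    for offset, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts
        # a new code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError(index)


def fixed_width_offset(index: int, width: int) -> int:
    """Byte offset of code unit `index` when every unit is `width` bytes."""
    return index * width  # constant time, no scanning required


text = "naïve €5"
utf8 = text.encode("utf-8")       # 1 to 3 bytes per character here
ucs4 = text.encode("utf-32-le")   # always 4 bytes per character, no BOM

i = text.index("€")
start = utf8_offset(utf8, i)             # must walk over "naïve " first
end = utf8_offset(utf8, i + 1)
print(utf8[start:end].decode("utf-8"))   # the bytes of the euro sign

start4 = fixed_width_offset(i, 4)        # plain multiplication
print(ucs4[start4:start4 + 4].decode("utf-32-le"))
```

The scan in utf8_offset is what makes locating character i cost time
proportional to i, while the fixed-width case is a single multiply.
Copying the slice out is linear in the result length either way.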
