On Wed, Jun 04, 2014 at 01:14:04PM +0000, Steve Dower wrote:
> I agree with Daniel. Directly indexing into text suggests an
> attempted optimization that is likely to be incorrect for a set of
> strings.
I'm afraid I don't understand this argument. The language semantics says
that a string is an array of code points. Every index relates to a single
code point, no code point extends over two or more indexes. There's a 1:1
relationship between code points and indexes. How is direct indexing
"likely to be incorrect"?

e.g.

    s = "---ÿ---"
    offset = s.index('ÿ')
    assert s[offset] == 'ÿ'

That cannot fail with Python's semantics.

[Aside: it does fail in Python 2, showing that the idea that "strings are
bytes" is fatally broken. Fortunately Python has moved beyond that.]

> Splitting, regex, concatenation and formatting are really the
> main operations that matter, and MicroPython can optimize their
> implementation of these easily enough for O(N) indexing.

Really? Well, it will be a nice experiment. Fortunately MicroPython runs
under Linux as well as on embedded systems (a clever decision, by the
way) so I look forward to seeing how their internal-utf8 implementation
stacks up against CPython's FSR implementation.

Out of curiosity, when the FSR was proposed, did anyone consider an
internal UTF-8 representation? If so, why was it rejected?

-- 
Steven
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
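
For readers following the thread, here is a rough Python 3 sketch of the two points in the message above: indexing a str by code point always round-trips, while indexing the UTF-8 bytes at the same offset does not, and a UTF-8-backed string can still offer code point indexing if it is willing to scan (O(N)). The helper utf8_index is hypothetical and written only for illustration; it is not MicroPython's or CPython's actual implementation, and it assumes well-formed UTF-8 input.

```python
def utf8_index(buf: bytes, i: int) -> str:
    """Return the i-th code point of a UTF-8 buffer by scanning from the start.

    Hypothetical helper for illustration only: O(N) in the length of buf,
    assumes buf is valid UTF-8, and does not support negative indexes.
    """
    if i < 0:
        raise IndexError("negative indexes are not handled in this sketch")
    count = -1      # index of the code point currently being scanned
    start = None    # byte offset where code point i begins
    for pos, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; anything else starts
        # a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    return buf[start:].decode("utf-8")  # code point i was the last one


s = "---ÿ---"
offset = s.index("ÿ")

# Code point semantics: the index found by index() always round-trips.
assert s[offset] == "ÿ"

# Byte semantics: the same offset lands on only the first byte of 'ÿ'.
buf = s.encode("utf-8")
assert buf[offset:offset + 1] != "ÿ".encode("utf-8")

# A UTF-8-backed string can still honour code point indexes by scanning.
assert utf8_index(buf, offset) == "ÿ"
```

The scan is the cost a UTF-8 internal representation pays for s[i]; whether that matters in practice is exactly the experiment the message says it is looking forward to.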