"Rhamphoryncus" <[EMAIL PROTECTED]> writes: > Indexing cost, memory efficiency, and canonical representation: pick > two. You can't use a canonical representation (scalar values) without > some sort of costly search when indexing (O(log n) probably) or by > expanding to the worst-case size (UTF-32). Python has taken the > approach of always providing efficient indexing (O(1)), but you can > compile it with either UTF-16 (better memory efficiency) or UTF-32 > (canonical representation).
I still don't get it. UTF-16 is just a data compression scheme, right? I mean, isn't s[17] the 17th character of the (unicode) string regardless of which memory byte it happens to live at? It could be that accessing it takes more than constant time, but that's hidden by the implementation. So where does the invariant c==s[s.index(c)] fail, assuming s contains c?
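
To make the question concrete, here is a minimal sketch of the invariant checked two ways. It assumes a Python whose strings index by code point (3.3 or later); the utf16_units helper is hypothetical and only simulates what O(1) indexing over UTF-16 code units would see, so it is not a claim about how any particular build is implemented.

```python
import struct

def utf16_units(s):
    """Hypothetical helper: return the UTF-16 code units of s as a list of ints."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

s = "a\U0001D11Eb"   # 'a', MUSICAL SYMBOL G CLEF (outside the BMP), 'b'
c = "\U0001D11E"

# Indexing by code point: the invariant from the post holds.
assert c == s[s.index(c)]

# Simulated indexing by UTF-16 code unit (assumption: s[i] returns one unit).
units = utf16_units(s)      # four units, because the clef is a surrogate pair
c_units = utf16_units(c)    # two units

# Find c the way a code-unit search would: as a sub-sequence of units.
i = next(k for k in range(len(units)) if units[k:k + len(c_units)] == c_units)

# A single-index lookup yields only the high surrogate, which is not c.
invariant_holds = [units[i]] == c_units
print(len(s), len(units), i, invariant_holds)
```

If s[i] really is one code unit rather than one character, the last comparison is where c==s[s.index(c)] would stop being true for anything outside the BMP.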