On Sat, 27 Aug 2011 12:17:18 +1200
Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
> Paul Moore wrote:
>
> > IronPython and Jython can retain UTF-16 as their native form if that
> > makes interop cleaner, but in doing so they need to ensure that basic
> > operations like indexing and len work in terms of code points, not
> > code units, if they are to conform. ... They lose the O(1)
> > guarantee, but that's easily defensible as a tradeoff to conform to
> > underlying runtime semantics.
>
> I would only agree as long as it wasn't too much worse
> than O(1). O(log n) might be all right, but O(n) would be
> unacceptable, I think.

It also depends a lot on *actual* measured performance. As someone
mentioned in the tracker, the index you use on a string usually comes
from a previous string operation (like a search), perhaps with a small
offset. So a caching scheme may actually give very good results with a
rather small overhead (you could cache, say, the 4 most recent indices
and choose the nearest when an indexing operation is done; with UTF-8,
scanning backward and forward is equally simple).

Regards

Antoine.
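
To make the caching idea above concrete, here is a rough sketch in pure Python. It is only an illustration under stated assumptions: the class name CachedUtf8Str and its internals are hypothetical, not code from CPython, Jython or IronPython, and nothing here has been measured. It keeps the 4 most recently used (code point index, byte offset) pairs and, on each indexing operation, starts scanning from the nearest known position (a cached pair, or either end of the string).

```python
class CachedUtf8Str:
    """Hypothetical UTF-8-backed string with a small index cache."""

    CACHE_SIZE = 4  # "say, the 4 most recent indices"

    def __init__(self, text):
        self._buf = text.encode("utf-8")
        # Code points = bytes that are not continuation bytes (10xxxxxx).
        self._length = sum(1 for b in self._buf if (b & 0xC0) != 0x80)
        # Recently used (code point index, byte offset) pairs, newest last.
        self._cache = []

    def __len__(self):
        return self._length

    def _byte_offset(self, index):
        buf = self._buf
        # Known positions: both ends of the string plus the cached pairs.
        anchors = [(0, 0), (self._length, len(buf))] + self._cache
        cp, off = min(anchors, key=lambda a: abs(a[0] - index))
        while cp < index:  # scan forward to the next lead byte
            off += 1
            while off < len(buf) and (buf[off] & 0xC0) == 0x80:
                off += 1
            cp += 1
        while cp > index:  # scan backward to the previous lead byte
            off -= 1
            while (buf[off] & 0xC0) == 0x80:
                off -= 1
            cp -= 1
        # Remember this position and drop the oldest entries.
        self._cache.append((index, off))
        del self._cache[:-self.CACHE_SIZE]
        return off

    def __getitem__(self, index):
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("string index out of range")
        start = self._byte_offset(index)
        end = start + 1
        while end < len(self._buf) and (self._buf[end] & 0xC0) == 0x80:
            end += 1
        return self._buf[start:end].decode("utf-8")


s = CachedUtf8Str("a\u00e9\u20ac\U0001F600z")  # 1-, 2-, 3-, 4- and 1-byte chars
print(len(s), s[3] == "\U0001F600", s[4])
```

The worst case is still O(n) when an index is far from every anchor; the bet is that typical access patterns (an index from a search, plus a small offset) land close to a cached position. Whether that holds would have to be checked with real benchmarks.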