Re: [Python-Dev] PEP 393 Summer of Code Project

Antoine Pitrou Fri, 26 Aug 2011 17:28:04 -0700

On Sat, 27 Aug 2011 12:17:18 +1200
Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
> Paul Moore wrote:
> 
> > IronPython and Jython can retain UTF-16 as their native form if that
> > makes interop cleaner, but in doing so they need to ensure that basic
> > operations like indexing and len work in terms of code points, not
> > code units, if they are to conform. ... They lose the O(1)
> > guarantee, but that's easily defensible as a tradeoff to conform to
> > underlying runtime semantics.
> 
> I would only agree as long as it wasn't too much worse
> than O(1). O(log n) might be all right, but O(n) would be
> unacceptable, I think.


It also depends a lot on *actual* measured performance. As someone
mentioned in the tracker, the index you use on a string usually comes
from a previous string operation (like a search), perhaps with a small
offset. So a caching scheme may actually give very good results with a
rather small overhead (you could cache, say, the 4 most recent indices
and start from the nearest one when an indexing operation is done; with
UTF-8, scanning backward and forward is equally simple).
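
A minimal sketch of what such a caching scheme could look like, in Python
for illustration only. The class name CachedUtf8String, the cache_size
parameter and the steps counter are hypothetical and do not come from any
existing implementation. It assumes the string is stored as UTF-8 bytes and
that the cache holds (code point index, byte offset) pairs:

```python
from collections import deque


class CachedUtf8String:
    """Hypothetical wrapper: code point indexing over UTF-8 bytes."""

    def __init__(self, text, cache_size=4):
        self._data = text.encode("utf-8")
        self._length = len(text)
        # Each entry is (code point index, byte offset). The start of the
        # string is known for free, so it seeds the cache.
        self._cache = deque([(0, 0)], maxlen=cache_size)
        self.steps = 0  # code points walked; kept only for measurement

    def __len__(self):
        return self._length

    def _byte_offset(self, index):
        data = self._data
        # Start from the cached position nearest to the requested index.
        cp, off = min(self._cache, key=lambda entry: abs(entry[0] - index))
        while cp < index:
            # Forward: skip the lead byte, then any continuation bytes.
            off += 1
            while off < len(data) and (data[off] & 0xC0) == 0x80:
                off += 1
            cp += 1
            self.steps += 1
        while cp > index:
            # Backward: step back until a non-continuation byte is found.
            off -= 1
            while (data[off] & 0xC0) == 0x80:
                off -= 1
            cp -= 1
            self.steps += 1
        if (index, off) not in self._cache:
            self._cache.append((index, off))  # oldest entry is evicted
        return off

    def __getitem__(self, index):
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("string index out of range")
        data = self._data
        start = self._byte_offset(index)
        end = start + 1
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end += 1
        return data[start:end].decode("utf-8")
```

With this sketch the first access far into the string still costs a linear
scan, but a following access at a small offset from it only walks that
offset. Whether four entries and a nearest-entry policy are the right
choices is exactly the kind of thing that would have to be measured.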

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
