Re: [Python-Dev] PEP 393 Summer of Code Project

Stefan Behnel Fri, 26 Aug 2011 11:16:27 -0700

Guido van Rossum, 26.08.2011 19:02:

On Fri, Aug 26, 2011 at 3:29 AM, Stefan Behnel wrote:

Besides, what if these implementations provided indexing in, say, O(log N)
instead of O(1) or O(N), e.g. by building a tree index into each string? You
could have an index that simply marks runs of surrogate pairs and BMP
substrings, thus providing a likely-to-be-somewhat-compact index. That index
would obviously have to be built, but so do the different string
representations in post-PEP-393 CPython, especially on Windows, as I have
learned.


Eek. No, please.

I was mostly just confabulating. My main point was that this isn't ablack-and-white thing - O(1) xor O(N) - and thus is orthogonal to the PEP.You can achieve compliant/acceptable behaviour at the code point level, theperformance guarantees level or the platform integration level - choose anytwo. CPython is just lucky that there isn't really a platform integrationlevel to take into account (if we leave the Windows environment aside for amoment).

Those platforms' native string types have length and
slicing operations that are O(1) and work in terms of 16-bit code
points. Python should use those. It would be awful if Java and Python
code doing the same manipulations on the same string would come to
different conclusions because Python tried to paper over surrogates.


I fully agree.

Would such a less severe violation of the strict O(1) rule still be "not
ok"? I think this is not such a clear black-and-white issue. Both
implementations have notably different performance characteristics than
CPython in some more or less important areas, as does PyPy. At some point,
the language compliance label has to account for that.


Since you had to ask, I have to declare that, indeed, non-O(1)
behavior would not be okay for those platforms.

I take it that you say that because you want strings to perform in the'normal' platform specific way here (i.e. like Java/.NET strings), and notso much because you want to require the exact same (performance)characteristics across Python implementations. So your choice is platformintegration over code points, leaving the identical performance as aside-effect of the platform integration.

All in all, I don't think we should legislate Python strings to be
able to support 21-bit code points using O(1) indexing. PEP 393 makes
this possible for CPython, and it's been said that PyPy can follow
suit. But it'll be a "quality-of-implementation" issue, not built into
the language spec.

Makes sense to me. Most likely, Unicode heavy Python code will have to takeplatform specifics into account anyway, so there are limits as to what issuitable for a language spec.


Stefan

_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 393 Summer of Code Project

Reply via email to