Re: [Python-Dev] PEP 393 Summer of Code Project

Guido van Rossum Fri, 26 Aug 2011 10:29:00 -0700

On Fri, Aug 26, 2011 at 10:13 AM, Paul Moore <[email protected]> wrote:
> On 26 August 2011 18:02, Guido van Rossum <[email protected]> wrote:
>
>> Eek. No, please. Those platforms' native string types have length and
>> slicing operations that are O(1) and work in terms of 16-bit code
>> units. Python should use those. It would be awful if Java and Python
>> code doing the same manipulations on the same string would come to
>> different conclusions because Python tried to paper over surrogates.
>
> *That* is actually the erroneous assumption I had made - that the Java
> and .NET native string type had code point semantics (i.e., took
> surrogates into account). As that isn't the case, my comments aren't
> valid - and I agree that having common semantics (and hence exposing
> surrogates) is too important to lose.


Those platforms probably *also* have libraries of operations to
support writing apps that conform to the Unicode standard. But those
apps will have to be aware of the difference between the "naive"
length of a string and the number of code points or characters in it.
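
To make that difference concrete, here is a minimal sketch in Python. It
assumes an interpreter where len() counts code points (a wide build, or
any build once PEP 393 lands); the helper names utf16_length and
code_point_length are made up for illustration and are not part of any
library.

```python
def utf16_length(s):
    """Count 16-bit code units, i.e. the "naive" length that a
    UTF-16 based string type (Java, .NET) would report."""
    # Each UTF-16 code unit is two bytes; "utf-16-le" emits no BOM.
    return len(s.encode("utf-16-le")) // 2


def code_point_length(s):
    """Count code points. Assumption: len() already has code point
    semantics on this interpreter (wide build or PEP 393)."""
    return len(s)


# U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
sample = "a\U0001F600b"
naive = utf16_length(sample)
exact = code_point_length(sample)
```

For a string containing only BMP characters the two counts agree; they
diverge by one for every non-BMP character.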

> On the other hand, that pretty much establishes that whatever PEP 393
> achieves in terms of allowing all builds of CPython to offer code
> point semantics, the language definition can't mandate it.

The most severe consequence, it seems to me, is that the stdlib (which
is reused by those other platforms) cannot assume CPython's ideal world
-- even if specific apps sometimes can.
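
As a rough illustration of what "cannot assume" means in practice, the
sketch below iterates over code points in a way that gives the same
answer whether the string exposes surrogate pairs or not. The function
name iter_code_points is hypothetical and nothing like it is being
proposed for the stdlib here; it only shows the kind of care such code
would need.

```python
def iter_code_points(s):
    """Yield integer code points from s, combining any surrogate
    pairs that the platform's string type exposes as two items."""
    i = 0
    n = len(s)
    while i < n:
        hi = ord(s[i])
        # A high surrogate followed by a low surrogate encodes one
        # non-BMP code point.
        if 0xD800 <= hi <= 0xDBFF and i + 1 < n:
            lo = ord(s[i + 1])
            if 0xDC00 <= lo <= 0xDFFF:
                yield 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
                i += 2
                continue
        # BMP character, already-combined code point, or lone surrogate.
        yield hi
        i += 1


# The same logical text, once as a single code point and once as an
# explicit surrogate pair, yields the same sequence.
as_code_point = list(iter_code_points("a\U0001F600"))
as_surrogates = list(iter_code_points("a\ud83d\ude00"))
```

Lone surrogates are passed through unchanged rather than rejected,
which is a design choice of this sketch, not a requirement.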

-- 
--Guido van Rossum (python.org/~guido)
