On Fri, Aug 26, 2011 at 10:13 AM, Paul Moore <p.f.mo...@gmail.com> wrote:
> On 26 August 2011 18:02, Guido van Rossum <gu...@python.org> wrote:
>
>> Eek. No, please. Those platforms' native string types have length and
>> slicing operations that are O(1) and work in terms of 16-bit code
>> points. Python should use those. It would be awful if Java and Python
>> code doing the same manipulations on the same string would come to
>> different conclusions because Python tried to paper over surrogates.
>
> *That* is actually the erroneous assumption I had made - that the Java
> and .NET native string type had code point semantics (i.e., took
> surrogates into account). As that isn't the case, my comments aren't
> valid - and I agree that having common semantics (and hence exposing
> surrogates) is too important to lose.

Those platforms probably *also* have libraries of operations to
support writing apps that conform to the Unicode standard. But those
apps will have to be aware of the difference between the "naive"
length of a string and the number of code points of characters in it.

> On the other hand, that pretty much establishes that whatever PEP 393
> achieves in terms of allowing all builds of CPython to offer code
> point semantics, the language definition can't mandate it.

The most severe consequence to me seems that the stdlib (which is
reused by those other platforms) cannot assume CPython's ideal world
-- even if specific apps sometimes can.

-- 
--Guido van Rossum (python.org/~guido)
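
To illustrate the difference between the "naive" length of a string and
its code point count, here is a minimal Python 3 sketch. It assumes
CPython 3.3 or later, where PEP 393 applies and len() counts code
points. The helper names utf16_units and naive_length are made up for
illustration; they only emulate what a string type built on 16-bit
units (such as Java's or .NET's) would report.

```python
def utf16_units(s: str) -> list[int]:
    """Return the 16-bit units of s as a UTF-16 based platform stores them.

    Hypothetical helper for illustration only.
    """
    data = s.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def naive_length(s: str) -> int:
    """Length in 16-bit units, i.e. the O(1) 'naive' length."""
    return len(utf16_units(s))


# One character outside the Basic Multilingual Plane between two ASCII letters.
text = "a\U0001F600b"

code_points = len(text)        # code point semantics (CPython 3.3+)
units = naive_length(text)     # 16-bit unit semantics

# The non-BMP character is stored as a surrogate pair, so it counts twice.
surrogates = [u for u in utf16_units(text) if 0xD800 <= u <= 0xDFFF]

print(code_points, units, [hex(u) for u in surrogates])
```

Code that indexes or slices by position gets different answers under the
two models whenever a surrogate pair is present, which is why library
code meant to run on several implementations cannot assume either one.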