On 9/20/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> On 9/20/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > On 9/20/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > > Before we can decide on the internal representation of our unicode
> > > objects, we need to decide on their external interface. My thoughts
> > > so far:
> >
> > Let me cut this short. The external string API in Py3k should not
> > change or only very marginally so (like removing rarely used useless
> > APIs or adding a few new conveniences). The plan is to keep the 2.x
> > API that is supported (in 2.x) by both str and unicode, but merge the
> > two string types into one. Anything else could be done just as easily
> > before or after Py3k.
>
> Thanks, but one thing remains unclear: is the indexing intended to
> represent bytes, code points, or code units?
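A minimal sketch in present-day Python 3 (3.3 or later) makes the three candidate units in that question concrete. The example string and variable names are arbitrary choices for illustration. It counts one string containing a character outside the Basic Multilingual Plane as code points, UTF-8 bytes, and UTF-16 code units, then shows that cutting the code-unit sequence can leave half of a surrogate pair. On Python 3.3+, `len()` counts code points on every platform; on the narrow UTF-16 builds discussed in this thread, `len()` of the same string returned 4.

```python
# Illustrative sketch: measure one string three ways.
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
s = "a\U0001D11Eb"

# Code points: what len() and indexing count on Python 3.3 and later.
code_points = len(s)  # 3

# Bytes: depends on the chosen encoding (UTF-8 here: 1 + 4 + 1).
utf8_bytes = len(s.encode("utf-8"))  # 6

# UTF-16 code units: 2 bytes each; the clef needs a surrogate pair.
utf16 = s.encode("utf-16-le")  # little-endian, no BOM
code_units = [
    int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)
]
# code_units == [0x61, 0xD834, 0xDD1E, 0x62]

# Slicing by code units can split the pair, leaving a lone high surrogate.
first_two = code_units[:2]
is_lone_high = 0xD800 <= first_two[-1] <= 0xDBFF

print(code_points, utf8_bytes, len(code_units), is_lone_high)  # 3 6 4 True
```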
I don't see what's unclear -- the existing unicode object does what it does.

> Note that C code
> operating on UTF-16 would use code units for slicing of UTF-16, which
> splits surrogate pairs.

I thought we were discussing the Python API. C code will likely have the same access to unicode objects as it has in 2.x.

> As far as I can tell, CPython on windows uses UTF-16 with code units.
> Perhaps not intentionally, but by default (not throwing an error on
> surrogates).

This is intentional, to be compatible with the rest of that platform. Jython and IronPython do this too I believe.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
