Re: [Python-3000] How will unicode get used?

Guido van Rossum Wed, 20 Sep 2006 11:32:12 -0700

On 9/20/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> On 9/20/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > On 9/20/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > > Before we can decide on the internal representation of our unicode
> > > objects, we need to decide on their external interface.  My thoughts
> > > so far:
> >
> > Let me cut this short. The external string API in Py3k should not
> > change or only very marginally so (like removing rarely used useless
> > APIs or adding a few new conveniences). The plan is to keep the 2.x
> > API that is supported (in 2.x) by both str and unicode, but merge the
> > two string types into one. Anything else could be done just as easily
> > before or after Py3k.
>
> Thanks, but one thing remains unclear: is the indexing intended to
> represent bytes, code points, or code units?
I don't see what's unclear -- the existing unicode object does what it does.
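
The three candidates in the question give different answers for any character outside the Basic Multilingual Plane. The sketch below assumes modern Python 3 (3.3 or later, where PEP 393 makes str index by code point, a change that postdates this thread) and counts the same two-character string three ways. The helper names are hypothetical, chosen for illustration only. On a 2006-era narrow build, such as the Windows builds mentioned later in this thread, len() of the equivalent unicode literal reported the UTF-16 code-unit count instead.

```python
# Hypothetical helpers; assumes Python 3.3+ (PEP 393 str).
s = "a\U0001D11E"  # 'a' plus U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP)


def code_points(text: str) -> int:
    # Python 3.3+ str is a sequence of code points, so len() counts them.
    return len(text)


def utf16_code_units(text: str) -> int:
    # 'utf-16-le' writes no byte-order mark; each code unit is 2 bytes.
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    # Number of bytes in the UTF-8 encoding.
    return len(text.encode("utf-8"))


print(code_points(s), utf16_code_units(s), utf8_bytes(s))  # prints: 2 3 5
```

The code-point count treats the clef as one character; the UTF-16 count sees two units (a surrogate pair); the UTF-8 count sees four bytes for it.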

> Note that C code
> operating on UTF-16 would use code units for slicing of UTF-16, which
> splits surrogate pairs.

I thought we were discussing the Python API.

C code will likely have the same access to unicode objects as it has in 2.x.
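
To make Adam's point about surrogate pairs concrete, here is a small Python 3 sketch (assuming 3.9+ for the list[int] annotations) that models a UTF-16 buffer as a list of integer code units, the way C code working on a UTF-16 array would see it. The helper names are hypothetical. Slicing off "the first two characters" by code unit keeps only the high half of the pair, and the result no longer decodes as valid UTF-16.

```python
# Hypothetical helpers; assumes Python 3.9+ (for list[int]).
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def decodes_cleanly(units: list[int]) -> bool:
    """True if the code units form valid UTF-16 under strict decoding."""
    data = b"".join(u.to_bytes(2, "little") for u in units)
    try:
        data.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True


units = utf16_units("a\U0001D11E")
print([hex(u) for u in units])  # ['0x61', '0xd834', '0xdd1e']

# "The first two characters", counted in code units, keeps 'a' and only
# the high half of the surrogate pair.
head = units[:2]
print(is_high_surrogate(head[-1]), decodes_cleanly(head))  # True False
print(decodes_cleanly(units))                              # True
```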

> As far as I can tell, CPython on Windows uses UTF-16 with code units.
> Perhaps not intentionally, but by default (not throwing an error on
> surrogates).

This is intentional, to be compatible with the rest of that platform.
Jython and IronPython do this too, I believe.
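
"Not throwing an error on surrogates" can still be observed in modern Python 3, whose str type accepts an unpaired surrogate and only objects when asked to encode it strictly. The sketch below is an analogy under that assumption, not the 2006 Windows build itself: the 'surrogatepass' error handler (available for the UTF-16 codecs since Python 3.4) writes the lone code unit through unchanged, loosely similar to how platform UTF-16 APIs accept unpaired units.

```python
lone = "\ud834"  # high surrogate with no matching low surrogate

# The str type stores it without complaint...
print(len(lone))  # 1

# ...but strict UTF-8 encoding refuses it.
try:
    lone.encode("utf-8")
    utf8_ok = True
except UnicodeEncodeError as exc:
    utf8_ok = False
    print("utf-8:", exc.reason)  # surrogates not allowed

# 'surrogatepass' writes the unpaired code unit through unchanged
# and reads it back the same way.
raw = lone.encode("utf-16-le", "surrogatepass")
print(raw)  # b'4\xd8'
assert raw.decode("utf-16-le", "surrogatepass") == lone
```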

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
