Re: [Python-3000] String comparison

Jim Jewett Tue, 12 Jun 2007 10:43:05 -0700

On 6/12/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote:
> On 6/12/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> > On 6/12/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote:
> > > Practically speaking, there's little need to interpret
> > > surrogate pairs as two code points instead of as one
> > > non-BMP code point.


> > Depends on your definition of "practically".

> > Python does interpret them that way to maintain O(1) positional
> > access within strings encoded with 16 bits/char.

> Indexing does not try to interpret the string as code points at all, it
> works on code units.

Even granting that (though most people will assume indexing works on
"letters", and might at best understand that accent marks sometimes
count separately), it still doesn't quite work.

Slicing (or iterating over) a string claims to return strings of the same type.

>>> for x in u"abc": print type(x)

<type 'unicode'>
<type 'unicode'>
<type 'unicode'>

Strictly speaking, the surrogate pairs should be returned together,
rather than as separate code units.  It probably won't be fixed, since
those who care most are probably using 4-byte Unicode characters.
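
To make "returned together" concrete, here is a rough sketch, not a
proposed patch.  The helper names (utf16_code_units, iter_code_points)
are made up for illustration, and it goes through an explicit UTF-16
encoding so that the result does not depend on how the interpreter
stores strings internally.

```python
import struct


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    # Each code unit is one unsigned 16-bit little-endian value.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def iter_code_points(units):
    """Yield code points from UTF-16 code units, joining surrogate pairs.

    Hypothetical helper: a lone (unpaired) surrogate is passed through
    unchanged rather than treated as an error.
    """
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine the high and low surrogate into one non-BMP code point.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield unit
            i += 1


text = u"a\U0001D11Eb"  # U+1D11E (MUSICAL SYMBOL G CLEF) is outside the BMP
units = utf16_code_units(text)          # four code units: a, high, low, b
points = list(iter_code_points(units))  # three code points: a, U+1D11E, b
```

Iterating by code unit gives four items here, two of which are halves of
a surrogate pair; iterating with the pairs joined gives three.  The
price is that finding the n-th code point is no longer O(1).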

-jJ
