Re: [Python-3000] String comparison

2007-06-14 Thread Martin v. Löwis
> - chr(i) returns a len-1 or len-2 string for all i in range(0, 0x11) and
> ord(chr(i)) == i for all i in range(0, 0x11)
This would contradict an explicit decision in PEP 261. I don't quite remember the rationale for that; however, the PEP mentions that ord() should be symmetric with
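The proposal under discussion concerns narrow (UTF-16) builds, where a code point above U+FFFF is stored as a surrogate pair, so chr(i) would sometimes return a two-unit string. The sketch below illustrates that mapping with two hypothetical helpers, chr_utf16 and ord_utf16 (they are not part of Python), and the round-trip property ord(chr(i)) == i. For comparison, CPython since 3.3 (PEP 393) has no narrow builds, and the built-in chr(i) returns a length-1 string for every code point.

```python
def chr_utf16(cp: int) -> str:
    """Hypothetical: return cp as one UTF-16 code unit or a surrogate pair."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x10000:
        return chr(cp)  # BMP code points fit in a single code unit
    v = cp - 0x10000  # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)  # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return chr(high) + chr(low)


def ord_utf16(s: str) -> int:
    """Hypothetical inverse of chr_utf16: one code unit or a surrogate pair."""
    if len(s) == 1:
        return ord(s)
    if len(s) == 2:
        high, low = ord(s[0]), ord(s[1])
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    raise TypeError("expected one code unit or a surrogate pair")


print(repr(chr_utf16(0x1F600)))  # '\ud83d\ude00'
print(hex(ord_utf16("\ud83d\ude00")))  # 0x1f600
```

Here ord_utf16 accepts a surrogate pair so that the round trip holds for every code point; Martin's point is that PEP 261 made a different, explicit decision for the built-ins on narrow builds.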

Re: [Python-3000] setup.py fails in the py3k-struni branch

2007-06-14 Thread Martin v. Löwis
>>> Then bytes can be bytes, and unicode can be unicode, and str8 can be
>>> encoded strings for interfacing with the outside non-unicode world. Or
>>> something like that.
>>
>> Hm... Requiring each str8 instance to have an encoding might be a
>> problem -- it means you can't just create one fro

Re: [Python-3000] setup.py fails in the py3k-struni branch

2007-06-14 Thread Ron Adam
Guido van Rossum wrote:
> On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote:
>> Well I can see where a str8() type with an __incoded_with__ attribute could
>> be useful. It would use a bit more memory, but it won't be the
>> default/primary string type anymore so maybe it's ok.
>>
>> Then bytes

Re: [Python-3000] setup.py fails in the py3k-struni branch

2007-06-14 Thread Guido van Rossum
On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote:
> Well I can see where a str8() type with an __incoded_with__ attribute could
> be useful. It would use a bit more memory, but it won't be the
> default/primary string type anymore so maybe it's ok.
>
> Then bytes can be bytes, and unicode can be uni
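Ron's suggestion is a str8 type carrying an attribute that records the encoding it was produced with, and a concern raised in the thread is that requiring an encoding on every instance could be a problem. A minimal sketch under those assumptions: EncodedBytes is a hypothetical bytes subclass in which the encoding is optional. The class name, the plain `encoding` attribute (used in place of the thread's __incoded_with__), and the to_str method are all illustrative, not a proposed API.

```python
from typing import Optional


class EncodedBytes(bytes):
    """Hypothetical str8-like type: bytes that remember their codec, if known."""

    def __new__(cls, data: bytes, encoding: Optional[str] = None):
        obj = super().__new__(cls, data)
        # bytes subclasses get an instance __dict__, so an extra attribute is allowed.
        obj.encoding = encoding
        return obj

    @classmethod
    def from_str(cls, text: str, encoding: str) -> "EncodedBytes":
        """Encode text and record which codec was used."""
        return cls(text.encode(encoding), encoding)

    def to_str(self, errors: str = "strict") -> str:
        """Decode with the recorded encoding; refuse to guess when there is none."""
        if self.encoding is None:
            raise ValueError("no encoding recorded; call .decode(encoding) instead")
        return self.decode(self.encoding, errors)


name = EncodedBytes.from_str("Löwis", "latin-1")
raw = EncodedBytes(b"\x00\x01")  # raw data with no known encoding
```

One design consequence worth noting: ordinary bytes operations such as slicing return plain bytes, so the recorded encoding is silently dropped (name[:2] is a bytes object with no encoding attribute).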

Re: [Python-3000] String comparison

2007-06-14 Thread Stephen J. Turnbull
Jim Jewett writes:
> I suspect there may be others that are guaranteed never to get an
> assignment, because of their location. (Example: The character would
> have to have certain properties or be part of a specific script, but
> adding more such characters would violate some other stabilit

Re: [Python-3000] String comparison

2007-06-14 Thread Jim Jewett
On 6/14/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> > There are also plenty of things that a native speaker may view as a
> > single character, but which unicode treats as (at most) a Named
> > Sequence.
> Eg, the New Line Function (Unicode's name for "universal newline"),
> which can
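Stephen's example, the New Line Function (NLF), is defined by the Unicode Standard as any of CR, LF, CR LF, or NEL (U+0085), so one "newline" can be two code points. A small sketch, assuming a hypothetical split_nlf helper, splits on exactly those sequences; Python's str.splitlines() is broader and also breaks on characters such as U+2028 LINE SEPARATOR.

```python
import re

# Unicode's New Line Function: CR LF, CR, LF, or NEL (U+0085).
# CR LF comes first in the alternation so it is consumed as a single newline.
NLF = re.compile(r"\r\n|[\r\n\x85]")


def split_nlf(text: str) -> list:
    """Hypothetical helper: split text on NLF sequences only."""
    return NLF.split(text)


sample = "a\r\nb\rc\x85d\u2028e"
print(split_nlf(sample))    # ['a', 'b', 'c', 'd\u2028e']
print(sample.splitlines())  # ['a', 'b', 'c', 'd', 'e']
```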

Re: [Python-3000] String comparison

2007-06-14 Thread Jim Jewett
On 6/14/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > There are also some that are explicitly not characters.
> > (U+FD00..U+FDEF)
> ??? U+FD00 is ARABIC LIGATURE HAH WITH YEH ISOLATED FORM,
> U+FDEF is unassigned.
Sorry; typo on my part. The start of the range is u+fdD0, not 00. I suspe
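The corrected range U+FDD0..U+FDEF is one of the two groups of Unicode noncharacters; the other is the last two code points of every plane (U+xxFFFE and U+xxFFFF), for 66 in total. A minimal sketch of a check, using a hypothetical is_noncharacter helper; the standard unicodedata module reports these code points as general category Cn.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Hypothetical helper: True for Unicode's 66 permanently reserved noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


print(is_noncharacter(0xFD00), is_noncharacter(0xFDD0), is_noncharacter(0x10FFFF))
# False True True
print(unicodedata.category(chr(0xFDD0)))  # Cn
```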

Re: [Python-3000] String comparison

2007-06-14 Thread Rauli Ruohonen
On 6/13/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> except that people will sneak in some UTF-16 behavior where it seems useful.
How about sneaking these in py3k-struni:
- chr(i) returns a len-1 or len-2 string for all i in range(0, 0x11) and ord(chr(i)) == i for all i in range(0,

Re: [Python-3000] String comparison

2007-06-14 Thread Rauli Ruohonen
On 6/14/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 6/13/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > A code point is something that has a 1:1 relationship with a logical
> > character (in particular, a Unicode character).
As the word "character" is ambiguous, I'd put it this way:
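The ambiguity of "character" shows up directly in string comparison: the same user-perceived letter can be one code point or a base letter plus a combining mark. A short sketch with the standard unicodedata module; canonically_equal is a hypothetical helper that compares after NFC normalization instead of code point by code point.

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE: one code point
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT: two code points


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(len(precomposed), len(decomposed))           # 1 2
print(precomposed == decomposed)                   # False
print(canonically_equal(precomposed, decomposed))  # True
```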

Re: [Python-3000] String comparison

2007-06-14 Thread Stephen J. Turnbull
Jim Jewett writes:
> > Apart from the surrogates, are there code points that aren't
> > characters?
> Yes. The BOM mark, for one.
Nitpick: The BOM *is* a character (FEFF, aka ZERO-WIDTH NO-BREAK SPACE). Its byte-swapped counterpart FFFE is guaranteed *not* to be a character. (Martin wrote
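The distinction matters for byte-order detection: U+FEFF serialized in UTF-16 is FE FF (big-endian) or FF FE (little-endian), and because U+FFFE is a noncharacter, reading the mark as U+FFFE signals the wrong byte order. A minimal sketch, assuming the input is UTF-16 (a UTF-32 little-endian BOM also begins with FF FE); utf16_byte_order is a hypothetical helper built on the standard codecs BOM constants.

```python
import codecs
from typing import Optional


def utf16_byte_order(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess UTF-16 byte order from a leading BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return "little"
    return None


payload = "\ufeffhi".encode("utf-16-be")
print(utf16_byte_order(payload))  # big
# Decoding the same two bytes with the wrong byte order yields the noncharacter U+FFFE.
print(hex(ord(payload[:2].decode("utf-16-le"))))  # 0xfffe
```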