> - chr(i) returns a len-1 or len-2 string for all i in range(0, 0x110000) and
> ord(chr(i)) == i for all i in range(0, 0x110000)
This would contradict an explicit decision in PEP 261. I don't quite
remember the rationale for that; however, the PEP mentions that ord()
should be symmetric with
>>> Then bytes can be bytes, and unicode can be unicode, and str8 can be
>>> encoded strings for interfacing with the outside non-unicode world. Or
>>> something like that.
>>
>> Hm... Requiring each str8 instance to have an encoding might be a
>> problem -- it means you can't just create one fro
Guido van Rossum wrote:
> On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote:
>> Well I can see where a str8() type with an __incoded_with__ attribute
>> could
>> be useful. It would use a bit more memory, but it won't be the
>> default/primary string type anymore so maybe it's ok.
>>
>> Then bytes
On 6/13/07, Ron Adam <[EMAIL PROTECTED]> wrote:
> Well I can see where a str8() type with an __incoded_with__ attribute could
> be useful. It would use a bit more memory, but it won't be the
> default/primary string type anymore so maybe it's ok.
>
> Then bytes can be bytes, and unicode can be uni
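A rough idea of what a str8-like type that remembers its encoding could look like, as a minimal sketch only. The class name `EncodedBytes`, its `encoding` attribute (standing in for the proposed `__incoded_with__`), and `from_text()` are all hypothetical and are not part of any Python release.

```python
class EncodedBytes(bytes):
    """Hypothetical str8-like type: raw bytes plus the encoding that produced them."""

    def __new__(cls, data, encoding):
        obj = super().__new__(cls, data)
        # Stand-in for the proposed __incoded_with__ attribute (hypothetical name).
        obj.encoding = encoding
        return obj

    def decode(self, encoding=None, errors="strict"):
        # Fall back to the remembered encoding when none is passed explicitly.
        return bytes.decode(self, encoding or self.encoding, errors)

    @classmethod
    def from_text(cls, text, encoding):
        # Encode a str and remember which codec was used.
        return cls(text.encode(encoding), encoding)


s = EncodedBytes.from_text("Löwis", "latin-1")
print(len(s), s.encoding, s.decode())
```

One cost of this design shows up quickly: built-in operations such as slicing or concatenation return plain `bytes`, so the encoding is silently dropped unless every such method is overridden. The per-instance attribute also adds memory, which matches the concern raised above.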
Jim Jewett writes:
> I suspect there may be others that are guaranteed never to get an
> assignment, because of their location. (Example: The character would
> have to have certain properties or be part of a specific script, but
> adding more such characters would violate some other stabilit
On 6/14/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> > There are also plenty of things that a native speaker may view as a
> > single character, but which unicode treats as (at most) a Named
> > Sequence.
> Eg, the New Line Function (Unicode's name for "universal newline"),
> which can
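To illustrate the point that one perceived unit can span several code points, here is a small sketch using only the standard library. Python's `str.splitlines()` treats CR LF, NEL (U+0085), and LINE SEPARATOR (U+2028), among others, as line boundaries, roughly in the spirit of the New Line Function. The accented-letter case is a related example of a user-perceived character that may be one or two code points. It is not itself a Named Sequence.

```python
import unicodedata

# Several different code point sequences all act as one line break.
text = "one\r\ntwo\x85three\u2028four\nfive"
print(text.splitlines())

# CR LF is two code points even though it counts as a single newline.
print(len("\r\n"))

# A user-perceived "é" may be one code point or two.
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), composed == "\u00e9")
```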
On 6/14/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > There are also some that are explicitly not characters.
> > (U+FD00..U+FDEF)
> ??? U+FD00 is ARABIC LIGATURE HAH WITH YEH ISOLATED FORM,
> U+FDEF is unassigned.
Sorry, typo on my part. The start of the range is U+FDD0, not U+FD00.
I suspe
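For concreteness, a sketch of a test for Unicode's permanent noncharacters: the 32 code points U+FDD0..U+FDEF plus the last two code points of each of the 17 planes (U+xxFFFE and U+xxFFFF), 66 in total. The function name `is_noncharacter` is hypothetical.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of Unicode's 66 permanent noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xFDD0 <= cp <= 0xFDEF:  # the contiguous block corrected above
        return True
    # Low 16 bits equal to FFFE or FFFF: the last two code points of every plane.
    return (cp & 0xFFFE) == 0xFFFE


# U+FD00 is an ordinary assigned character; U+FDD0 is a noncharacter.
print(is_noncharacter(0xFD00), is_noncharacter(0xFDD0))
print(unicodedata.category(chr(0xFDD0)))  # noncharacters have general category Cn
```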
On 6/13/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> except that people will sneak in some UTF-16 behavior where it seems useful.
How about sneaking these in py3k-struni:
- chr(i) returns a len-1 or len-2 string for all i in range(0, 0x110000) and
ord(chr(i)) == i for all i in range(0,
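The proposed behavior can be emulated on a modern interpreter to see what it implies, as a sketch only. On current CPython (PEP 393, Python 3.3+), `chr()` always returns a length-1 string, so the hypothetical `narrow_chr()` and `narrow_ord()` below mimic a UTF-16 ("narrow") representation by returning a surrogate pair for code points at or above U+10000.

```python
def narrow_chr(i: int) -> str:
    """Hypothetical UTF-16-style chr(): len-1 below U+10000, else a surrogate pair."""
    if not 0 <= i < 0x110000:
        raise ValueError("code point out of range")
    if i < 0x10000:
        return chr(i)
    i -= 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10 bits.
    return chr(0xD800 + (i >> 10)) + chr(0xDC00 + (i & 0x3FF))


def narrow_ord(s: str) -> int:
    """Inverse of narrow_chr(): accept one code unit or one surrogate pair."""
    if len(s) == 1:
        return ord(s)
    if len(s) == 2:
        hi, lo = map(ord, s)
        if 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF:
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
    raise TypeError("expected a single code point")


print(len(narrow_chr(0x41)), len(narrow_chr(0x1F600)))
```

Because a lone surrogate (U+D800..U+DFFF) comes back as a length-1 string, `narrow_ord(narrow_chr(i)) == i` holds for the whole range, which is the symmetry the proposal asks for.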
On 6/14/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 6/13/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > A code point is something that has a 1:1 relationship with a logical
> > character (in particular, a Unicode character).
As the word "character" is ambiguous, I'd put it this way:
Jim Jewett writes:
> > Apart from the surrogates, are there code points that aren't
> > characters?
> Yes. The BOM mark, for one.
Nitpick: The BOM *is* a character (FEFF, aka ZERO-WIDTH NO-BREAK
SPACE). Its byte-swapped counterpart FFFE is guaranteed *not* to be a
character. (Martin wrote
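The practical consequence of U+FFFE never being a character is that a leading FE FF or FF FE unambiguously reveals UTF-16 byte order. A minimal sketch using the standard `codecs` BOM constants; the helper name `utf16_byte_order` is hypothetical, and it deliberately ignores UTF-32 (whose little-endian BOM also begins with FF FE).

```python
import codecs
import unicodedata


def utf16_byte_order(data: bytes):
    """Guess UTF-16 byte order from a leading U+FEFF, or return None."""
    if data.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff'
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe'
        return "little"
    return None


print(utf16_byte_order("\ufeffhi".encode("utf-16-be")))
print(utf16_byte_order("\ufeffhi".encode("utf-16-le")))
print(unicodedata.name("\ufeff"))  # the BOM is a real character
print(unicodedata.category("\ufffe"))  # its byte-swapped form is a noncharacter
```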