On Sat, 2004-05-01 at 15:09, Jarkko Hietaniemi wrote:
> > How are you defining "valid UTF-8"? Is there a codepoint in UTF-8
> > between \x00 and \xff that isn't valid? Is there a reason to ever do
>
> Like, half of them? \x80 .. \xff are all invalid as UTF-8.
Heh, damn Ken Thompson and his placemat!
I am too new to UCS and UTF-8, and had thought it was always 8-bit. I
stand corrected, having read up on the UTF-8 and Unicode FAQ.
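To make Jarkko's point concrete, here is a small sketch in Python (chosen only for illustration; the helper name single_byte_is_valid_utf8 is made up). It checks which single bytes decode as UTF-8 on their own, and shows that a code point in the U+0080..U+00FF range takes two bytes once encoded:

```python
def single_byte_is_valid_utf8(b: int) -> bool:
    """Return True if the lone byte b decodes as UTF-8 by itself."""
    try:
        bytes([b]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Partition all 256 byte values by whether they stand alone as UTF-8.
valid = [b for b in range(0x100) if single_byte_is_valid_utf8(b)]
invalid = [b for b in range(0x100) if not single_byte_is_valid_utf8(b)]

print(len(valid), len(invalid))            # 128 128
print(hex(invalid[0]), hex(invalid[-1]))   # 0x80 0xff
print(len("\u00e9".encode("utf-8")))       # 2 (U+00E9 needs two bytes)
```

So the code points up to U+00FF all exist, but only the lower half is stored as a single byte in UTF-8.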
Jeff, yeah I have to take back my statement. If Perl defaults to UTF-8,
then it's not a valid assumption that a UTF-8 input string won't throw
an exception. I still think that's ok, and better than
representation-expanding to the larger representation and doing the
bit-op in that, since that means that bit-vectors would have to be
valid in enum_stringrep_one, _two and _four as sort of alternate
datastructures. I don't think we want to go there.
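A rough sketch of the "throw an exception" behaviour I mean, in Python rather than Parrot code. The function name bitwise_and_8bit, the choice of ValueError, and truncating to the shorter operand are all assumptions made for the example, not anything Parrot does today:

```python
def bitwise_and_8bit(a: str, b: str) -> str:
    """AND two strings code point by code point.

    Refuses any operand containing a code point wider than 8 bits
    instead of expanding to a two- or four-byte representation.
    """
    for s in (a, b):
        for ch in s:
            if ord(ch) > 0xFF:
                raise ValueError(
                    f"code point U+{ord(ch):04X} does not fit in 8 bits"
                )
    # Assumption: the result is as long as the shorter operand.
    return "".join(chr(ord(x) & ord(y)) for x, y in zip(a, b))

print(repr(bitwise_and_8bit("AB", "\x0f\xf0")))  # '\x01@'
```

Passing a string such as "\u0100" raises instead of silently switching to a wider representation.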
For everything else, as Jeff correctly points out, this has nothing to
do with encoding, except in the sense that the default encoding in a
language like Perl 6 (only one example) dictates what representation you
will have to expect to be the common case.
> > bitwise operations on anything other than 8-bit codepoints?
>
> I am very confused. THIS IS WHAT WE ALL SEEM TO BE SAYING. BITOPS ONLY
> ON EIGHT-BIT DATA. AM I WRONG?
No, you're not, and could you please not get emotional about this? It's
what you, Dan and I have been saying, but I was responding to Jeff who
said:
"Just FYI, the way I implemented bitwise-not so far, was to
bitwise-not code points 0x{00}-0x{FF} as uint8-sized things,
0x{100}-0x{FFFF} as uint16-sized things, and > 0x{FFFF} as
uint32-sized things (but then bit-masking them with 0xFFFFF to
make sure that they fell into a valid code point range)."
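For reference, my reading of the behaviour Jeff describes can be sketched as follows. This is Python operating on a plain list of code points, not his actual implementation, and the function name bitwise_not_by_width is hypothetical:

```python
def bitwise_not_by_width(code_points):
    """Bitwise-not each code point at a width chosen by its magnitude."""
    out = []
    for cp in code_points:
        if cp <= 0xFF:
            # Treated as a uint8-sized thing.
            out.append(~cp & 0xFF)
        elif cp <= 0xFFFF:
            # Treated as a uint16-sized thing.
            out.append(~cp & 0xFFFF)
        else:
            # Treated as a uint32-sized thing, then masked with 0xFFFFF
            # as in the quoted description.
            out.append(~cp & 0xFFFFFFFF & 0xFFFFF)
    return out

print([hex(x) for x in bitwise_not_by_width([0x41, 0x100, 0x10000])])
# ['0xbe', '0xfeff', '0xeffff']
```

The width of the operation depends on each individual code point, which is exactly what differs from an 8-bit-only rule.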
It was kind of important that I deal with the fact that I was proposing
a very different behavior for bit-shifting than exists currently for
boolean operations, I thought.
The question becomes: should I CHANGE the existing bit-ops so that they
don't work on two- or four-byte representations, for symmetry?
If this continues to be so contentious, I'm tempted to agree with the
nay-sayers and say that Parrot shouldn't do bit-vectors on strings, and
we should just implement a bit-vector class later on. Perl will just
have to suffer the overhead of translation. This just IS NOT important
enough to waste this many brain cells on.
--
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback
