On Sat, 2004-05-01 at 04:57, Jarkko Hietaniemi wrote:
> > If Jarkko
> > tells me you can do bitwise operations with unicode text now in Perl
> > 5, well... we'll support it there, too, though we shan't like it at
> > all.
>
> We can and I don't like it at all [...]
> None of it anything I want to propagate anywhere.
Please correct me if I'm wrong here, but I'm going to lay out my
understanding as a set of assertions:
* Parrot will be able to convert any encoding to any other
  encoding
  * though some conversions will result in an exception, which is
    still a defined behavior
* We've agreed that only raw binary 8-bit strings make sense for
  bit vector operations
So it seems to me that the "obvious" way to go is to have all bit-string
operations first convert to raw bytes (possibly throwing an exception)
and then proceed to do their work.
This means that UTF-8 strings will be handled just fine, and (as I
understand it) some subset of Unicode-at-large will be handled as well.
In other words, the burden goes on the conversion functions, not on the
bit ops.
It's not that it's going to be meaningful in the general case, but if
you have code like:
    sub foo() { return "\x01" +| "\x02" }
I would expect to get the bit-string "\x03" back, even though strings
may default to Unicode in Perl 6.
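
To make the convert-then-operate idea concrete, here is a minimal sketch in
Python rather than Parrot or Perl 6. The names to_raw_bytes and bitwise_or are
hypothetical, and two things are assumptions on my part: that "convert to raw
bytes" means each code point must fit in a single byte or the conversion
throws, and that the shorter operand is zero-padded the way Perl 5's string
"|" does it.

```python
def to_raw_bytes(s):
    """Hypothetical conversion step: return raw bytes or raise."""
    if isinstance(s, bytes):
        # Already a raw 8-bit string; nothing to convert.
        return s
    # Assumption: text converts only if every code point fits in one byte.
    # str.encode raises UnicodeEncodeError for anything above U+00FF.
    return s.encode("latin-1")


def bitwise_or(a, b):
    """Bit-string OR: convert both operands first, then work on bytes only."""
    x, y = to_raw_bytes(a), to_raw_bytes(b)
    # Assumption: pad the shorter operand with zero bytes on the right,
    # modeled on how Perl 5's string | treats operands of unequal length.
    n = max(len(x), len(y))
    x, y = x.ljust(n, b"\x00"), y.ljust(n, b"\x00")
    return bytes(p | q for p, q in zip(x, y))
```

With this shape, bitwise_or("\x01", "\x02") yields the single byte 0x03, and
an operand such as "\u263a" fails in the conversion step before the bit op
ever sees it. The bit op itself never has to know about encodings.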
You could put this on the shoulders of the client language (by saying
that the operands must be pre-converted), but that seems to be contrary
to Parrot's usual MO.
Let me know. I'm happy to do it either way, and I'll look at modifying
the other bit-string operators if they don't conform to the decision.
--
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback
