On Fri, 04 Feb 2000 09:52:20 PST, Larry Wall wrote:
>The long answer is that we're phasing out the experimental "use utf8"
The status as of 640 is that only two things are affected by
C<use utf8>: interpretation of literals/identifiers in the source text;
and how REs are compiled. Both should go away.
Having it affect the interpretation of identifiers is a bit bogus,
since high-bit chars have never been allowed in them before, so
we could just always interpret them as utf8.
Treating literals as utf8 is a bit of a compatibility issue, but
I think we should get around that by treating the lexer's input stream
like any other stream that takes disciplines. IOW, default PL_rsfp to
byte mode, and let users push a utf8/utf16/whatever discipline on it if
they wanna. (This would apply to identifiers as well.)
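
To make the byte-mode-by-default idea concrete, here is a rough sketch
using an ordinary filehandle rather than PL_rsfp itself. It assumes the
layer syntax of later perls (5.8 and up, where disciplines came to be
called PerlIO layers) and an in-memory filehandle. How a discipline would
actually be pushed on the lexer's input stream is not settled above, so
nothing here should be read as that interface.

```perl
use strict;
use warnings;

# The two bytes that encode U+00E9 (e with acute) in UTF-8.
my $bytes = "\xC3\xA9";

# Default: no discipline pushed, so the stream hands back raw bytes.
open(my $raw, '<', \$bytes) or die "open raw: $!";
my $raw_line = <$raw>;
close($raw);
my $raw_len = length($raw_line);

# Same data, but with a UTF-8 decoding layer pushed on the handle
# (assumption: the 5.8+ layer syntax stands in for a "discipline").
open(my $dec, '<:encoding(UTF-8)', \$bytes) or die "open decoded: $!";
my $dec_line = <$dec>;
close($dec);
my $dec_len = length($dec_line);

print "bytes: $raw_len\n";   # two octets seen in byte mode
print "chars: $dec_len\n";   # one character once decoded
```

The point of the sketch is only that the same input yields two octets or
one character depending on what was pushed on the stream, with no pragma
involved in the decision.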
Converting the RE code to compile down to polymorphic ops still needs a
bit of work, by my reckoning. Ilya, you hearing me? :-)