Am I right in thinking that perl's internal utf8 representation
represents surrogates as a single (4 byte) code point and not as
two separate code points?
This is the form that Oracle call AL32UTF8.
What would be the effect of setting SvUTF8_on(sv) on a valid utf8
byte string that used surrogates?
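To make the two forms concrete, here is a minimal sketch in C. It is not perl's actual code, and the helper names (encode_utf8, encode_surrogate_pair, combine_surrogates) are made up for illustration. It encodes U+10400 once as a single 4-byte UTF-8 sequence (the AL32UTF8 form) and once as a surrogate pair with each surrogate written as its own 3-byte sequence, which is the form usually called CESU-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one value (up to U+10FFFF) as UTF-8 bytes.
 * Deliberately lax: surrogates (U+D800..U+DFFF) are not rejected, they are
 * encoded like any other 3-byte value. Returns the number of bytes written. */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: split a supplementary code point (>= U+10000) into a
 * UTF-16 surrogate pair and encode each half as its own 3-byte sequence.
 * Always writes 6 bytes for valid input. */
static size_t encode_surrogate_pair(uint32_t cp, unsigned char *out)
{
    uint32_t v  = cp - 0x10000;
    uint32_t hi = 0xD800 | (v >> 10);    /* high (leading) surrogate */
    uint32_t lo = 0xDC00 | (v & 0x3FF);  /* low (trailing) surrogate */
    size_t n = encode_utf8(hi, out);
    return n + encode_utf8(lo, out + n);
}

/* Hypothetical helper: recombine a surrogate pair into one code point. */
static uint32_t combine_surrogates(uint32_t hi, uint32_t lo)
{
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/* For U+10400:
 *   encode_utf8           writes F0 90 90 80        (4 bytes, one code point)
 *   encode_surrogate_pair writes ED A0 81 ED B0 80  (6 bytes, D801 + DC00)
 */
```

The six-byte form will not be read as one character by a decoder that follows the UTF-8 rules. At best it comes out as two separate surrogate values, so the pair would need to be recombined (as in combine_surrogates) and re-encoded to get the four-byte form.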

Tim Bunce wrote:
> Am I right in thinking that perl's internal utf8 representation
> represents surrogates as a single (4 byte) code point and not as
> two separate code points?
Mmmh. Right and wrong... as a single code point, yes, since the real
UTF-8 doesn't do surrogates which are only a UTF-

Jarkko Hietaniemi wrote:
>
> Tim Bunce wrote:
>
> > Am I right in thinking that perl's internal utf8 representation
> > represents surrogates as a single (4 byte) code point and not as
> > two separate code points?
>
> Mmmh. Right and wrong... as a single code point, yes, since
> the real UT