AL32UTF8

2004-04-29 Thread Tim Bunce
Am I right in thinking that perl's internal utf8 representation represents surrogates as a single (4 byte) code point and not as two separate code points? This is the form that Oracle call AL32UTF8. What would be the effect of setting SvUTF8_on(sv) on a valid utf8 byte string that used surrogates

Re: AL32UTF8

2004-04-29 Thread Jarkko Hietaniemi
Tim Bunce wrote:
> Am I right in thinking that perl's internal utf8 representation
> represents surrogates as a single (4 byte) code point and not as
> two separate code points?
Mmmh. Right and wrong... as a single code point, yes, since the real UTF-8 doesn't do surrogates which are only a UTF-

Re: AL32UTF8

2004-04-29 Thread Brian Stell
Jarkko Hietaniemi wrote:
>
> Tim Bunce wrote:
>
> > Am I right in thinking that perl's internal utf8 representation
> > represents surrogates as a single (4 byte) code point and not as
> > two separate code points?
>
> Mmmh. Right and wrong... as a single code point, yes, since
> the real UT
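
The distinction the thread is circling (one 4-byte sequence per supplementary code point versus two separately encoded surrogates) can be illustrated with a small sketch. This is not from the original posts: it is written in Python rather than Perl, the helper names are hypothetical, and U+10400 is just an arbitrary code point above U+FFFF chosen for the example.

```python
def utf8_single(cp: int) -> bytes:
    """Encode a code point as standard UTF-8 (4 bytes for cp > 0xFFFF)."""
    return chr(cp).encode("utf-8")


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF have surrogate pairs")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def utf8_per_surrogate(cp: int) -> bytes:
    """Encode each surrogate as its own 3-byte sequence (6 bytes in total).

    Standard UTF-8 does not permit this form; 'surrogatepass' is used here
    only to make the bytes visible for comparison.
    """
    hi, lo = surrogate_pair(cp)
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in (hi, lo))


if __name__ == "__main__":
    cp = 0x10400  # arbitrary example code point
    print(utf8_single(cp).hex(" "))         # one 4-byte sequence
    print(utf8_per_surrogate(cp).hex(" "))  # two 3-byte sequences
```

A strict UTF-8 decoder rejects the 6-byte form, which is why it matters which of the two a byte string contains before it is flagged as UTF-8.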