On Thu, 2004-04-29 at 11:16, Tim Bunce wrote:
> Am I right in thinking that perl's internal utf8 representation
> represents surrogates as a single (4 byte) code point and not as
> two separate code points?
>
> This is the form that Oracle call AL32UTF8.
>
> What would be the effect of setting SvUTF8_on(sv) on a valid utf8
> byte string that used surrogates? Would there be problems?
> (For example, a string returned from Oracle when using the UTF8
> character set instead of the newer AL32UTF8 one.)
>

I think it makes no difference (at least I could not find one), except
for the internal storage. Several of the tests I wrote print a SQL
DUMP(nch), and you can see the difference in the internal storage in
those prints. The strings come back to the client the way they were
put in.
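
To illustrate the storage difference in question, here is a minimal Python sketch. It is not taken from the tests mentioned above, and the function names are made up for illustration. It assumes that the older Oracle UTF8 character set stores a supplementary character as two separately encoded surrogates (3 bytes each, CESU-8 style), while AL32UTF8 stores it as one standard 4-byte UTF-8 sequence.

```python
def encode_4byte(cp: int) -> bytes:
    """Standard UTF-8: a supplementary code point becomes one 4-byte sequence."""
    return chr(cp).encode("utf-8")


def encode_surrogate_pair(cp: int) -> bytes:
    """CESU-8 style: split the code point into a UTF-16 surrogate pair and
    encode each surrogate as its own 3-byte sequence (6 bytes total)."""
    if cp < 0x10000:
        # Code points in the BMP are encoded identically in both forms.
        return chr(cp).encode("utf-8")
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # leading (high) surrogate
    low = 0xDC00 + (v & 0x3FF)   # trailing (low) surrogate
    # "surrogatepass" lets Python emit the surrogates as raw 3-byte sequences.
    return (chr(high) + chr(low)).encode("utf-8", "surrogatepass")


cp = 0x10000  # first code point outside the BMP
print(encode_4byte(cp).hex())           # f0908080
print(encode_surrogate_pair(cp).hex())  # eda080edb080
```

The two byte strings stand for the same character, but only the 4-byte form is well-formed standard UTF-8; a DUMP() of the stored column would be the place to look for which of the two a given database actually holds.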

I have tested this with 4 databases:

dbcharset/ncharset
--------- --------
us7ascii /utf8
us7ascii /al16utf16
utf8     /utf8
utf8     /al16utf16

All tests produce the same results with all databases, using both
.UTF8 and .AL32UTF8 in NLS_LANG.

Lincoln