Re: AL32UTF8

2004-05-02 Thread Jarkko Hietaniemi
> So the key question is... can we just do SvUTF8_on(sv) on either of these kinds of Oracle UTF8 encodings? Seems like the answer is yes, from what Jarkko says, because they are both valid UTF8. We just need to document the issue.

No, Oracle's UTF8 is very much not valid UTF-8. Valid UTF-8 ...
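The disagreement is easy to demonstrate from Perl: a strict UTF-8 decoder rejects the surrogate byte sequences that Oracle's old UTF8 produces. A minimal sketch using the Encode module (the byte string is hand-built for U+10400; the example character is mine, not from the thread):

    use Encode qw(decode);

    # Hand-built CESU-8 bytes for U+10400: the UTF-16 surrogate pair
    # D801 DC00, each code unit encoded as a three-byte sequence.
    my $cesu8 = "\xED\xA0\x81\xED\xB0\x80";

    # Strict UTF-8 decoding croaks on the surrogate sequences, so just
    # flagging these bytes as UTF-8 would produce an invalid string.
    my $ok = eval {
        decode('UTF-8', $cesu8, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    print $ok ? "valid UTF-8\n" : "not valid UTF-8\n";    # not valid UTF-8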

Re: AL32UTF8

2004-05-02 Thread Lincoln A. Baxter
On Sat, 2004-05-01 at 00:37, Lincoln A. Baxter wrote:
> On Fri, 2004-04-30 at 08:03, Tim Bunce wrote:
> > On Thu, Apr 29, 2004 at 10:42:18PM -0400, Lincoln A. Baxter wrote:
> > > On Thu, 2004-04-29 at 11:16, Tim Bunce wrote:
> > > > Am I right in thinking that perl's internal utf8 representation ...

Re: AL32UTF8

2004-05-02 Thread Tim Bunce
On Sat, May 01, 2004 at 05:35:58PM -0400, Lincoln A. Baxter wrote:
> Hello Owen,
>
> On Sat, 2004-05-01 at 16:46, Owen Taylor wrote:
> > On Fri, 2004-04-30 at 08:03, Tim Bunce wrote:
> > > You can use UTF8 and AL32UTF8 by setting NLS_LANG for OCI client
> > > applications. If you do not need ...
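NLS_LANG is read when the OCI client environment is initialized, so it has to be set before the first connection is made. A minimal sketch of how that might look from DBI (the DSN and credentials are placeholders, not from the thread):

    # Choose the client charset before DBD::Oracle initializes OCI.
    BEGIN { $ENV{NLS_LANG} = 'AMERICAN_AMERICA.AL32UTF8' }

    use DBI;

    # Placeholder DSN and credentials.
    my $dbh = DBI->connect('dbi:Oracle:orcl', 'scott', 'tiger',
                           { RaiseError => 1 });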

Re: AL32UTF8

2004-05-01 Thread Jungshik Shin
Tim Bunce wrote:
> On Fri, Apr 30, 2004 at 10:58:19PM +0700, Martin Hosken wrote:
> > IIRC AL32UTF8 was introduced at the behest of Oracle (a voting member
> > of Unicode) because they were storing higher plane codes using the
> > surrogate pair technique of UTF-16 mapped into UTF-8 (i.e. resulting in 2 ...

Re: AL32UTF8

2004-04-30 Thread Tim Bunce
[The background to this is that Lincoln and I have been working on Unicode support for DBD::Oracle. (Actually Lincoln's done most of the heavy lifting; I've mostly been setting goals and directions at the DBI API level and scratching at edge cases. Like this one.)]

On Thu, Apr 29, 2004 at ...

Re: AL32UTF8

2004-04-30 Thread Martin Hosken
Dear Tim,

> CESU-8 defines an encoding scheme for Unicode identical to UTF-8 except for its representation of supplementary characters. In CESU-8, supplementary characters are represented as six-byte sequences resulting from the transformation of each UTF-16 surrogate code unit into an eight-bit form similar to the UTF-8 transformation, but without first converting to a single code point.
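To make the six-byte point concrete, here is one supplementary character in both forms, with the bytes written out by hand (U+10400 is an arbitrary example of mine, not taken from the thread):

    # U+10400 corresponds to the UTF-16 surrogate pair D801 DC00.

    # True UTF-8 (what AL32UTF8 uses) encodes the code point directly:
    my $utf8  = "\xF0\x90\x90\x80";            # one 4-byte sequence

    # CESU-8 (what Oracle's old "UTF8" uses) encodes each surrogate
    # code unit separately:
    my $cesu8 = "\xED\xA0\x81\xED\xB0\x80";    # two 3-byte sequences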

Re: AL32UTF8

2004-04-30 Thread Tim Bunce
On Fri, Apr 30, 2004 at 03:49:13PM +0300, Jarkko Hietaniemi wrote:
> Okay.

Thanks. Basically I need to document that Oracle AL32UTF8 should be used as the client charset in preference to the older UTF8, because UTF8 doesn't do the best(?) thing with surrogate pairs, because what Oracle ...

Re: AL32UTF8

2004-04-29 Thread Jarkko Hietaniemi
Tim Bunce wrote:
> Am I right in thinking that perl's internal utf8 representation represents surrogates as a single (4 byte) code point and not as two separate code points?

Mmmh. Right and wrong... as a single code point, yes, since the real UTF-8 doesn't do surrogates, which are only a UTF-16 ...
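Jarkko's point can be checked from Perl itself: a supplementary character is a single code point, and encoding it gives the direct four-byte form rather than a surrogate pair. A small sketch (the character chosen is my example, not from the thread):

    use Encode qw(encode);

    # U+10400 is an arbitrary supplementary-plane example character.
    my $char = chr(0x10400);
    printf "code points: %d\n", length $char;         # 1, not 2

    # The encoded form is the direct four-byte sequence, not a
    # six-byte surrogate pair.
    printf "bytes: %vX\n", encode('UTF-8', $char);    # F0.90.90.80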