So the key question is... can we just do SvUTF8_on(sv) on either
of these kinds of Oracle UTF8 encodings? Seems like the answer is
yes, from what Jarkko says, because they are both valid UTF8.
We just need to document the issue.

No, Oracle's UTF8 is very much not valid UTF-8. Valid UTF-8
does not encode surrogate code points at all.
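A byte-level illustration of the difference (Python used purely as a neutral sketch; the thread itself concerns Perl's SvUTF8_on, but the byte values are standard Unicode). Oracle's UTF8 stores a supplementary character as two 3-byte surrogate sequences, which a strict UTF-8 decoder rejects:

```python
# U+10400 (DESERET CAPITAL LETTER LONG I) in real UTF-8: one 4-byte sequence
real_utf8 = "\U00010400".encode("utf-8")
print(real_utf8.hex())  # f0909080

# The same character as Oracle's UTF8 (CESU-8) stores it: each UTF-16
# surrogate code unit (D801, DC00) encoded as its own 3-byte sequence
cesu8 = bytes.fromhex("eda081edb080")

# A strict UTF-8 decoder rejects the surrogate sequences, which is why
# simply flagging such bytes as UTF-8 (as SvUTF8_on would) is unsafe
try:
    cesu8.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

So flipping the utf8 flag on is only safe for AL32UTF8 data, not for data in Oracle's older UTF8 charset.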
On Sat, 2004-05-01 at 00:37, Lincoln A. Baxter wrote:
On Fri, 2004-04-30 at 08:03, Tim Bunce wrote:
On Thu, Apr 29, 2004 at 10:42:18PM -0400, Lincoln A. Baxter wrote:
On Thu, 2004-04-29 at 11:16, Tim Bunce wrote:
Am I right in thinking that perl's internal utf8 representation
represents surrogates as a single (4 byte) code point and not as
two separate code points?
On Sat, May 01, 2004 at 05:35:58PM -0400, Lincoln A. Baxter wrote:
Hello Owen,
On Sat, 2004-05-01 at 16:46, Owen Taylor wrote:
On Fri, 2004-04-30 at 08:03, Tim Bunce wrote:
You can use UTF8 and AL32UTF8 by setting NLS_LANG for OCI client
applications. If you do not need supplementary characters, it does
not matter whether you choose UTF8 or AL32UTF8.
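For reference, the client charset is the third field of NLS_LANG. A minimal sketch of selecting AL32UTF8 before the OCI client initializes (the language/territory values here are placeholders, not from the thread):

```python
import os

# NLS_LANG format: <language>_<territory>.<charset>.
# Only the charset component matters for this discussion;
# AMERICAN_AMERICA is just a placeholder pair.
os.environ["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"
print(os.environ["NLS_LANG"])
```

This must be set before the client library reads its environment, i.e. before connecting.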
Tim Bunce wrote:
On Fri, Apr 30, 2004 at 10:58:19PM +0700, Martin Hosken wrote:
IIRC AL32UTF8 was introduced at the behest of Oracle (a voting member of
Unicode) because they were storing higher plane codes using the
surrogate pair technique of UTF-16 mapped into UTF-8 (i.e. resulting in
2 three-byte sequences rather than 1 four-byte sequence).
[The background to this is that Lincoln and I have been working on
Unicode support for DBD::Oracle. (Actually Lincoln's done most of
the heavy lifting, I've mostly been setting goals and directions
at the DBI API level and scratching at edge cases. Like this one.)]
On Thu, Apr 29, 2004 at
Dear Tim,
CESU-8 defines an encoding scheme for Unicode identical to UTF-8
except for its representation of supplementary characters. In CESU-8,
supplementary characters are represented as six-byte sequences
resulting from the transformation of each UTF-16 surrogate code
unit into an eight-bit form.
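The definition above can be followed mechanically. A sketch (the function name is mine, for illustration only) that splits a supplementary code point into its UTF-16 surrogate pair and then runs each 16-bit code unit through the ordinary 3-byte UTF-8 transformation, yielding the six-byte CESU-8 form:

```python
def cesu8_encode_supplementary(cp: int) -> bytes:
    """Encode a single supplementary code point (> U+FFFF) as CESU-8."""
    assert cp > 0xFFFF, "supplementary characters only"
    v = cp - 0x10000
    high = 0xD800 | (v >> 10)    # high (leading) surrogate
    low = 0xDC00 | (v & 0x3FF)   # low (trailing) surrogate

    def three_byte(u: int) -> bytes:
        # Standard 3-byte UTF-8 pattern: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (u >> 12),
                      0x80 | ((u >> 6) & 0x3F),
                      0x80 | (u & 0x3F)])

    return three_byte(high) + three_byte(low)

print(cesu8_encode_supplementary(0x10400).hex())  # eda081edb080 (6 bytes)
print("\U00010400".encode("utf-8").hex())         # f0909080 (4 bytes)
```

Six bytes in CESU-8 versus four in real UTF-8 — the same character, two incompatible byte sequences.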
On Fri, Apr 30, 2004 at 03:49:13PM +0300, Jarkko Hietaniemi wrote:
Okay. Thanks.
Basically I need to document that Oracle AL32UTF8 should be used
as the client charset in preference to the older UTF8, because
UTF8 doesn't do the right thing with surrogate pairs: what Oracle
calls UTF8 is actually CESU-8.
Tim Bunce wrote:
Am I right in thinking that perl's internal utf8 representation
represents surrogates as a single (4 byte) code point and not as
two separate code points?
Mmmh. Right and wrong... as a single code point, yes, since real
UTF-8 doesn't do surrogates, which are only a UTF-16 encoding
mechanism.
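The single-code-point point is easy to demonstrate. Python strings behave like Perl's internal utf8 here (counting whole code points, not UTF-16 code units), so a short sketch:

```python
s = "\U00010400"  # one supplementary character

print(len(s))                           # 1: a single code point
print(len(s.encode("utf-8")))           # 4: one 4-byte UTF-8 sequence
print(len(s.encode("utf-16-be")) // 2)  # 2: two UTF-16 surrogate code units
```

Surrogates only appear when the character is serialized to UTF-16; internally there is just one code point, which is exactly why CESU-8's surrogate-derived six-byte form is foreign to a UTF-8-based representation.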