Are you asking about the Java or the C++ version? Both use Unicode as
their internal representation, using surrogates where necessary to support
formats such as UCS-4. On the C++ side, we have a transcoding framework
that lets you plug in whatever transcoding mechanism you want to use. We
provide implementations of this framework for ICU (which supports about 100
encodings, I'm told) and for iconv on the Unixes, plus a very simple one
for Win32 that does not support any non-intrinsic encodings.
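
To make the surrogate remark concrete, here is a minimal sketch of how one
UCS-4 code point maps onto 16-bit units. The function name toUtf16 is
hypothetical and is not part of the Xerces API; it only illustrates the
standard UTF-16 surrogate arithmetic, and it assumes the input is a valid
code point (at most 0x10FFFF and not itself a surrogate value).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only, not a Xerces function.
// Converts one UCS-4 code point into one or two UTF-16 code units.
// Assumes codePoint <= 0x10FFFF and is not in the range 0xD800..0xDFFF.
std::vector<std::uint16_t> toUtf16(std::uint32_t codePoint)
{
    std::vector<std::uint16_t> units;
    if (codePoint < 0x10000) {
        // Fits in a single 16-bit unit.
        units.push_back(static_cast<std::uint16_t>(codePoint));
    } else {
        // Split the 20-bit offset into a high and a low surrogate.
        const std::uint32_t offset = codePoint - 0x10000;
        units.push_back(static_cast<std::uint16_t>(0xD800 + (offset >> 10)));
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF)));
    }
    return units;
}
```

For example, U+1D11E comes out as the pair 0xD834, 0xDD1E, while anything
below 0x10000 stays a single unit.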

The C++ parser intrinsically supports ASCII, UTF-8, UTF-16, and UCS-4. As
of the next release it will add Latin1 and EBCDIC-US as intrinsic
encodings. Any others are handled via the transcoding service: we ask it to
create a transcoder for the encoding named in the encoding="" declaration,
if there is one. If there is no encoding="" declaration, then by definition
the file must be in one of the formats we support intrinsically. That does
not include Latin1, which still requires an encoding=""; we provide Latin1
intrinsically only to optimize it and to allow wider use of our stuff on
Windows without ICU.
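
The decision described above can be sketched roughly as follows. This is
not the parser's actual code: chooseDecoder, DecoderSource and the exact
spellings in the name table are assumptions made for illustration, and the
table includes the two encodings planned for the next release.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical types and names, for illustration only.
enum class DecoderSource { Intrinsic, TranscodingService };

// Uppercase a name so the comparison below ignores case.
static std::string toUpper(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// declaredEncoding is the value of encoding="", or empty if there is none.
DecoderSource chooseDecoder(const std::string& declaredEncoding)
{
    // No declaration: the file must be in an intrinsically supported format.
    if (declaredEncoding.empty())
        return DecoderSource::Intrinsic;

    // Assumed spellings; the real parser may accept other aliases.
    static const char* const intrinsicNames[] = {
        "US-ASCII", "UTF-8", "UTF-16", "UCS-4", "ISO-8859-1", "EBCDIC-US"
    };

    const std::string name = toUpper(declaredEncoding);
    for (const char* candidate : intrinsicNames) {
        if (name == candidate)
            return DecoderSource::Intrinsic;
    }

    // Anything else goes to whichever transcoding service is installed
    // (ICU, iconv, or the simple Win32 one).
    return DecoderSource::TranscodingService;
}
```

Note that with the simple Win32 service installed, the second branch would
fail for any name it reaches, since that service supports no non-intrinsic
encodings.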

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]



Michael Burbidge <[EMAIL PROTECTED]> on 01/07/2000 03:24:49 PM

Please respond to [EMAIL PROTECTED]

To:   <[EMAIL PROTECTED]>
cc:
Subject:  Transcoders...



Is there any documentation on transcoders? I know they're relatively
simple. The general idea is that a transcoder can convert from one
character encoding to another. Internally, does Xerces always deal with one
particular encoding, maybe Unicode?

Thanks,
Mike-



