character sets and languages in openEHR

Hoylen Sue 06 Apr 2004 13:37:40 +1000

Thomas Beale <thomas at deepthought.com.au> writes:
> I wonder if this is true for people using openEHR-based components via
> an API rather than communicating via data messages. I assume that the
> Unicode implementation used in the String type in most of today's
> languages make it easy to determine what width Unicode characters you
> have in the data?


In all the cases I know of, once the data has been read into
the native string type, discovering its "width" is no longer
an issue.  This is because each native string type is defined
to use a single encoding, and data is converted into that
encoding when it is read in.
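A minimal sketch of this "convert on read" approach, using Python 3 as the example language (its native str holds Unicode code points, so it stands in here for the native string types being discussed). The function name read_text is hypothetical. The same text arrives as UTF-8, UTF-16 and UTF-32 bytes; after decoding at the input boundary the three results are identical, so code downstream never needs to know the original width.

```python
def read_text(data: bytes, encoding: str) -> str:
    """Hypothetical input boundary: decode external bytes into the native str."""
    return data.decode(encoding)


original = "h\u00e9llo \u65e5\u672c"  # "héllo" plus two CJK characters

# The same text as it might arrive from three different sources.
inputs = {
    "utf-8": original.encode("utf-8"),
    "utf-16": original.encode("utf-16"),  # Python prepends a byte order mark
    "utf-32": original.encode("utf-32"),  # Python prepends a byte order mark
}

# Decode everything once, at the boundary.
decoded = {enc: read_text(raw, enc) for enc, raw in inputs.items()}

for enc, raw in inputs.items():
    text = decoded[enc]
    # The byte counts differ, but the decoded strings are equal.
    print(f"{enc:7} {len(raw):3} bytes in -> {len(text)} characters, equal: {text == original}")
```

The design choice is that encoding is handled exactly once, where bytes enter the program; everything after that works on the single native representation.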

The native string type differs from one language to another.
For example, the C# string type holds UTF-16, whereas some
Unix string libraries use UTF-8 and some C++ libraries use
UTF-32.
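To make the difference concrete, the sketch below counts how many code units the same text needs in each encoding form. This roughly corresponds to what a UTF-8, UTF-16 or UTF-32 based string type would store. The helper code_units and the sample string are illustrative assumptions, not part of any library. Note that the code points (the character set) are identical in every case; only the encoding changes.

```python
def code_units(text: str) -> dict[str, int]:
    """Hypothetical helper: count code units needed to store `text` per encoding form."""
    return {
        "UTF-8": len(text.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(text.encode("utf-16-le")) // 2,  # 16-bit code units, no BOM
        "UTF-32": len(text.encode("utf-32-le")) // 4,  # 32-bit code units, no BOM
    }


sample = "h\u00e9llo \u65e5\u672c"  # "héllo" plus two CJK characters

# The character set assigns the same code points regardless of encoding.
print([f"U+{ord(c):04X}" for c in sample])

# The number of stored units depends on the encoding form.
print(code_units(sample))
```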

APIs could therefore be another reason not to specify the
encoding format in the standard.  Remember, the character set
is an issue independent of the encoding.

Hoylen


P.S. As an aside, Java 1.1 and later use the Unicode 2.0
character set.  So if Unicode 3.0 or Unicode 4.0 is the
target character set, implementations may be forced to
implement their own string class rather than using the
native java.lang.String.  This is something to consider when
picking which Unicode version to adopt as the standard
character set.
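The concern is easiest to see with a character outside the Basic Multilingual Plane (BMP); the first such characters were assigned in Unicode 3.1. Java's char is a 16-bit type, so such a character cannot fit in a single char and has to be stored as a UTF-16 surrogate pair. The sketch below is in Python rather than Java, and the helper names utf16_code_units and surrogate_pair are hypothetical; it only shows the 16-bit code units that a UTF-16 based string type would have to hold.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit code units of `text` in UTF-16 (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low


bmp_char = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, inside the BMP
clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the BMP

print([hex(u) for u in utf16_code_units(bmp_char)])  # one 16-bit unit
print([hex(u) for u in utf16_code_units(clef)])      # two 16-bit units
print([hex(u) for u in surrogate_pair(ord(clef))])   # same pair, computed by hand
```

A string class built on 16-bit units therefore reports two units for what is a single character, which is the kind of mismatch that could push an implementation towards its own string class.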
-- 
______________________________________________ Dr Hoylen Sue
h.sue at dstc.edu.au                    http://www.dstc.edu.au/
DSTC Pty Ltd --- Australian W3C Office       +61 7 3365 4310
