Thomas Beale <thomas at deepthought.com.au> writes:
> I wonder if this is true for people using openEHR-based components via
> an API rather than communicating via data messages. I assume that the
> Unicode implementation used in the String type in most of today's
> languages makes it easy to determine what width Unicode characters you
> have in the data?
In all the cases I know of, once the data has been read into the native
string type, discovering its "width" is no longer an issue. This is
because the native string type is defined to support only a single
encoding, and data is converted into that encoding when it is read in.

The encoding behind the native string type differs from one language to
another. For example, in C# the string type holds UTF-16, whereas some
Unix string libraries use UTF-8 and some C++ libraries use UTF-32.

APIs could be another reason not to specify the encoding format in the
standard. Remember, the character set is a separate issue from the
encoding.

Hoylen

P.S. As an aside, the Java platform is tied to a specific Unicode
version (Java 1.1, for instance, uses the Unicode 2.0 character set).
So if the target character set is a newer version, such as Unicode 3.0
or Unicode 4.0, than the deployed Java release supports, implementations
may be forced to implement their own string class rather than using the
native java.lang.String. Something to consider when picking which
Unicode version to use as the standard character set.

--
______________________________________________
Dr Hoylen Sue                      h.sue at dstc.edu.au
http://www.dstc.edu.au/
DSTC Pty Ltd --- Australian W3C Office  +61 7 3365 4310
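To illustrate the point about native string types, here is a minimal Java sketch. It assumes the encoding of the incoming bytes is known at the point where they are read; the class name EncodingSketch and the helper decode() are hypothetical and are not part of any openEHR API. The same text arrives once as UTF-8 and once as UTF-16, and after decoding both copies are indistinguishable inside java.lang.String.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {
    // Hypothetical helper: convert raw bytes into Java's native String.
    // The wire encoding has to be known here; after this call it no longer matters.
    static String decode(byte[] data, Charset encoding) {
        return new String(data, encoding);
    }

    public static void main(String[] args) {
        String original = "na\u00EFve \u20AC"; // 7 characters, two of them non-ASCII

        // The same text serialised in two different encodings.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = original.getBytes(StandardCharsets.UTF_16BE);

        String fromUtf8 = decode(utf8, StandardCharsets.UTF_8);
        String fromUtf16 = decode(utf16, StandardCharsets.UTF_16BE);

        // The byte lengths differ on the wire (10 versus 14 here)...
        System.out.println(utf8.length + " bytes as UTF-8, " + utf16.length + " bytes as UTF-16");
        // ...but the decoded strings are equal and have the same length.
        System.out.println(fromUtf8.equals(fromUtf16)); // true
        System.out.println(fromUtf8.length());          // 7
    }
}
```

The "width" question only exists at the boundary where bytes are decoded, which is why an API that passes native strings does not need the standard to fix an encoding.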