character sets and languages in openEHR

David Neilsen Thu, 18 Mar 2004 07:51:44 +1100

Some work is being done with upper ontologies that enable the conversion
of concepts etc. from one language system to another (the ABC model?).
This could be between different languages or between, say, the health
and welfare sectors of one country. I was recently involved in a meeting
with the CEO of DSTC, a government/education/private sector cooperative.
I have only had a fleeting glimpse of what they are doing, but it may
have some bearing on this sort of thing.


DSTC is involved with one of the proof projects for Australia's
HealthConnect initiative and is working with archetypes. Sam H, Thomas
B and Peter S are probably already aware of at least some of their
activities. I am not sure whether DSTC is party to this discussion group.

CEO of DSTC is Mark Gibson, mark.gibson at dstc.edu.au

David Neilsen
AIHW

Tim Churches wrote:

>On Thu, 2004-03-18 at 00:51, gfrer wrote:
>
>>Hi,
>>
>>Anamnesis in psychiatry:
>>
>>And then the disturbed patient said: "Merdre". [Translation:
>>shit]
>>
>>Family history:
>>
>>My father was diagnosed as suffering from: "Engelse ziekte"
>>[Translation: rickets]
>>
>>Coding systems
>>
>>ICPC-1 Dutch version.
>>
>>Code: R05.
>>
>>Displayed text: Hoest
>>
>>Added translation: Cough
>>
>
>Yes, I thought of examples which were similar to these. And it is not
>just a matter of the recording health professional not knowing what
>"Engelse ziekte" means, and thus having to record it verbatim and
>untranslated: many diagnoses have no equivalent in other
>languages/cultures, and are thus untranslatable (at least not without
>some information loss). Given that the "foreign" language text may
>require accented characters, or even a completely different character
>set, the Unicode encoding used for the entry will need to be captured
>as well as the language, unless openEHR is restricted purely to one
>Unicode encoding, such as UTF-8. Remember the golden rule with
>Unicode: "If you don't know the encoding, you don't know nuffin'."
>
>The only problem with "UTF-8 everywhere" is that it is Roman-alphabet
>chauvinistic: the basic (unaccented, ASCII) Roman characters are each
>represented with one byte, but everything else needs two or more bytes.
>That dooms Russian openEHR records to using up to twice as much storage
>for their text as the equivalent English openEHR records. In these days
>of massive, cheap disc storage and high-speed networks, that fact
>probably doesn't matter, but it just seems unfair, although I can't
>think of a better alternative. As an English speaker, I would not be
>keen if openEHR mandated the use of UTF-16, thus forcing me to use two
>bytes for every letter. Yet UTF-8 forces Russians and Greeks to use two
>bytes per letter, Thais three, and Vietnamese two or three for many of
>their accented letters; just about every other non-Roman alphabet needs
>at least two. Of course, ideographic languages like Chinese are doomed
>to use more than one byte per character (three in UTF-8 for the
>commonly used ones), but then the language itself encodes a lot more
>information in each character, so it probably works out about the same
>in the end.
>
>  
>
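A minimal Python sketch, not part of the original thread, illustrating the two points in the quoted message: keeping the language and encoding alongside stored text, and how many bytes the same word costs in UTF-8 versus UTF-16. `TaggedText` and `encoded_sizes` are hypothetical names made up for illustration; they are not openEHR reference-model classes, and the use of Python codec names for the encoding field is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical record type for illustration only; it is NOT the openEHR
# reference model. It stores free text as bytes together with the language
# and the encoding, so a reader never has to guess how to decode it.
@dataclass(frozen=True)
class TaggedText:
    data: bytes     # the stored bytes
    language: str   # assumed language tag, e.g. "nl" or "ru"
    encoding: str   # a Python codec name, e.g. "utf-8" or "utf-16-le"

    @classmethod
    def from_str(cls, text: str, language: str,
                 encoding: str = "utf-8") -> TaggedText:
        return cls(text.encode(encoding), language, encoding)

    def text(self) -> str:
        # Decoding is only reliable because the encoding was recorded.
        return self.data.decode(self.encoding)


def encoded_sizes(text: str) -> dict[str, int]:
    """Bytes needed for `text` in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


# The word for "cough" in several languages.
samples = {
    "en": "Cough",    # ASCII: 1 byte per letter in UTF-8, 2 in UTF-16
    "nl": "Hoest",    # ASCII, as in the ICPC-1 Dutch display text
    "ru": "кашель",   # Cyrillic: 2 bytes per letter in UTF-8 and in UTF-16
    "th": "ไอ",       # Thai: 3 bytes per character in UTF-8, 2 in UTF-16
    "zh": "咳嗽",      # Chinese: 3 bytes per character in UTF-8, 2 in UTF-16
}

for lang, word in samples.items():
    entry = TaggedText.from_str(word, lang)
    assert entry.text() == word  # round-trips because the encoding is stored
    print(lang, word, encoded_sizes(word))
```

Run as-is, it shows Cyrillic costing two bytes per letter in UTF-8 (the same as in UTF-16), while Thai and Chinese cost three bytes per character in UTF-8 against two in UTF-16.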
