On Thu, 2004-03-18 at 00:51, gfrer wrote:
> Hi,
>
> Anamnesis in psychiatry:
>
> And then the disturbed patient said: "Merdre". [Translation: shit]
>
> Family history:
>
> My father was diagnosed as suffering from: "Engelse ziekte"
> [Translation: Rickets disease]
>
> Coding systems:
>
> ICPC-1 Dutch version.
> Code: R05.
> Displayed text: Hoest
> Added translation: Cough

Yes, I thought of examples which were similar to these. And it is not just a matter of the recording health professional not knowing what "Engelse ziekte" means, and thus having to record it verbatim and untranslated. Many diagnoses have no equivalent in other languages/cultures, and are thus untranslatable (at least not without some information loss).

Given that the "foreign" language text may require accented characters, or even a completely different script, the Unicode encoding used for the entry will need to be captured as well as the language, unless openEHR is restricted purely to one Unicode encoding, such as UTF-8. Remember the golden rule with Unicode: "If you don't know the encoding, you don't know nuffin'."

The only problem with "UTF-8 everywhere" is that it is Roman alphabet chauvinistic: the basic (unaccented) Roman characters are all represented with one byte, but everything else needs two or more bytes. That dooms Russian openEHR records to using roughly twice as much storage for their text as the equivalent English openEHR records. In these days of massive cheap disc storage and high speed networks, that fact probably doesn't matter, but it just seems unfair, although I can't think of a better alternative.

As an English speaker, I would not be keen if openEHR mandated the use of UTF-16, thus forcing me to use two bytes for every letter. Yet that's what UTF-8 forces Russians and Greeks to do, while Thai needs three bytes per character and Vietnamese, with its heavily accented Roman letters, needs a mix of one, two and three. Of course, ideographic languages like Chinese are doomed to use more than one byte per character (three in UTF-8 for the common characters), but then the language itself encodes a lot more information in each character, so it probably works out about the same in the end.

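For anyone who wants to check the storage argument, here is a minimal Python sketch that counts the bytes needed for the same word in UTF-8 and UTF-16. The helper name encoded_sizes and the sample words (all meaning "cough", to match the ICPC example) are my own illustration and have nothing to do with openEHR itself.

```python
def encoded_sizes(text: str) -> dict:
    """Return the character count and encoded byte counts for text."""
    return {
        "chars": len(text),
        "utf8": len(text.encode("utf-8")),
        # "utf-16-le" is used so that no 2-byte byte-order mark is added.
        "utf16": len(text.encode("utf-16-le")),
    }


# Illustrative samples only: the word "cough" in four scripts.
samples = {
    "English": "cough",
    "Russian": "\u043a\u0430\u0448\u0435\u043b\u044c",
    "Thai": "\u0e44\u0e2d",
    "Chinese": "\u54b3\u55fd",
}

for language, word in samples.items():
    print(language, word, encoded_sizes(word))
```

The English word costs one byte per letter in UTF-8 and two in UTF-16, the Russian word costs two bytes per letter in both, and the Thai and Chinese words cost three bytes per character in UTF-8 against two in UTF-16.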
-- 
Tim C

PGP/GnuPG Key 1024D/EAF993D0 available from keyservers everywhere
or at http://members.optushome.com.au/tchur/pubkey.asc
Key fingerprint = 8C22 BF76 33BA B3B5 1D5B EB37 7891 46A9 EAF9 93D0

