Tim Churches wrote:

> Yes, I thought of examples which were similar to these. And it is not
> just a matter of the recording health professional not knowing what
> "Engelse ziekte" means, and thus having to record it verbatim and
> untranslated - many diagnoses have no equivalent in other
> languages/cultures, and are thus untranslatable (at least not without
> some information loss).

Actually, these kinds of expressions are not the problem - they can happily be recorded inside a DV_TEXT object which has the language set to English or Dutch or whatever it may be; an inline occurrence of a 'foreign' term that is routinely used by speakers of a different language (the way we use 'gesundheit' or 'triage' in English) can be assumed to be understood and is probably even in the dictionary of the language of narration.

The problem is when text fragments are recorded in which the words are valid in more than one language but do not have the same meaning in each. Words in Danish and Norwegian should be almost the same, but I assume there are by now some small differences; there are certainly words in most of the European languages which also occur in another language with a completely unrelated meaning. So in theory a language marker is needed to ensure that a later reader knows what language the words were in (maybe even to allow them to know what kind of translator to call).

So the question remains - do we need the ability to have multiple languages inside a single entry? For Gerard's examples - would it really be necessary to indicate what the other languages were, given that they are probably obvious to most users who will use them?

The real reason for the question is that having to record language everywhere all the time means wasting a certain amount of data storage on every text fragment stored in the record; the alternative seems to be to record it on Entry. If we decide that it has to be possible to have text fragments within an Entry for which the name of a different language is actually recorded, we can use an optional language attribute on DV_TEXT which is understood as overriding the value recorded on the Entry.

In general I am against this kind of overriding of values in lower objects in a composition - it is not OO, and it is often misunderstood by programmers working from the specifications; in general it is dangerous. However, maybe this is an exception which justifies its use.

As for Unicode, obviously we cannot do much about the standard; but I guess someone had to have the 8-bit part of the code space.
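
For what it's worth, here is a minimal sketch of the override idea described above: the language is recorded once on the Entry, and a text fragment carries its own language only when it differs. It is written in Python purely for illustration. The class shapes, the `language` attributes and the `effective_language` helper are assumptions made for this sketch, not definitions taken from the openEHR reference model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DvText:
    """Illustrative stand-in for DV_TEXT (hypothetical, not the openEHR class)."""
    value: str
    # Left unset in the common case, so no language is stored per fragment.
    language: Optional[str] = None


@dataclass
class Entry:
    """Illustrative stand-in for an Entry that carries the default language."""
    language: str
    items: List[DvText] = field(default_factory=list)

    def effective_language(self, text: DvText) -> str:
        # The fragment's own language wins; otherwise fall back to the Entry.
        return text.language if text.language is not None else self.language


entry = Entry(language="en")
entry.items.append(DvText("Patient reports a childhood history of"))
entry.items.append(DvText("Engelse ziekte", language="nl"))

for item in entry.items:
    print(entry.effective_language(item), "|", item.value)
```

The risk mentioned above shows up directly in the sketch: any code that reads `item.language` itself, instead of going through `effective_language`, will see no language at all for most fragments.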

> Given that the "foreign" language text may
> require accented characters, or even a completely different character
> set, then the Unicode encoding used for the entry will need to be
> captured as well as the language, unless openEHR will be restricted
> purely to one Unicode encoding, such as UTF-8. Remember the golden rule
> with Unicode: "If you don't know the encoding, you don't know nuffin'."
>
> The only problem with "UTF-8 everywhere" is that it is Roman alphabet
> chauvinistic, in that the basic Roman characters are all represented
> with one byte, but everything else needs two bytes. That dooms all
> Russian openEHR records to using twice as much storage as the equivalent
> English openEHR records. In these days of massive cheap disc storage and
> high speed networks, that fact probably doesn't matter, but it just
> seems unfair, although I can't think of a better alternative. As an
> English speaker, I would not be keen if openEHR mandated the use of
> UTF-16, thus forcing me to use two bytes for every letter. Yet that's
> what UTF-8 forces Russians, and Greeks, and Thais and Vietnamese and
> just about every other non-Roman alphabetic language speaker to do. Of
> course, ideographic languages like Chinese are doomed to use more than
> one byte per character, but then the language itself encodes a lot more
> information in each character, so it probably works out about the same
> in the end.

-- 
CTO Ocean Informatics (http://www.OceanInformatics.biz)
Hon. Research Fellow, University College London
openEHR (http://www.openEHR.org)
Archetypes (http://www.oceaninformatics.biz/adl.html)
Community Informatics (http://www.deepthought.com.au/ci/rii/Output/mainTOC.html)