RFC - CR-000150 - express language etc as a String

Thomas Beale Thu, 18 Aug 2005 12:32:26 +0100

As part of the current group of CRs being analysed for openEHR, we are 
considering CR-000150 
(http://coruscant.chime.ucl.ac.uk:8200/openEHR_Collector/projects/specifications/CR/150
 
in the CR system), which proposes that wherever the reference model has 
an attribute representing language, encoding or territory, we should 
use Strings directly rather than CODE_PHRASE as we do now.


Current Situation
-----------------
These attributes, which occur, for example, in the class DV_TEXT (see 
http://svn.openehr.org/specification/TRUNK/publishing/architecture/rm/data_types_im.pdf),
 
use international standard codesets as follows:
- language is represented by a CODE_PHRASE object containing a code-set 
id for openehr-languages, which is the same as ISO 639 2-character 
language codes
- encoding is similarly represented, but using codes from IANA character 
sets, see http://www.iana.org/assignments/character-sets
- territory is similarly represented, using ISO 3166 2-character country 
codes

All three of these codesets are currently 'wrapped' by openEHR code-sets 
(see Support IM, 
http://svn.openehr.org/specification/TRUNK/publishing/architecture/rm/support_im.pdf),
 
and it is the openEHR code-sets which are mentioned in the reference 
model invariants, thus forcing the appropriate attributes always to be a 
code from the appropriate code set. This level of indirection allows 
openEHR to use different code sets for this purpose in the future 
(e.g. the ISO 3-character code sets, or perhaps an ISO replacement for 
the IANA character set names, or even IANA equivalents for the ISO code 
sets); the reference model would remain valid regardless.
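
To make the indirection concrete, here is a minimal Python sketch. It is 
only an illustration: the names CodePhrase, CODE_SET_BINDINGS and 
is_valid, the code-set identifiers, and the tiny code tables are all 
assumptions made for this example, not the Support IM API.

```python
from dataclasses import dataclass

# Illustrative subsets only; the real code sets are far larger.
EXTERNAL_CODE_SETS = {
    "ISO_639-1": {"en", "de", "fr"},
    "ISO_639-2": {"eng", "deu", "fra"},
    "ISO_3166-1": {"GB", "DE", "AU"},
    "IANA_character-sets": {"UTF-8", "ISO-8859-1"},
}

# Assumed binding table: openEHR code-set name -> external code set in use.
CODE_SET_BINDINGS = {
    "languages": "ISO_639-1",
    "countries": "ISO_3166-1",
    "character sets": "IANA_character-sets",
}


@dataclass(frozen=True)
class CodePhrase:
    """Simplified stand-in for CODE_PHRASE: a code-set id plus a code."""
    terminology_id: str
    code_string: str


def is_valid(openehr_code_set, code, bindings=CODE_SET_BINDINGS):
    """Invariant-style check: the code must come from whichever external
    code set is currently bound to the named openEHR code set."""
    external_id = bindings[openehr_code_set]
    return (code.terminology_id == external_id
            and code.code_string in EXTERNAL_CODE_SETS[external_id])


# Current binding: 2-character language codes.
print(is_valid("languages", CodePhrase("ISO_639-1", "en")))

# Rebinding to 3-character codes changes only the table, not the check.
rebound = dict(CODE_SET_BINDINGS, languages="ISO_639-2")
print(is_valid("languages", CodePhrase("ISO_639-2", "eng"), rebound))
```

The point of the sketch is that the check refers only to the openEHR 
code-set name; which external code set sits behind it is a matter of 
configuration.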

The logic for choosing to model these codes as CODE_PHRASEs in openEHR 
was for consistency: every coded entity in openEHR is either a 
DV_CODED_TEXT (which contains a CODE_PHRASE) or a CODE_PHRASE (used when 
the codes themselves carry the meaning, as most of the ISO and IANA 
codesets do). In practical terms it does of course mean slightly more 
data instances at a fine-grained level; e.g. in XML you would see more 
tags and data items for each CODE_PHRASE compared to a simple String field.
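
As a rough illustration of the size difference, the following Python 
sketch serialises a language value both ways. The element names 
(language, terminology_id, value, code_string) and the code-set id are 
assumptions chosen for the example, not a statement of the official 
XML schema.

```python
import xml.etree.ElementTree as ET


def language_as_code_phrase(code_set_id, code):
    """Build a nested element resembling a CODE_PHRASE-valued attribute."""
    lang = ET.Element("language")
    term = ET.SubElement(lang, "terminology_id")
    ET.SubElement(term, "value").text = code_set_id
    ET.SubElement(lang, "code_string").text = code
    return lang


def language_as_string(code):
    """Build a flat element resembling a String-valued attribute."""
    lang = ET.Element("language")
    lang.text = code
    return lang


nested = ET.tostring(language_as_code_phrase("ISO_639-1", "en"),
                     encoding="unicode")
flat = ET.tostring(language_as_string("en"), encoding="unicode")

print(nested)  # four elements for one language value
print(flat)    # a single element
```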


Proposed Situation
------------------
Sam Heard has proposed that these three types of codes should be 
hard-wired into the reference model - as direct string attributes, and 
that the reference model documentation should simply say that the 
particular ISO or IANA codes are mandatory in each case.

This is a reasonable position - these codesets seem to be very stable - 
some would say they are the most stable of any coded entity today. There 
is undoubtedly software around which does hardwire such codes, and has 
never had a problem. There is also an argument for simpler object 
structures - a String is simpler than a CODE_PHRASE. However, 
semantically, the current and proposed solutions are the same - in the 
current situation, invariants guarantee that the codes must come from the 
appropriate codesets for each particular attribute.
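
A minimal Python sketch of what the hard-wired String form might look 
like follows. It is an assumption-laden illustration: the DvText class 
here is a simplified stand-in for DV_TEXT, and the code tables are tiny 
illustrative subsets that the software knows about directly.

```python
from dataclasses import dataclass

# Hard-wired, illustrative subsets of the mandated code sets.
ISO_639_1 = frozenset({"en", "de", "fr"})
IANA_CHARSETS = frozenset({"UTF-8", "ISO-8859-1"})


@dataclass(frozen=True)
class DvText:
    """Simplified stand-in for DV_TEXT with plain String attributes."""
    value: str
    language: str   # must be an ISO 639 2-character code
    encoding: str   # must be an IANA character set name

    def __post_init__(self):
        # The same invariant as before, but the software itself now
        # knows which code set applies to which field.
        if self.language not in ISO_639_1:
            raise ValueError(f"unknown language code: {self.language!r}")
        if self.encoding not in IANA_CHARSETS:
            raise ValueError(f"unknown character set: {self.encoding!r}")


text = DvText("hello", language="en", encoding="UTF-8")
print(text.language)
```

The constraint is the same in both forms; what changes is that moving 
to another code set would mean changing the class itself rather than a 
binding behind it.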

Possible objections are:
- the indirection we currently have is useful: there is no guarantee 
that we won't have to move to another code-set which better serves the 
same purpose
- the consistency in the software (all coded entities are _always_ dealt 
with via the terminology service, no matter what they are) is preferable 
to having certain fields that the software itself directly knows the 
codes of


We would be interested in opinions on this proposal. I personally do not 
know whether we can regard the ISO and IANA code sets for country, 
language and character-sets as 'safe for all time'; does anyone have 
inside knowledge on this? Any other opinions welcome.

- thomas beale

-- 
___________________________________________________________________________________
CTO Ocean Informatics (http://www.OceanInformatics.biz)
Research Fellow, University College London (http://www.chime.ucl.ac.uk)
Chair Architectural Review Board, openEHR (http://www.openEHR.org)

