Data Types

Thomas Beale Wed, 12 Jun 2002 11:24:15 +1000


Tim Benson wrote:


>Sam, I think you have misunderstood me.  Human beings love complex patterns,
>but computers hate them.  Of course you must keep the richness of "the day
>before the big storm", but you should not try to put that sort of thing into
>a Julian date field.  Let people do what they are good for and let us use
>computing for what it is good for.  The fact is computers do not like
>ambiguity.  The question is always what do we want to use this info for?  Is
>it to structure a record in chronological order or what?
>
>Tim
>
I think Tim's general considerations are correct (or at least I agree 
with them ;-). The reasons to use structured rather than unstructured 
data items (or any items, for that matter) are:
- if you have enough raw data to build the structured item
- if the information is to be used in computation

I think these principles are correct, but we do need to understand them.

The general design of the openEHR data types follows these principles in 
that you cannot create any item unless you can provide the required data 
to the creation routine; i.e. you can only create valid data items, be 
they quantities, terms or whatever.
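As a minimal sketch of that "valid instances only" rule (Python is used 
purely for illustration; the Quantity class and its fields below are 
hypothetical stand-ins, not the openEHR definitions), a creation routine 
can simply refuse incomplete raw data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical quantity type; not the openEHR class definition."""
    magnitude: float
    units: str

    def __post_init__(self):
        # The creation routine rejects incomplete raw data, so an
        # instance that exists is always a valid one.
        if self.magnitude is None:
            raise ValueError("magnitude is required")
        if not self.units:
            raise ValueError("units are required")


dose = Quantity(5.0, "mg")      # enough raw data: instance is created
# Quantity(None, "mg")          # would raise ValueError
```

Code that receives a Quantity then never has to test for missing parts.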

However, there are times when you don't quite have all the raw data, but 
a) you have enough to build a reasonable version of a data instance, and 
b) you want to be able to compute on the instance. Partial dates and 
times fall into this category, and this is why we have created separate 
classes for them. If you have year and month only, you cannot create a 
valid DATE instance, but you can create a valid PARTIAL_DATE instance, 
which will still satisfy the computational requirements of DATEs (by 
synthesising reasonable mid-month dates, etc.).
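A rough sketch of the partial date idea, assuming a year-and-month-only 
case and a "middle day of the month" rule (the class name, method name 
and rounding rule here are illustrative assumptions, not the 
specification):

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PartialDate:
    """Hypothetical partial date: year and month known, day unknown."""
    year: int
    month: int

    def __post_init__(self):
        if not 1 <= self.month <= 12:
            raise ValueError("month must be in 1..12")

    def as_date(self) -> date:
        """Synthesise a mid-month date so the value can be computed on."""
        days_in_month = calendar.monthrange(self.year, self.month)[1]
        return date(self.year, self.month, (days_in_month + 1) // 2)


onset = PartialDate(2002, 6)
# The synthesised date can be ordered against full dates.
is_before_july = onset.as_date() < date(2002, 7, 1)
```

The partial instance is still always valid; it just declares up front 
which parts it does not carry.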

For data which is really quite unreliable, we suggest that it be 
recorded as narrative text, as Tim mentioned earlier.

Contrast this with the HL7 data type approach where every type and every 
attribute and function result can be Null indicating it is unknown. The 
idea of this (according to Gunther) is so that no matter how little you 
know, you can record it in structured form. We can think of this design 
approach as a completely fuzzy approach. As an example, you can have an 
IVL<TS> (interval of time) with unknown low and high values. I have 
noted that this makes it nearly useless for computation, since you can't 
even call the contains(a_time:TS) routine - well you can, but you will 
get back "UNK" (unknown) as a value.
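To make the problem concrete, here is a small sketch of an interval 
whose bounds may be unknown. It is not the HL7 definition: the Tri and 
FuzzyInterval names, and the rule for when a single known bound is 
enough to answer "false", are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Tri(Enum):
    """Three-valued result: every caller must be ready for UNK."""
    TRUE = "true"
    FALSE = "false"
    UNK = "UNK"


@dataclass(frozen=True)
class FuzzyInterval:
    """Hypothetical time interval; None stands for an unknown bound."""
    low: Optional[datetime]
    high: Optional[datetime]

    def contains(self, t: datetime) -> Tri:
        # A known bound can still rule the time out.
        if self.low is not None and t < self.low:
            return Tri.FALSE
        if self.high is not None and t > self.high:
            return Tri.FALSE
        # Otherwise any missing bound makes the answer unknown.
        if self.low is None or self.high is None:
            return Tri.UNK
        return Tri.TRUE


unknown = FuzzyInterval(low=None, high=None)
answer = unknown.contains(datetime(2002, 6, 12))   # Tri.UNK
```

Any caller written as "if interval.contains(t)" has to be rewritten to 
deal with the third value.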

I see dangers in this approach:
- the specification is more complex, since the semantics have to include 
the case where each and every attribute might be Null. Complex 
specifications are more likely to lead to implementation bugs
- software will be more complex because it has to be able to handle UNKs
- unreliable raw data is being used to create structured data instances 
which might be treated by software as being more reliable than they 
really are
- if there is software operating on the data that does not understand 
the possibility that UNK can be returned from function calls, it is not 
clear that the data is safely processable

I can see the theoretical interest of recording unreliable data in a 
structured way, even if half of it is missing, but practically I don't 
think that it is a very useful thing to do, except in special (but 
common) cases such as dates and times. Gunther says that people may come 
back and fill in the missing bits, but in general I think this is quite 
unlikely - no-one has time. (Exceptions might be partial data gathered 
in A&E or similar situations.)

Hence we have opted for a simpler approach:
- in general data types are designed in a pure fashion - no general 
facility for unknown elements
- special data types for partial data are specified; the advantage of 
this is that the semantics of these types are clear
- Null markers are recorded not inside data instances, but where they 
are used, e.g. in the ELEMENT class in the EHR reference model
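The third point can be sketched as follows. This is only an 
illustration of where the null marker lives: the Element class, its 
null_flavour field and the "exactly one of the two" rule are 
hypothetical and not taken from the EHR reference model text.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Element(Generic[T]):
    """Hypothetical container: holds a valid value or a null marker."""
    name: str
    value: Optional[T] = None
    null_flavour: Optional[str] = None   # e.g. "unknown", "not asked"

    def __post_init__(self):
        # The data value itself is never half-filled; absence is
        # stated once, here, at the point of use.
        if (self.value is None) == (self.null_flavour is None):
            raise ValueError("set exactly one of value or null_flavour")


weight = Element("body weight", value=72.5)
birth_date = Element("date of birth", null_flavour="unknown")
```

Software only has to check for nulls at the container level, and any 
value it does find is a complete one.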

thoughts?

- thomas beale


-
If you have any questions about using this list,
please send a message to d.lloyd at openehr.org
