Data Types

Thomas Beale Mon, 10 Jun 2002 18:03:41 +1000


Tim Benson wrote:


 >Tom,
 >I do not think that structure can be justified if that structure is unlikely
 >to add either value or safety down the line.  So in the situation where we
 >are not able to rely on a time as being either a strict point in time or an
 >interval is likely to create semantic problems.  Unless you can rely on
 >strict chronological listing it is unhelpful to try to give spurious
 >precision.  So my suggestion is that such fuzzy dates should be put into
 >free text only and all dates associated with any entry should only be the
 >ones we can rely on, such as date and time of entry.
 >
We are having an interesting debate on just this topic on the HL7 CQ
list (don't know if you get that). The HL7 data type modelling approach
seems to be to include Null markers all over the place inside the data
types, so that no matter how little you know, you can still create an
instance of a structured data item. My response to this has been:

- it makes the data type specification quite a lot more complex, since
the semantics must now always allow for the possibility that an attribute
or function result of a data item is Null (just start thinking about
this and it will become more obvious)
- it will make the implementation of data types and also software that
uses them more complicated
- it will create some data instances where parts of the item are
missing, which will IMO be quite unexpected by most software. E.g.
IVL<T>s with missing upper and lower limits (but the principle is
general and applies to all data types). I think there is the potential
for unsafe data via this approach.
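To make the IVL<T> point concrete, here is a minimal sketch, assuming a simplified integer interval in which either bound may be a Null marker (modelled as Python None). The class and function names are hypothetical and are not taken from the HL7 specification. Once a bound can be Null, "does this interval contain x?" no longer has a yes/no answer, and client code written for ordinary booleans silently drops the uncertain records.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical, simplified stand-in for an HL7-style IVL<T> over integers,
# where either bound may be a Null marker (modelled here as None).
@dataclass(frozen=True)
class NullableInterval:
    low: Optional[int]
    high: Optional[int]

    def contains(self, x: int) -> Optional[bool]:
        """Three-valued answer: True, False, or None when a Null bound makes it unknown."""
        if self.low is not None and x < self.low:
            return False
        if self.high is not None and x > self.high:
            return False
        if self.low is None or self.high is None:
            # A Null bound means "not known", so x cannot be confirmed as inside.
            return None
        return True


def naive_query(intervals: List[NullableInterval], x: int) -> List[NullableInterval]:
    # Typical client code treats the result as a plain bool, so None silently
    # becomes "no match" and the record vanishes from the result set.
    return [iv for iv in intervals if iv.contains(x)]
```

For example, querying [NullableInterval(10, 20), NullableInterval(None, 20), NullableInterval(None, None)] for 15 returns only the first interval; the other two are neither confirmed nor flagged, which is the kind of quiet error in query and decision support results described below.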

In the long term, I think this may cause pollution of EHRs and other
systems with unreliable data items, and cause erroneous results in some
decision support and query-based applications. It will also prevent
applications based on a more typical concept of data types from working
properly.

I am not saying the HL7 approach is invalid - it is valid - but it is
also quite complex, and overkill in most cases (in some parts of the RIM
it is in fact in error, but that's another argument).

The openEHR approach is much simpler:
- data types are "clean" - Null markers are specified at the next level
up in the model
- some special partial data types such as PARTIAL_DATE are specified,
because they occur commonly. The model of PARTIAL_DATE explicitly says
what can be missing and what cannot be, and defines all its semantics
accordingly
- if not enough information is known to create a data item, it should be
recorded as narrative. This way, decision support and querying will not
be operating on unreliable data.
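A rough sketch of the first two points, assuming rules for PARTIAL_DATE that are not stated here: year mandatory, month optional, day allowed only when the month is known. PartialDate, Element and NullFlavour are hypothetical, simplified names, not the actual openEHR classes. The data type itself never holds a Null; a missing value is recorded one level up, in the container.

```python
import datetime
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


@dataclass(frozen=True)
class PartialDate:
    """Hypothetical sketch: year mandatory, month optional, day only if month is known."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def __post_init__(self) -> None:
        if self.day is not None and self.month is None:
            raise ValueError("day given without month")
        # Building the earliest date lets datetime.date reject a bad year, month or day.
        self.earliest()

    def earliest(self) -> datetime.date:
        """Earliest calendar date consistent with the parts that are known."""
        month = 1 if self.month is None else self.month
        day = 1 if self.day is None else self.day
        return datetime.date(self.year, month, day)


class NullFlavour(Enum):
    # Illustrative values only.
    NO_INFORMATION = "no information"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Element:
    """Hypothetical container one level up: holds a clean value OR a null flavour, never both."""
    name: str
    value: Optional[Union[datetime.date, PartialDate]] = None
    null_flavour: Optional[NullFlavour] = None

    def __post_init__(self) -> None:
        if (self.value is None) == (self.null_flavour is None):
            raise ValueError("exactly one of value and null_flavour must be set")
```

Because the partial-date rules are fixed in one place, a function such as earliest() has a single well-defined meaning, and software handling a clean date type never has to check its internals for Nulls.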

This approach can be summarised as an "all-or-nothing" approach - either
you have the required values to create the data item, or you don't. The
HL7 approach can be described as an "anything-goes" approach - you can
create a structured data item no matter how little you know; it will
just have fewer or more Null markers.
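The "all-or-nothing" rule can be sketched as a constructor that either yields a complete, valid value or falls back to narrative. This is a minimal illustration assuming a full calendar date is required; record_date and Narrative are hypothetical names, with Narrative standing in for a plain text data type.

```python
import datetime
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Narrative:
    """Hypothetical free-text fallback standing in for a plain text data type."""
    text: str


def record_date(text: str, year: Optional[int], month: Optional[int],
                day: Optional[int]) -> Union[datetime.date, Narrative]:
    """All-or-nothing: return a date only if every part is present and valid,
    otherwise keep the original wording as narrative."""
    if year is None or month is None or day is None:
        return Narrative(text)
    try:
        return datetime.date(year, month, day)
    except ValueError:
        # e.g. 31 June: all parts are present but do not form a real date.
        return Narrative(text)
```

So record_date("10 June 2002", 2002, 6, 10) gives datetime.date(2002, 6, 10), while record_date("early June", 2002, 6, None) gives Narrative(text='early June'). Queries can then select only the structured values and never see a half-built date; a commonly occurring partial case would be handled by an explicitly defined type such as PARTIAL_DATE rather than by allowing Nulls inside the date.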

I am partway through writing up the different design approaches, which I
will post if anyone wants to see it.

I wonder what others think.

- thomas beale




