> [original post from Prof John Roddick, Flinders University South
> Australia, which failed to get through]
>
>> Parsons, S., 1996. Current approaches to handling imperfect
>> information in data and knowledge bases. IEEE Transactions on
>> Knowledge and Data Engineering 8 (3): 353-372.
>
>> in which he identifies five types of imperfection in data. Namely:
>
>> 1. Incomplete. (eg. test results not known or qualified as in
>> "interim results only")
>
I think this is an aspect of the real-world situation, and just means
that the information currently captured is only a "snapshot" along some
timeline; later, the final information will (presumably) be available. In
openEHR, this would be indicated in the clinical info itself, e.g.
pathology results might say "preliminary results". We don't need to do
anything special in this case.
In cases like an unconscious person coming to A&E, and the admission
form on the screen requires all sorts of things which cannot be answered
for now, traditional computer systems do completely the wrong thing, and
either prevent the form from being committed with what is known
(physical description=xxxx, presenting complaint=partially severed left
hand....) or create dummy (but wrong) values for the fields that could
not be filled in.
For this kind of situation, we have taken a lead from SCADA control
systems (where I learned about software) and HL7's "flavours of null"
approach. In control systems, all values have an associated "data
quality" marker; if the marker indicates that the value is "old" or that
serial communication from the field has stopped, you ignore the actual
value (which might otherwise look like a completely legitimate
transformer voltage or whatever). In HL7, all their data types include
the notion of Null values in every possible field, and they include a
"flavour of null" - a reason why the value is not available - e.g.
"unknown", "unavailable", "not asked", "asked but refused", "not
applicable" etc. (that's from memory, so the values might be a bit off).
The approach we have taken in openEHR is similar to the control system
approach, and uses HL7's flavours of null. Thus, the class ELEMENT has
attributes:
value: DATA_VALUE
null_flavour: DV_CODED_TEXT {value from HL7 null flavours domain}
This approach also works for database systems - there is no need to mix
in fake null/0 values into the type value domain for a value field -
it's a separate field, but always associated with the value field. So
even if Oracle forces you to have a real date in the date-of-birth field
(e.g. "1-1-1800"), the null_flavour sitting next to it has the value
"UNK", meaning - "unknown - ignore what is in the value field".
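The value-plus-null_flavour pairing can be sketched as follows. This is a minimal illustration in Python, not the actual openEHR reference model classes, and the flavour codes shown are only an indicative subset of HL7's:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional

class NullFlavour(Enum):
    # Indicative subset of HL7-style null flavours; the real code set differs.
    UNKNOWN = "UNK"
    NOT_ASKED = "NASK"
    NOT_APPLICABLE = "NA"

@dataclass
class Element:
    # value carries the actual DATA_VALUE; null_flavour, when set, says
    # "ignore the value field - here is why it is not available".
    value: Optional[Any] = None
    null_flavour: Optional[NullFlavour] = None

    def is_null(self) -> bool:
        return self.null_flavour is not None

# Unconscious A&E patient: date of birth cannot be answered, but the
# record can still be committed - even with a dummy date in the field.
dob = Element(value="1800-01-01", null_flavour=NullFlavour.UNKNOWN)
print(dob.is_null())  # True - consumers ignore the dummy value
```

The point is that the marker lives alongside the value rather than being mixed into its value domain.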
>> 2. Imprecise. (eg. age "between 25 and 30" etc.). This arises from
>> a lack of granularity.
>
we definitely have to deal with this. The possible ways include:
- DV_INTERVAL<T> type for ranges
- partial dates & times
- using narrative text
do we need more?
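As a rough illustration of the interval option, here is a sketch of a DV_INTERVAL<T>-style range (my own simplified version, not the reference-model definition):

```python
from dataclasses import dataclass

@dataclass
class DVInterval:
    # Sketch of an interval type for imprecise values,
    # e.g. age "between 25 and 30".
    lower: float
    upper: float

    def contains(self, v: float) -> bool:
        return self.lower <= v <= self.upper

age = DVInterval(lower=25, upper=30)
print(age.contains(27))  # True
print(age.contains(40))  # False
```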
>> 3. Vague. (eg. blood pressure "high", smokes "a lot", pain "acute",
>> etc.) This arises from the use of fuzzy terms.
>
we also have to deal with this, and the typical clinical version found
in pathology and other areas where you get values from sets like {trace,
+, ++, +++, ...}.
Currently we have avoided a complex fuzzy data type, and provided the
DV_ORDINAL data type, which allows ordinal numbers to be associated with
symbols (or words). So for smoking, if you really want to avoid
characterising quantitatively, you could use a DV_ORDINAL, which comes
from a "Lilliputian DOH tobacco consumption" domain/set: {1=none;
2=occasional; 3= regular/light; 4=heavy; 5=going to die real soon now}.
From the medical perspective I imagine that this particular example
would be a spectacularly bad way to record this particular datum.....
but the model will certainly let you do it, and it will also allow
comparison (use of the '<' operator) by virtue of the ordinal numbers
associated with the symbols. For recording pain, or the Apgar
characteristics, or urinalysis values, this approach seems fairly common
among clinicians.
Our idea with DV_ORDINAL was primarily to avoid preventing doctors from
using "+", "++", "+++" type values, while adding a little bit of rigour
(ensuring comparability).
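A sketch of what this pairing of ordinal number and symbol looks like (illustrative only; the names and details here are mine, not the reference model's):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DVOrdinal:
    # Each symbol carries an ordinal number, which is what makes
    # values like "trace" and "+++" comparable with '<'.
    ordinal: int
    symbol: str

    def __lt__(self, other: "DVOrdinal") -> bool:
        return self.ordinal < other.ordinal

# A urinalysis-style scale: {1=trace, 2=+, 3=++, 4=+++}
trace = DVOrdinal(1, "trace")
plus3 = DVOrdinal(4, "+++")
print(trace < plus3)           # True
print(sorted([plus3, trace]))  # ordered by ordinal, not by symbol text
```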
What we are not doing is implementing a mathematical fuzzy model where
each symbol is associated with a sub-section of a numerical range. For
those of you into fuzzy maths, you know that characterising these
mappings requires a fair bit of extra information. However, this kind of
information can be stored in archetypes, and is not needed in the data
(the mappings should not change with respect to the patient), so we
should probably consider this when designing the archetype version of
the DV_ORDINAL class (and maybe other quantitative classes as well).
>> 4. Uncertain. (eg. a 95% chance of accuracy). Arises from a lack
>> of knowledge or subjective assessment.
>
for this we include a "confidence: REAL" attribute in the ENTRY class.
>> 5. Inconsistent. (ie. contradictory information).
>
I'm not sure what should be done about this, but I think it is in the
clinical domain; the level of, or reason for, inconsistency should be
characterised in the data by its authors. I don't think it needs anything
special in the reference model. (Anyone disagree?)
>> to that you can add a sixth
>
>> 6. Out-of-date. (ie. correct when stored but unlikely to be true now).
>
this is a tricky one, and an example is "smoking status"=smoker, which
might have been true up until two years ago, but changed then. Also, the
converse - the EHR shows that the patient was recorded as a smoker 15
years ago, but there is no new information regarding smoking at all. Is
s/he still a smoker? In general the time-based transaction concept of
GEHR gives systems the basic tool for recording updates to things.
Sam has been contemplating ways of representing the idea of
"confirming" previous information whose value does not change, but where
we want a more recent update on the situation (and medico-legally, the
practitioner wants to show in the record that they did indeed review
various things on such-and-such a date). This might require a special
marker which does not change the value of something, but says that it
was verified to be the same. I don't think we have an answer yet for
this in the architecture.
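One possible shape for such a marker - purely my speculative sketch, since as noted there is no settled design - is a record that points at an existing item without restating its value:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Verification:
    # Hypothetical "confirmed unchanged" marker: it references an
    # existing item rather than re-recording its value, and records
    # who reviewed it and when (the medico-legal part).
    item_id: str
    reviewed_by: str
    reviewed_on: date

# Smoking status recorded years ago, reviewed today and found unchanged:
v = Verification(item_id="smoking_status",
                 reviewed_by="Dr Example",
                 reviewed_on=date(2004, 6, 1))
print(v.item_id)  # smoking_status
```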
>> These can, of course, be combined!
>
>> Incompleteness has traditionally been handled in databases with the
>> null value. In my opinion this has been totally inadequate but that
>> doesn't stop it being the only option available in most systems.
>> Imprecision and uncertainty are often handled through coercion to the
>> nearest value with all the problems that might cause and vagueness
>> and inconsistency is often not handled at all. Out-of-date-ness is
>> handled by assuming it doesn't happen.
>
John's long experience with the horrors of inadequate data handling
certainly rings true with me.
>> For the purposes of GEHR, I would suggest that No. 5. Inconsistent
>> data is a fact of life and since this is somewhat different (it
>> required two pieces of information for example) then we should leave
>> this category to constraint handling and expert interpretation.
>
Agree.
>> However, I would suggest we need to find a way of handling the other
>> 5. It's not initially clear how though. Perhaps a qualifying field
>> for each critical value?
>
how do you feel about the current ways of dealing with the problems,
detailed above? We would value your expert opinion.
- thomas beale
-
If you have any questions about using this list,
please send a message to d.lloyd at openehr.org