Snomed ontological solidity (was term bindings in archetypes and templates)

Thomas Beale Thu, 11 Mar 2010 13:03:07 +0000

On 11/03/2010 11:59, Stef Verlinden wrote:
> For those of you interested in the 'problems' within Snomed as an ontology, 
> here (http://precedings.nature.com/documents/3465/version/1) you can find a 
> good and recent article describing them. This doesn't mean we shouldn't use 
> Snomed, but knowing where the problems are is helpful to find solutions as 
> Thomas already stated.
>
>


This is one of the best short papers I have seen on Snomed; I recommend 
everyone read it. I have never had the time to investigate this 
properly, but I made some comments in the IHTSDO Tech Committee last 
year, viz:


~~~~~~~~~~~~~~~~~~~~~~ TB post on IHTSDO Nov 2009 ~~~~~~~~~~~~~~~~~~~~~~


    Context in Information and terminology models

From: Thomas Beale

Date: Wed, 18 Nov 2009 at 6:25pm

Category: Hot Topic 
<https://thecap.basecamphq.com/projects/1384601/cat/13604244/posts>

I have been reading Kent Spackman's "SNOMED expressions and context 
patterns" slides from August, which I saw for the first time after 
Bethesda. I think the pattern-based approach is a welcome advance. 
Coming from the EHR/information model point of view, I was starting to 
develop some ideas on mapping to structural models. However, the more I 
looked at the context model details against a real example, the more 
problems (my misunderstandings?) I have.

I made some initial analysis at http://www.openehr.org/wiki/x/1YJb . See 
from about halfway down. I am struggling with the utility of embedding 
'temporal context' inside coded expressions, and I also have questions 
about 'finding/procedure context' and 'subject relationship context'.

I also read Hanfei Bao's recent document 'A Speculation On Context 
Problems of SNOMED CT', where a definition of 'context' is given. 
Following that there are examples like

"FH: Myocardial infarction" is-not-a "Myocardial infarction"

My question here would be what does the latter term actually mean, if FH 
of MI is-not-a MI? That means that "Myocardial infarction" is assumed to 
be some specific kind of MI that doesn't include MIs of family members. 
If this is true then the meanings of 'naked' terms like "Myocardial 
infarction" are not what we expect (in this case: the phenomenon of a 
kind of heart attack, regardless of context). The mere fact that "FH: 
Myocardial infarction" includes the term "Myocardial infarction" 
indicates that in normal ontological terms it is in fact a kind of MI, 
since otherwise we would be talking about the family history of some 
other phenomenon.

This is an entirely different kind of consideration to "without skull 
fracture" is-not-a "skull fracture", which is a negation.

From my point of view, aspects of context that we need to address include:

* the IHTSDO definition; given the above, I am not sure it is clear yet;
  * consider developing a context model as a small ontology rather than 
in SNOMED, and basing both information models and any SNOMED 
representation of context on it
* issues to do with how complex post-coordinated codes are going to be 
routinely and safely created in real EHR systems
* issues of performance in real systems, particularly querying

- thomas beale

~~~~~~~~~~~~~~~~~~~~~~~~~ Jeremy Rogers reply ~~~~~~~~~~~~~~~~~~~~~~~

Thomas -- FWIW here are my thoughts and ramblings in response to reading 
most (but not yet all) of the WIKI page. Hopefully more as and when, but 
this much at least should stimulate some debate.

*On representation of numerics and ordinals*

It's true that neither SNOMED nor AFAIK any other DL-based ontology 
supports reasoning over ordinals, and for that reason there would be 
little advantage in encoding them within 100% ontology expressions -- 
except for the fact that The Sins of The Past mean that the ontology 
already contains a modest number of legacy content concepts that include 
ordinal value expressions, for example the descendants of 
417597005|Urine dipstick test finding|.

If we were to kick ordinals firmly into the information model, then 
there's an issue of how you would describe the relationship between 
these legacy 100% terminology expressions and some external construct 
that's a combination of a terminology expunged of ordinals and one or 
more information models. On the whole, I could agree with your analysis 
if we were in a green field, but the reality is that earlier work has 
killed off all green lifeforms to the horizon and beyond.

Meanwhile, the boundary with regard to where values should be 
represented (ontology vs information model) is perhaps now somewhat 
blurred because OWL at least /does/ support reasoning over real numbers. 
It might be worth asking the DL folk what the use case was for including 
this reasoning support within the ontology if it was already readily 
available in information models.

*On representing any value set substructure (e.g. 2 knee reflex recordings)*

The current MRCM release says the associated_finding attribute has 1:1 
cardinality. This means that if you want to record a left and a right 
knee reflex finding, then it has to be two separate coded elements 
within one observation (or, two separate observations of one element each).

*On the significance of the order in which values and other attributes 
are captured/stored*

I'm puzzled by what you mean; some examples of what information is to be 
encoded in the ordering would be useful. But in general I'd have thought 
it a profoundly bad idea to encode any important information only 
implicitly through token order.

*On temporal context*

I agree that where any temporal info and associated temporal inference 
is to be managed is a significant problem, especially since so many 
clinical queries hinge on the temporal relationship between events 
and/or states.

One of the reasons for needing temporal 'classifiers' somewhere in the 
overall system is that some clinically significant information 
inevitably has to be entered retrospectively and may therefore have a 
very imprecise time stamp. So you can't rely on time-stamp reasoning 
entirely for all temporal reasoning jobs.

A realistic clinical scenario example would be:

Q. Have you had any major illness or operations?
A. Yes - I lost sight in one eye for a month
Q. When?
A. About 20 years ago; I don't remember exactly when. They did lots of tests.

One solution (that I actually have to follow today in clinical 
information systems that only support date stamping) is to encode the 
above with a fictitious system date stamp of e.g. 1.1.1989. But this is 
clearly a false level of accuracy and can also cause problems if the 
original but correctly time-stamped record later becomes available and 
is then merged into mine.

Having said all that, I'd personally agree that SNOMED may have built 
out from this difficult edge case and constructed something that has at 
least the appearance of a more comprehensive solution for temporal 
relationships, and then conflated this into models for epistemology, 
probability and state transition. But if some data in the EPR *will* be 
accurately time-stamped, then how that interoperates with the resulting 
complex set of SNOMED temporal classifier values remains an open 
question. And then, of course, we also have the 'follows' and 'after' 
semantic links in SNOMED.

All this does of course create a problem for OpenEHR if it has to 
maintain more than one 'context' solution for different 
terminologies/classifications, depending on whether they individually 
offer within themselves solutions for context (temporal or subject of 
record). But SNOMED already has the exact mirror image of the same 
problem: much of its content was developed for and is used today with 
existing somewhat impoverished information models, most of which have no 
model for context.

*On "FH: Myocardial infarction" is-not-a "Myocardial infarction"*

This statement unpacks into the (hopefully) unarguable statement that a 
patient whose mother has had an MI should not be returned in response to 
a query looking for patients who have themselves had an MI.

/If/ both entities 'FH of MI' and 'PPH of MI' (PPH=previous personal 
history) are encoded in the terminology, and /if/ all subtype querying 
functionality is encoded in /and only in/ an 'is-a' hierarchy, /then/ 
the only way you can get the clinically correct answer is if 'FH of MI' 
is-not-a 'PPH of MI'.

Buried in the above are a number of assumptions and conditions, notably 
that sticking 'MI' into the record actually means 'PPH of MI'. But 
ontologically speaking we'd still say that 'H of MI' is-not-a 'MI'. 
There *is* a semantic relationship between the two, but it is 
emphatically not an is-a relationship.
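
To illustrate the querying argument above, here is a minimal sketch of a 
subsumption query over a toy is-a hierarchy. The concept names, the 
hierarchy and the patient records are invented for illustration and are 
not actual SNOMED CT content; the "associated finding" link is assumed 
here only to show a relationship that is not is-a.

```python
# Toy is-a hierarchy (child -> set of parents). Illustrative names only,
# not real SNOMED CT content.
IS_A = {
    "Acute myocardial infarction": {"Myocardial infarction"},
    "Myocardial infarction": {"Heart disease"},
    "FH: Myocardial infarction": {"Family history of disorder"},
}

# A non-is-a semantic link (assumed for illustration): the family history
# concept points at the finding it is about, without being a subtype of it.
ASSOCIATED_FINDING = {
    "FH: Myocardial infarction": "Myocardial infarction",
}


def ancestors(concept):
    """Return every concept reachable from `concept` via is-a links."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(concept, query):
    """True if `concept` is the query concept or one of its subtypes."""
    return concept == query or query in ancestors(concept)


# Hypothetical coded entries, one per patient.
records = {
    "patient A": "Acute myocardial infarction",
    "patient B": "FH: Myocardial infarction",
}


def patients_matching(query):
    """Patients whose coded entry is subsumed by the query concept."""
    return sorted(p for p, c in records.items() if subsumed_by(c, query))


print(patients_matching("Myocardial infarction"))  # ['patient A']
```

Because the family history concept is linked to the finding by a 
different relationship, patient B is not returned by the query for MI, 
yet the connection between the two concepts is still recorded.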


~~~~~~~~~~~~~~~~~~~~~ TB reply ~~~~~~~~~~~~~~~~~~~

Thanks Jeremy,

*Ordinals:*

In principle I think the data type should only be in the information 
model, however, the terms used in the data type would preferably be in 
the terminology (but with freedom for people to create ordinals using 
local/specialist terms where necessary). The key here is that the 
implementation guidance on how to represent ordinal values in a health 
information system should be to use an ordinal data type (e.g. 
DvOrdinal in openEHR), not just a naked term, which gives no 
computability (the < relations can't be determined).
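
A minimal sketch of the difference between an ordinal data type and a 
naked term follows. The Ordinal class is a hypothetical stand-in that 
pairs an integer rank with a display term; it is not the openEHR 
DvOrdinal class, and the dipstick scale values are assumed for 
illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Ordinal:
    """Hypothetical ordinal value: an integer rank plus the term shown to users."""
    value: int                          # position in the scale, used for comparison
    symbol: str = field(compare=False)  # the term; ignored when comparing


# Illustrative dipstick-style scale; the terms could come from a terminology
# or be local, as long as each is paired with a rank.
SCALE = [
    Ordinal(0, "negative"),
    Ordinal(1, "trace"),
    Ordinal(2, "+"),
    Ordinal(3, "++"),
    Ordinal(4, "+++"),
]

readings = [SCALE[3], SCALE[0], SCALE[2]]

# With the ordinal type the < relation is available directly.
worst = max(readings)
print(worst.symbol)  # ++

# Naked terms only sort as strings, which does not follow the clinical scale.
print(sorted(o.symbol for o in SCALE))
# ['+', '++', '+++', 'negative', 'trace']
```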

*Order in information*

It is not so much that order encodes 'hard information' that would be 
lost if the ordering were lost, but in the clinical information 
examples/requests we get from workers in the field, order is seen as 
very important to comprehensibility in many places. Order in many kinds 
of notes essentially corresponds to things like a) a chronological train 
of analytical thought of the physician, b) a structural model of 
something, e.g. disease course described as a group of dates like date 
of last occurrence, date initially recognised etc, c) a 'typical' or 
customary way of presenting information, e.g. endoscope findings. 
Throwing out order will make a lot of health professionals really mad.

*Temporal context*

A lot of clinical information is entered after the fact (a majority in 
some places), but in general this doesn't change the accuracy of the 
timing information. I would suggest that the kind of information where 
this is the case is more findings/diagnoses recounted by patients, e.g. 
telling the GP when they were diagnosed with diabetes, or when their 
parent died of a heart attack. But even in these cases, they are 
providing a partial date, e.g. 1990-xx-xx or even '1930 plus or minus 
5y', both of which in health computing can be used like any other date. 
I still don't see how it helps to classify such information as 'in the 
past' -- this is simply a term that has to be additionally recorded, and 
I can't see how it helps computability. It also adds the risk that 
software creates the wrong term; then some other part of the system 
might forget to look at the date, if it saw the (wrong) term 'in the 
future'. In general we just need proper models of date/time, which 
openEHR and HL7 have had for many years. The work in NHS CUI also 
assumes partial dates/times.
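
As a rough illustration of how a partial date can be computed with like 
any other date, the sketch below treats each date as an interval of 
possible days. The ApproxDate class and its helper functions are 
hypothetical and are not the openEHR or HL7 date/time models; they only 
show that '1990-xx-xx' and '1930 plus or minus 5y' remain comparable 
without an extra 'in the past' term.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ApproxDate:
    """Hypothetical partial date: the true day lies in [earliest, latest]."""
    earliest: date
    latest: date

    def definitely_before(self, other):
        # Every possible day of self precedes every possible day of other.
        return self.latest < other.earliest

    def possibly_before(self, other):
        # At least one possible day of self precedes one possible day of other.
        return self.earliest < other.latest


def exact(d):
    """A fully known date."""
    return ApproxDate(d, d)


def year_only(year):
    """A date known only to the year, e.g. 1990-xx-xx."""
    return ApproxDate(date(year, 1, 1), date(year, 12, 31))


def year_plus_minus(year, tolerance):
    """A date such as '1930 plus or minus 5y'."""
    return ApproxDate(date(year - tolerance, 1, 1), date(year + tolerance, 12, 31))


diagnosed = year_only(1990)
parent_died = year_plus_minus(1930, 5)
consultation = exact(date(2010, 3, 11))

print(diagnosed.definitely_before(consultation))  # True
print(parent_died.definitely_before(diagnosed))   # True
print(diagnosed.definitely_before(exact(date(1990, 6, 1))))  # False
```

The last comparison shows where the imprecision matters: the answer is 
'possibly' rather than 'definitely', which a fictitious exact date stamp 
would hide.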

On the more strategic question of whether to use SNOMED for representing 
context in 'more impoverished information models' (of which many 
proprietary ones qualify), I would first be interested to know which 
such models actually use terminology at all, beyond ICDx, ICPC etc. In 
my experience so far, it is almost none -- very few in the US (Mayo 
being a post-coding exception). Clem McDonald said 2 years ago that he 
had never seen a SNOMED code in data. I think any perceived 'need' for 
SCT to supply a context model to older private models of health data 
that don't supply their own should be backed up by some evidence. 
Secondly, if there is evidence (and I am not saying there isn't), then 
the correct approach in my view is to have a common underlying ontology 
of context. The representation of context in openEHR is in fact based on 
ontological principles to achieve this. People like Barry Smith would 
say: do it properly, make a self-standing ontological model of context 
in clinical recording, and that is probably what we should do. Trying to 
model context in any comprehensive way in a terminology won't help the 
'impoverished' systems, since SCT can't do any better in terms of 
quantities, dates, times or other non-text values anyway.

*On "FH: Myocardial infarction" is-not-a "Myocardial infarction"*

If the assumption of SNOMED means that the term 'Myocardial Infarction' 
really means 'Previous personal history of Myocardial Infarction', then 
what SNOMED term do we use to represent just 'Myocardial Infarction'? 
For example if I wanted to list the things that patient X might be at 
risk of? Can a patient be at risk of 'Previous Personal History of 
Myocardial Infarction'? How does this compare to the ICD term for the 
same thing? How are we to use SNOMED in a reporting or clinical study 
application if we can't just use the term 'disease X' to mean 'disease 
X' instead of 'personal history of disease X'?

I hope people do not mind me pushing in a few places. I think that if 
SCT is going to be widely used, its foundations must be absolutely rock 
solid, and at the moment, I need some convincing.

thanks for listening

- thomas beale


