hi Thomas

> is it your feeling that we need to have a better model of accuracy, i.e. 
> more like the confidence interval idea? Or are we ok with what we have? 

well. a measured quantity is a group of data, with some or all of the following
things known:
- what was measured
- how it was measured (+ who & when & where & environment conditions!)
- units for the values
- a possible range of values
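Just to make the grouping concrete, here's a minimal sketch of that as a data type (the names are mine, not from any spec - a toy illustration, not a proposed design):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MeasuredQuantity:
    value: float                  # the reported value
    units: str                    # units for the value
    what: str                     # what was measured
    method: Optional[str] = None  # how it was measured, where that matters
    possible_range: Optional[Tuple[float, float]] = None  # e.g. a 95% interval

# e.g. a calcium result with an explicit range
ca = MeasuredQuantity(value=2.4, units="mmol/L", what="Total Calcium",
                      possible_range=(2.35, 2.45))
```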

What was measured does usually need to be known. In terms of modeling 
this as a data type, I note that what was measured is usually considered 
to be outside the data type. And though I wonder whether this is actually 
right, we're not discussing this right now. 

In real life, we generally don't count how it was measured as part of the 
value.  The idea is that you go read the "methods section" (whatever that 
means) if you care. Except that in clinical medicine there are a few things 
where how something is measured matters; in those cases we generally 
say that something else was measured. A classic example 
is Total Calcium and Ionized Calcium (and it's not wrong to call them
different things - my point is that the split is arbitrary). Anyhow, I've 
never heard anyone argue that the method should be part of the data type. 

I don't think there's much point in differentiating between a measured
and a non-measured quantity - that's for philosophy.

so, back to the possible range of values. This is a complex concept. 
Generally, the possible range of values is a bell-shaped probability 
distribution (or a log bell curve), but it's rarely properly known 
whether it actually is - it's generally just assumed to be a bell curve. 
You could *approximate* the concept of a probability distribution by
reporting a central value with a +/-, or an interval that covers 95% 
of the distribution. 
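That approximation is just arithmetic if you assume normality - a sketch (the 1.96 multiplier is the standard normal 97.5th percentile, and the assumption of a bell curve is exactly the one I'm complaining about above):

```python
def interval_95(mean: float, sd: float) -> tuple:
    """Approximate a 95% interval for an assumed normal distribution."""
    return (mean - 1.96 * sd, mean + 1.96 * sd)

# e.g. a reported value of 105 with a standard deviation of 2.5
lo, hi = interval_95(105.0, 2.5)   # roughly (100.1, 109.9)
```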

I know that we got taught in uni to track uncertainties (and sometimes 
even to quantitate the distribution curve), and to bring them through 
our equations (and conclusions!), but out in the real world, it's rarely 
done in published papers (shame, really) and I've never seen it done in 
clinical work (even in clinical research).

In clinical medicine, the only behaviour I've seen is to report a 
single value - what was actually measured - and say nothing at 
all about the uncertainty. No, I'm wrong: I once performed an 
assay where the methodological uncertainty in the number was 
clinically significant. We used to report a range rather than a point 
value, so the doctors couldn't be mistaken about its meaning. 

Reporting <X or >X for a value is something that you have to do if you
aren't normally reporting a range of values. So you said you didn't want
to model that as an interval, but I was less than convinced - if you always
reported an interval, it would be consistent. But even if you were consistent
in this way, the methodological basis for the "interval" <5 or >5000 is not
the same as the methodological basis for 100-110. These concepts overlap.
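To show what I mean by the overlap: both "<5" and "100-110" can be written as intervals, but you need an extra marker to record that "<5" is a reporting limit rather than a measured spread. A sketch (again, my own names, not anyone's actual model):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuantityInterval:
    lower: Optional[float]       # None means unbounded below
    upper: Optional[float]       # None means unbounded above
    is_limit: bool = False       # True when the bound is a reporting limit

below_limit = QuantityInterval(lower=None, upper=5.0, is_limit=True)  # "<5"
measured    = QuantityInterval(lower=100.0, upper=110.0)              # "100-110"
```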

If you added a confidence interval - as an optional item - then you get an 
interesting situation. If I say that this value is <50 (ci=100), what 
am I saying? (And don't laugh - this is a common clinical result value to 
report.)

Also, in clinical medicine, the things that may corrupt the result - 
interference from drugs, unusual medical conditions, etc. - don't 
contribute to the distribution range, so this isn't usually significant.

This is starting to ramble. As I said, in clinical medicine we only report 
a single value and let the interpreter figure out the distribution themselves. 
If they're not sure, they should contact the number on the report (in all
legal jurisdictions I know of, there must be one).

I think that for the rare cases where the distribution range needs to be 
conveyed/stored outside the generating system, the archetype should
store it. The archetype already includes some of the other stuff in my 
original data grouping, so I don't see it as inappropriate to solve it 
this way.

so, leave it as it is.

Grahame

