Tim Benson wrote:
>Sam, I think you have misunderstood me. Human beings love complex patterns,
>but computers hate them. Of course you must keep the richness of "the day
>before the big storm", but you should not try to put that sort of thing into
>a Julian date field. Let people do what they are good for and let us use
>computing for what it is good for. The fact is computers do not like
>ambiguity. The question is always what do we want to use this info for? Is
>it to structure a record in chronological order or what?
>
>Tim
>

I think Tim's general considerations are correct (or at least I agree
with them ;-) - the reasons to use structured vs. non-structured data
items (or any items for that matter) are:

- if you have enough raw data to build the structured item
- if the information is to be used in computation

I think these principles are correct, but we do need to understand
them. The general design of the openEHR data types follows these
principles in that you cannot create any item unless you can provide
the required data to the creation routine; i.e. you can only create
valid data items, be they quantities, terms or whatever.

However, there are times when you don't quite have all the raw data,
but a) you have enough to build a reasonable version of a data
instance, and b) you want to be able to compute on the instance.
Partial dates and times fall into this category, and this is why we
have created separate classes for them. If you have year and month
only, you cannot create a valid DATE instance, but you can create a
valid PARTIAL_DATE instance, which will still satisfy the computational
requirements of DATEs (by synthesising reasonable mid-month dates,
etc).

For data which is really quite unreliable, we suggest that it be
recorded as narrative text, as Tim mentioned earlier.

Contrast this with the HL7 data type approach, where every type, every
attribute and every function result can be Null, indicating that it is
unknown.
The idea of this (according to Gunther) is that no matter how little
you know, you can record it in structured form. We can think of this
design approach as a completely fuzzy approach. As an example, you can
have an IVL<TS> (interval of time) with unknown low and high values. I
have noted that this makes it nearly useless for computation, since you
can't even call the contains(a_time:TS) routine - well you can, but you
will get back "UNK" (unknown) as a value.

I see dangers in this approach:

- the specification is more complex, since the semantics have to
  include the case where each and every attribute might be Null.
  Complex specifications are more likely to lead to implementation bugs
- software will be more complex because it has to be able to handle
  UNKs
- unreliable raw data is being used to create structured data instances
  which might be treated by software as being more reliable than they
  really are
- if there is software operating on the data that does not understand
  the possibility that UNK can be returned from function calls, it is
  not clear that the data is safely processable

I can see the theoretical interest of recording unreliable data in a
structured way, even if half of it is missing, but practically I don't
think that it is a very useful thing to do, except in special (but
common) cases like date & time. Gunther says that people may come back
and fill in the missing bits, but in general I think this is quite
unlikely - no-one has time. (Exceptions might be partial data gathered
in A&E or similar situations.)

Hence we have opted for a simpler approach:

- in general, data types are designed in a pure fashion - no general
  facility for unknown elements
- special data types for partial data are specified; the advantage of
  this is that the semantics of these types are clear
- Null markers are recorded, not inside data instances, but where they
  are used, e.g. in the ELEMENT class in the EHR reference model

thoughts?
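
To make the simpler approach a little more concrete, here is a rough
sketch in Python, purely for illustration. The names PartialDate,
Element and null_flavour, and the mid-year and mid-month conventions,
are assumptions made up for this example; they are not taken from the
openEHR specifications.

```python
import calendar
from dataclasses import dataclass
from datetime import MAXYEAR, MINYEAR, date
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """Hypothetical stand-in for PARTIAL_DATE: year required, month optional."""
    year: int
    month: Optional[int] = None

    def __post_init__(self):
        # Creation fails unless enough valid raw data is supplied.
        if not MINYEAR <= self.year <= MAXYEAR:
            raise ValueError("year out of range")
        if self.month is not None and not 1 <= self.month <= 12:
            raise ValueError("month must be in 1..12")

    def as_date(self) -> date:
        """Synthesise a mid-period date so the value can be computed on."""
        if self.month is None:
            return date(self.year, 7, 1)  # assumed mid-year convention
        days_in_month = calendar.monthrange(self.year, self.month)[1]
        return date(self.year, self.month, (days_in_month + 1) // 2)


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for ELEMENT: the null marker sits beside the
    value, never inside the data instance itself."""
    name: str
    value: Optional[object] = None
    null_flavour: Optional[str] = None

    def __post_init__(self):
        # Exactly one of value / null_flavour must be present.
        if (self.value is None) == (self.null_flavour is None):
            raise ValueError("supply either a value or a null flavour")

    def is_null(self) -> bool:
        return self.value is None


# A known year and month still gives a computable value ...
onset = Element("date of onset", value=PartialDate(1998, 2))
# ... while a completely unknown value is flagged on the element.
unknown = Element("date of onset", null_flavour="unknown")
```

Software that sorts or compares dates only ever sees the result of
as_date(), which is always a complete date, and software that cares
about missing values checks is_null() on the element rather than
testing every attribute of every data value.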

- thomas beale

