Cyclic datatypes: OpenEHR virus

pablo pazos Tue, 13 May 2014 22:04:56 -0300

Hi Bert,

Why should the validator need to continue traversing the instance?

Hi Pablo, because the attributes often hold complex openEHR datatypes too, 
so the validator needs to check those datatypes as well, and they in turn can 
contain further complex datatypes. In this example, Dv_Text matches {*}, you 
need to check every structure until you reach the leaf nodes, which here can 
be anything. Only then can you be sure that the data set is openEHR compliant.

That was my point :) The validation that needs to reach the leaf nodes is not 
the archetype validation but the IM structure validation. That has nothing to 
do with the open constraint {*} in the archetype. In fact, that validation can 
be done without considering the archetype at all. Using the XSD, as I 
suggested, is just one way to implement it; you can also do it in code.
The thing is that a DvText can have the attribute mappings, in which you can 
find the attribute purpose, of type DvCodedText, which can again have an 
attribute mappings, which can again have an attribute purpose, etc.
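
To make the cycle described above concrete, here is a minimal sketch of the 
nesting. It is not the real Reference Model: TermMapping is a simplified 
stand-in for the intermediate mapping class, and the fields match and 
code_string, as well as the helpers build_chain and nesting_depth, are 
assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DvText:
    value: str
    # Each mapping may carry a purpose, which is itself a coded text.
    mappings: List["TermMapping"] = field(default_factory=list)


@dataclass
class DvCodedText(DvText):
    code_string: str = ""  # simplified stand-in for the defining code


@dataclass
class TermMapping:
    match: str
    purpose: Optional[DvCodedText] = None  # closes the cycle back to DvText


def nesting_depth(text: DvText) -> int:
    """Count the DvText levels reachable through mappings -> purpose."""
    deepest_child = 0
    for mapping in text.mappings:
        if mapping.purpose is not None:
            deepest_child = max(deepest_child, nesting_depth(mapping.purpose))
    return 1 + deepest_child


def build_chain(levels: int) -> DvText:
    """Build a chain of coded texts nested `levels` deep (hypothetical helper)."""
    node = DvCodedText(value="leaf", code_string="c0")
    for i in range(1, levels):
        node = DvCodedText(
            value=f"level {i}",
            code_string=f"c{i}",
            mappings=[TermMapping(match="=", purpose=node)],
        )
    return node
```

Nothing in this type definition limits how long the chain may be, so an 
instance can satisfy DvText matches {*} while its leaf sits arbitrarily deep.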

I got it ;)
So the leaf node can be far away and the data can still comply with the 
statement DvText matches {*}, and a 100% compliant validator will need to 
follow all these steps. Of course this is not a normal situation, but it can 
happen. As said, we cannot always control incoming data sets. There may be 
buggy software in the ecosystem where a kernel runs.

That really depends on the implementation. Let's say the system doesn't 
control the input, so you can receive anything, for example binary data where 
you expect a dv_quantity. For that case, what I implicitly proposed is a 
two-phase validator: first syntactic (against the IM; yes, we need to reach 
the leaf nodes here!), then semantic (IMO we can prune the validation when we 
reach something like {*}). If the first phase returns invalid, there's no need 
to execute the second. If you do execute the second, you'll never hit infinite 
recursion, because of the pruning.
Sorry, maybe I can't explain myself clearly; it is difficult to show this by 
email. Maybe others can confirm or refute this.
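
The two-phase idea above can be sketched roughly as follows. This is a toy 
under stated assumptions, not an openEHR implementation: instances are plain 
nested dicts, the IM table holds only two made-up type entries, and the names 
ANY, IM, check_structure, check_constraints and validate are all hypothetical.

```python
ANY = "*"  # stands in for the open archetype constraint {*}

# Toy information model (an assumption, not the real RM):
# type name -> {attribute: (expected, required)}
# expected is str, a type name, or a one-element list [type name].
IM = {
    "DV_TEXT": {"value": (str, True), "mappings": (["TERM_MAPPING"], False)},
    "TERM_MAPPING": {"match": (str, True), "purpose": ("DV_TEXT", False)},
}


def check_structure(node, type_name):
    """Phase 1 (syntactic): check against the IM only, down to the leaves."""
    if not isinstance(node, dict):
        return [f"expected {type_name}, got {type(node).__name__}"]
    spec = IM[type_name]
    errors = [f"{type_name}: unknown attribute {k}" for k in node if k not in spec]
    for attr, (expected, required) in spec.items():
        if attr not in node:
            if required:
                errors.append(f"{type_name}: missing {attr}")
            continue
        value = node[attr]
        if expected is str:
            if not isinstance(value, str):
                errors.append(f"{type_name}.{attr}: expected a string")
        elif isinstance(expected, list):
            if not isinstance(value, list):
                errors.append(f"{type_name}.{attr}: expected a list")
            else:
                for item in value:
                    errors += check_structure(item, expected[0])
        else:
            errors += check_structure(value, expected)
    return errors


def check_constraints(node, constraint):
    """Phase 2 (semantic): check archetype constraints, pruning at {*}."""
    if constraint == ANY:
        return []  # unconstrained: stop here, phase 1 already covered the IM
    if isinstance(constraint, set):  # set of allowed string values
        ok = isinstance(node, str) and node in constraint
        return [] if ok else [f"{node!r} is not an allowed value"]
    if isinstance(node, list):
        return [e for item in node for e in check_constraints(item, constraint)]
    errors = []
    for attr, sub in constraint.items():
        if attr in node:
            errors += check_constraints(node[attr], sub)
    return errors


def validate(node, type_name, constraint):
    """Run phase 2 only if phase 1 found no structural errors."""
    errors = check_structure(node, type_name)
    return errors if errors else check_constraints(node, constraint)
```

For example, validate(b"\x00", "DV_TEXT", ANY) is rejected in the first phase, 
while validate({"value": "hello"}, "DV_TEXT", ANY) passes both phases without 
the second phase descending at all. Note that the first phase still recurses as 
deep as the instance itself.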
To be safe, and with feasibility in mind, a validator would need to stop 
validating at some arbitrary point, even though there is no error. So a 
validator that follows the rules 100% is dangerous! It can crash a system.
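
One way to picture such an arbitrary stopping point is a depth cap on the 
traversal. The sketch below walks nested dicts and lists with an explicit 
stack, so a deep instance cannot overflow the call stack. MAX_DEPTH and the 
function name within_depth_limit are assumptions for illustration, and 
treating over-deep input as a rejection is one possible policy, not something 
the specification prescribes.

```python
MAX_DEPTH = 50  # arbitrary cut-off, chosen only for this sketch


def within_depth_limit(instance, max_depth=MAX_DEPTH):
    """Walk nested dicts/lists iteratively.

    Returns (True, deepest level seen) when the whole instance fits within
    max_depth, or (False, offending depth) as soon as the limit is exceeded.
    """
    stack = [(instance, 1)]  # explicit stack instead of recursion
    deepest = 0
    while stack:
        node, depth = stack.pop()
        if depth > max_depth:
            return False, depth
        deepest = max(deepest, depth)
        if isinstance(node, dict):
            children = node.values()
        elif isinstance(node, list):
            children = node
        else:
            continue  # leaf value
        for child in children:
            stack.append((child, depth + 1))
    return True, deepest
```

A check like this could run before any recursive validation, so that an 
absurdly deep but formally valid instance is refused instead of crashing the 
system.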

With a two-phase validator, I don't know whether there's any case where you 
don't cover 100% and accept invalid data as valid, or where you cover 100% and 
end with a stack overflow. Finding one counterexample would be enough to 
invalidate my proposal :)
That was my point.
You are right that when a part of an archetype is wildcarded, the XSD is the 
place to find the validation rules.
Maybe the problem is trying to validate against the archetype first and then 
against the IM. I think it should be IM first and AM second. But of course, I 
may have overlooked some pathological case, and this might not work in 100% of 
cases.
Another thing that might help is not to use archetypes directly but to use 
OPTs. I learned that the hard way. OPTs can contain the whole structure and 
constraints of specific compositions. So if someone specifies DV_TEXT in the 
OPT, my interpretation is that they don't need a DV_CODED_TEXT there. Also, an 
OPT is all in one file, while with archetypes you have to deal with slots 
(argghhhh). In fact, right now I'm changing all my systems to add OPT support. 
Simpler to validate, simpler to query.
Cheers, Pablo.

Best regards, Bert

On Tuesday, 13 May 2014, pablo pazos <pazospablo at hotmail.com> wrote:

Hi Bert, I'll clarify because what you interpreted is not what I tried to say, 
but we're on the same page.

> Date: Tue, 13 May 2014 08:47:35 +0200
> From: bert.verhees at rosa.nl
> To: openehr-technical at lists.openehr.org
> Subject: Re: Cyclic datatypes: OpenEHR virus
> 
> On 13-05-14 07:22, Ing. Pablo Pazos wrote:
> > If the value is not constrained, the validator should return true without 
> > continuing to check in cascade-recursive mode. For this to work as 
> > expected, the data structure should be validated before the data 
> > validation. The easiest way of validating the structure is serializing the 
> > instance to XML and using XSD.
> 
> That is the problem: I do not agree. It has to check in cascade because 
> required properties can be left out, or made-up properties that make no 
> sense can be put in. In my opinion, every class occurring in a dataset 
> needs to be validated against the Reference Model rules, even if there 
> are no constraints.
What I meant by "structure validation" is validating against the IM (i.e. 
syntactic validation); when I say "data validation" I mean validating against 
archetypes (i.e. semantic validation).

If the constraint on a node is "not constrained at all", then there are no 
required values defined by the archetype, but there might be required values 
defined by the information model.

> 
> By the way, you cannot validate OpenEHR datasets against an archetype by 
> using XSD. You cannot create XSDs according to archetype constraints, not 
> even by hand. I have been there, a few years ago.
>
The information model can be validated with the XSDs, because the XSDs define 
the IM constraints.
The XSDs are not for validating against archetypes (I totally agree with you); 
it is the IM validation that checks the structure and some fields required by 
the IM.

Once you receive a well-formed structure (i.e. valid against the IM), you can 
validate it against archetypes.
If you have already checked the instance against the IM and it is valid, you 
have all the values required by the IM. Then, when validating data (this is 
the archetype validation!), if you find a {*} constraint, why should the 
validator need to continue traversing the instance?

Hope that helps (or at least makes sense :)
Kind regards, Pablo.

> Best regards
> Bert
> 
> _______________________________________________

> openEHR-technical mailing list
> openEHR-technical at lists.openehr.org
> http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
