Hi Bert,
why should the validator need to continue traversing the instance?
Hi Pablo, because the attributes often hold complex openEHR datatypes too,
so the validator needs to check those complex datatypes in the attributes as
well, and they in turn can contain complex datatypes. In this example,
Dv_Text matches {*}, you need to check everything, every structure, until you
reach the leaf nodes, which in this example can be anything. Only then can you
be sure that the data set is openEHR compliant.
That was my point :) The validation that needs to reach leaf nodes is not the
archetype validation, but the IM structure validation. That has nothing to do
with the open constraint {*} in the archetype. In fact, that validation can be
done completely without considering the archetype. What I said about using the
XSD is just one way to implement it; you can also do it in code.
The thing is that a DvText can have the attribute mappings, in which you can
find the attribute purpose, of type DvCodedText, which again can have an
attribute mappings, which can again have an attribute purpose, etc.
I got it ;)
So the leaf node can be far away and the data still be compliant with the
statement DvText matches {*}, and a 100% compliant validator will need to
follow all these steps. Of course this is not a normal situation, but it can
happen. As said, we cannot always control incoming data sets. There may be buggy
software in the ecosystem where a kernel runs.
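
To make the chain described above concrete, here is a minimal Python sketch.
It assumes a hypothetical dict-based representation of the instance (the
"_type" key and the helper names are made up for illustration, not taken from
any openEHR library), and it only models the mappings/purpose attributes.

```python
def make_nested_text(levels):
    """Build a DV_TEXT whose mappings -> purpose chain is `levels` deep."""
    node = {"_type": "DV_CODED_TEXT", "value": "leaf"}
    for _ in range(levels):
        node = {
            "_type": "DV_CODED_TEXT",
            "value": "text",
            "mappings": [{"_type": "TERM_MAPPING", "purpose": node}],
        }
    node["_type"] = "DV_TEXT"  # the outermost node is the plain DV_TEXT
    return node


def depth_recursive(node):
    """Follow the chain with one call frame per level (depth-bound by the data)."""
    mappings = node.get("mappings")
    if not mappings:
        return 0
    return 1 + depth_recursive(mappings[0]["purpose"])


def depth_iterative(node):
    """Follow the same chain with a loop, so the call stack does not grow."""
    depth = 0
    while node.get("mappings"):
        node = node["mappings"][0]["purpose"]
        depth += 1
    return depth
```

Every level of such an instance is still compatible with DvText matches {*}.
With CPython's default recursion limit, depth_recursive would be expected to
fail with a RecursionError on a chain a few thousand levels deep, while
depth_iterative only needs time proportional to the depth.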
That really depends on the implementation. Let's say the system doesn't control
the input, so you can receive anything, for example binary data where you
expect a dv_quantity. In that case, what I implicitly proposed is a two-phase
validator: 1st syntactic (against the IM; yes, we need to reach leaf nodes
here!), 2nd semantic (IMO we can prune the validation when we reach stuff like
{*}). If the 1st phase returns invalid, there's no need to execute the 2nd. If
you do execute the 2nd, you'll never hit infinite recursion, because of the
pruning.
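
A rough Python sketch of the two-phase idea, under stated assumptions: the
instance is a plain dict with a "_type" key, IM_RULES is a hypothetical and
heavily simplified slice of the IM (required and optional attribute names
only), and an archetype constraint is either ANY (standing in for {*}), a
tuple of allowed leaf values, or a dict of per-attribute constraints. None of
these names come from an actual openEHR implementation.

```python
ANY = "*"  # stands in for the open archetype constraint {*}

# Hypothetical, simplified IM rules: type -> (required attrs, optional attrs)
IM_RULES = {
    "DV_TEXT": (("value",), ("mappings",)),
    "DV_CODED_TEXT": (("value", "defining_code"), ("mappings",)),
    "TERM_MAPPING": (("match", "target"), ("purpose",)),
}


def validate_im(root):
    """Phase 1 (syntactic): check every node against the IM, down to the leaves."""
    errors = []
    stack = [(root, "/")]  # explicit stack, so deep data cannot overflow the call stack
    while stack:
        node, path = stack.pop()
        rules = IM_RULES.get(node.get("_type"))
        if rules is None:
            errors.append(f"{path}: unknown type {node.get('_type')!r}")
            continue
        required, optional = rules
        for name in required:
            if name not in node:
                errors.append(f"{path}: missing required attribute {name!r}")
        for name, value in node.items():
            if name == "_type":
                continue
            if name not in required and name not in optional:
                errors.append(f"{path}: unexpected attribute {name!r}")
            for child in (value if isinstance(value, list) else [value]):
                if isinstance(child, dict):
                    stack.append((child, f"{path}{name}/"))
    return errors


def validate_archetype(node, constraint, path="/"):
    """Phase 2 (semantic): recursion depth follows the constraint, not the data."""
    if constraint == ANY:
        return []  # open constraint: prune, nothing below this node is checked
    if isinstance(constraint, tuple):
        return [] if node in constraint else [f"{path}: {node!r} not allowed"]
    errors = []
    for name, sub in constraint.items():
        if name in node:
            errors += validate_archetype(node[name], sub, f"{path}{name}/")
    return errors


def validate(instance, constraint):
    """Run phase 2 only when phase 1 found no structural errors."""
    errors = validate_im(instance)
    return errors if errors else validate_archetype(instance, constraint)
```

For example, validate(instance, ANY) still reports a DV_CODED_TEXT that lacks
defining_code deep inside the mappings, because phase 1 visits it, while
phase 2 returns immediately at the {*} constraint.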
Sorry, maybe I'm not explaining myself clearly; it's difficult to show this by email.
Maybe others can validate or deny this.
To be safe and with feasibility in mind, a validator would need to stop
validating at some arbitrary point, even though there is no error. So a
validator that follows the rules 100% is dangerous! It can crash a system.
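
One way to picture such an arbitrary stopping point is a depth guard on the
traversal. The sketch below is only an illustration: the dict-based instance,
the DepthLimitExceeded exception and the default of 64 levels are assumptions
made here, not anything prescribed by openEHR.

```python
class DepthLimitExceeded(Exception):
    """Raised when the instance nests deeper than the validator will follow."""


def walk_with_limit(root, max_depth=64):
    """Visit every dict node, refusing to descend past max_depth levels."""
    visited = 0
    stack = [(root, 0)]  # (node, depth of that node below the root)
    while stack:
        node, depth = stack.pop()
        if depth > max_depth:
            raise DepthLimitExceeded(f"nesting deeper than {max_depth} levels")
        visited += 1
        for value in node.values():
            for child in (value if isinstance(value, list) else [value]):
                if isinstance(child, dict):
                    stack.append((child, depth + 1))
    return visited
```

A caller can then treat DepthLimitExceeded as "rejected by policy" rather
than as an IM error, which keeps that decision explicit.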
With a two-phase validator, I don't know if there's any case where you either
don't cover 100% and might accept invalid data as valid, or do cover 100% and
end with a stack overflow. Finding one counterexample would be enough to
invalidate my proposal :)
That was my point.
You are right in your statement that when a part of an archetype is
wildcarded, the XSD is the place to find the validation rules.
Maybe the problem is trying to validate against the archetype first and then
validate the IM. I think it should be IM 1st and AM 2nd. But of course, I may
have overlooked some pathological case and this might not work in 100% of the
cases.
Another thing that might be helpful is not to use archetypes directly, but to
use OPTs. I learned that the hard way. OPTs can contain the whole structure and
constraints of specific compositions. So if someone specifies DV_TEXT in the
OPT, my interpretation is they don't need a DV_CODED_TEXT there. Also, an OPT
is all in one file, while with archetypes you have to deal with slots
(argghhhh). In fact, right now I'm changing all my systems to add OPT support.
Simpler to validate, simpler to query.
Cheers,
Pablo.
Best regards,
Bert
On Tuesday 13 May 2014, pablo pazos <pazospablo at hotmail.com> wrote:
Hi Bert, I'll clarify because what you interpreted is not what I tried to say,
but we're on the same page.
> Date: Tue, 13 May 2014 08:47:35 +0200
> From: bert.verhees at rosa.nl
> To: openehr-technical at lists.openehr.org
> Subject: Re: Cyclic datatypes: OpenEHR virus
>
> On 13-05-14 07:22, Ing. Pablo Pazos wrote:
> > If the value is not constrained, the validator should return true without
> > continuing checking in cascade-recursive mode. For this to work as
> > expected, the data structure should be validated before the data
> > validation. The easiest way of validating the structure is serializing the
> > instance to XML and using XSD.
>
> That is the problem, I do not agree, it has to check in cascade because
> there can be required properties left out, or fantasized properties
> which make no sense put in. Every occurring class in a dataset needs, in
> my opinion, to be validated, if there are no constraints, against the
> Reference Model-rules.
What I meant by "structure validation" is validating against the IM (i.e.
syntactic validation); when I say "data validation" I mean validating against
archetypes (i.e. semantic validation).
If the constraint over a node is "not constrained at all", there are no
required values defined by the archetype, but there might be required values
defined by the information model.
>
> By the way, you cannot validate OpenEHR datasets against an archetype by
> using XSD. You cannot create XSDs according to archetype constraints, not
> even by hand. I have been there, a few years ago.
>
The information model can be validated with the XSDs, because the XSDs define
the IM constraints.
The XSDs are not for validating against archetypes (I totally agree with you);
it is the IM validation that checks the structure and some required fields
(required by the IM!).
Once you receive a well-formed structure (i.e. one that is valid against the
IM) you can validate it against archetypes.
If you have already checked the instance against the IM and it is valid, you'll
have all the required values (required by the IM). Then, when validating data
(this is the archetype validation!), if you find a {*} constraint, why should
the validator need to continue traversing the instance?
Hope that helps (or at least makes sense :)
Kind regards,
Pablo.
> Best regards
> Bert
>
> _______________________________________________
> openEHR-technical mailing list
> openEHR-technical at lists.openehr.org
> http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org