Polishing node identifier (at-codes) use cases.

Thomas Beale Wed, 25 Sep 2013 21:47:52 +0100

On 25/09/2013 00:53, Bert Verhees wrote:
>
>> sure - if you have a separate property to store the archetype id, it 
>> is empty in 95% of all object instances, and also you need a class 
>> invariant to prevent it being filled at the same time as the 
>> archetype_node_id (at-code) property.
>
> I must disagree, it is very common in archetypes, I think it is in 90% 
> of the archetypes that the root of a definition also has a node_id.


It's 100% ;-)

But what I meant was that in any instance structure, say a Composition, 
most of the nodes in the data tree will have an at-code in 
archetype_node_id; only a few - the archetype root points - will have 
archetype ids.

The at-node corresponding to the root point is just the at0000 code (or 
a specialised version of that). Putting that in the data is not much use.

> So in that case both can occur simultaneously. But in the path only 
> the archetype_id will occur, and it is easier for a programmer to find 
> which one is the archetype_id if it is in a separate property.

well there is already a property - archetype_details - for that purpose. 
Originally we did think of putting the at0000 codes in 
archetype_node_id, and putting the archetype id in the archetype_details 
property, which is a separate object. (Some) developers found it was 
too annoying to use like that, so we changed to the current way of using 
the properties.

>
> And anyway, I don't think a seldom used property is a waste. It is 
> only bits and bytes, and there is hardly any code involved having this 
> property. But as I showed in example, not having this property can 
> make many thousands of lines code-execution necessary. That is a waste.
>
> We, system-builders, and special system-designers like you, do not 
> decide which archetypes are going to be used.
> There are archetypes of megabytes, they exist. I don't think it is 
> wise to have them, but it is that modeling is not always focused on 
> performance, but more on academical medical ideas.
> We, builders of two level modeling systems, we must be able to live 
> with this kind of academic exercises.
>
> But those archetypes cost one second or more, just parsing on a 
> medium speed computer.
> You don't want to do this unnecessary, you don't want to parse that 
> kind of archetypes at every data-entry. It breaks your system.

well you wouldn't be parsing archetypes at data entry - they have to be 
pre-parsed, validated, and used to generate Operational Templates (OPTs) 
which are the final XML structures stored in the server.

>
> Because there is no sure way of analyzing a string and find out if it 
> is an archetype_node_id or an archetype_id in slotted situations 
> besides parsing and analyzing the archetype, this will make the 
> situation of having one property for two different values inefficient, 
> and in some situations dramatic inefficient.

as I said in the previous post, you can just check if archetype_details 
!= null
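
A minimal sketch of that check in Python, assuming a much simplified data 
model. The Locatable and Archetyped classes and the node_id_kind helper are 
hypothetical stand-ins for illustration, not the API of any openEHR library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Archetyped:
    """Hypothetical stand-in for the RM ARCHETYPED object."""
    archetype_id: str


@dataclass
class Locatable:
    """Hypothetical, minimal stand-in for an RM LOCATABLE node."""
    archetype_node_id: str
    archetype_details: Optional[Archetyped] = None

    def is_archetype_root(self) -> bool:
        # Assumption: a node is an archetype root exactly when
        # archetype_details is present.
        return self.archetype_details is not None


def node_id_kind(node: Locatable) -> str:
    """Say what archetype_node_id holds, without parsing any archetype."""
    return "archetype_id" if node.is_archetype_root() else "at_code"
```

No string analysis of archetype_node_id is needed here; the presence of 
archetype_details alone decides how the value is to be read.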

>>>
>>
>> If you are referring to what the data instance structure looks like, 
>> yes if the reference model says it is inline (i.e. included by value) 
>> then that's what it is. The corresponding archetype structure 
>> technically could be made of multiple archetypes, connected by slots, 
>> or by one large archetype acting as a template.
>
> The idea of what I was saying, I think I can express it more clear 
> now, is that there are two ways of embedding a slotted dataset (based 
> on an archetype which fits in the slot) in the containing dataset 
> (based on the archetype which has the slot, so to say, the containing 
> archetype)
>
> One way is to add a reference to the container-dataset, which points 
> to the slotted dataset.
> The other way is to add the slotted dataset materialized in the 
> container-dataset.
> (The expression "materialized" is from oracle)
>
> The first one is not described in the specs, so to say, there is no 
> spec which indicates how to reference the datasets.
> In theory the specs expect the second situation. The paths in AQL or 
> templates are defined if the slotted datasets are materialized inside 
> the containing dataset.
> This is also the most simple way to do this.

correct

>
> This causes, however, a problem.
>
> Imagine you have a dataset and you want to express a path to a 
> leaf-value.
> You must know in that case if there are slotted datasets in it, 
> because the path will follow other syntax rules in case of slots.
>
> So in a PERSON without slots a contact would look like this
>
> [person-archetype]/contacts[at0003]/items[at0004].............
>
> In a PERSON with slots it would look like this.
> [person-archetype]/contacts[at0003]/[contact_archetypeId]/items[at0004].............
>
> So if you have a large dataset and you want to express ADL-paths to 
> leaf-nodes, you need to know if there are slots.
> There is one way to find out. Parse the according archetype and find 
> out if there are slots.

well the more obvious way to find out is to parse the OPT and just get 
its path set - that is what you can query with.
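
As a rough sketch of what "get its path set" could look like in Python: the 
nested-dict layout below is an assumed, simplified stand-in for a parsed OPT 
(a real OPT is an XML document), and the archetype ids and attribute names 
are made up for illustration.

```python
def path_set(node, prefix=""):
    """Collect every path below `node` in a flattened template tree.

    Assumed layout: {"node_id": str, "attributes": {name: [child, ...]}}.
    Where a slot has been filled, the child's node_id is an archetype id
    instead of an at-code, so the slot shows up in the path by itself.
    """
    paths = set()
    for attr, children in node.get("attributes", {}).items():
        for child in children:
            path = f"{prefix}/{attr}[{child['node_id']}]"
            paths.add(path)
            paths |= path_set(child, path)
    return paths


# Hypothetical flattened template: a PERSON whose contacts slot is filled.
EXAMPLE_OPT = {
    "node_id": "openEHR-DEMOGRAPHIC-PERSON.person.v1",
    "attributes": {
        "contacts": [{
            "node_id": "openEHR-DEMOGRAPHIC-CONTACT.contact.v1",
            "attributes": {"items": [{"node_id": "at0004"}]},
        }],
    },
}
```

Because the template is already flattened, the caller never has to ask 
whether a slot was involved; the path set contains whatever paths exist.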

>
>> well to check in the data if you have an archetype id or an at-code, 
>> it's just going to be something like:
>>
>> if (archetype_details != null) {
>>     // archetype_node_id contains an archetype id
>> }
>> else {
>>     // archetype_node_id contains an at-code
>> }
>>
>> the Common IM spec 
>> <http://www.openehr.org/releases/trunk/architecture/rm/common_im.pdf> 
>> says this - see p 22 - invariants:
>>
>> Archetyped_valid: is_archetype_root xor archetype_details = Void
>
> This is indeed a way to handle this, but what bothers me in this case, 
> two things.
> - You cannot have an XPath engine doing this complex querying, it 
> makes path-based queries very complex, and maybe even impossible.

but the XPath engine doesn't need to do this. It just processes the 
query paths it finds in the queries. It doesn't need to know what 
archetypes were used to structure it.
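
To illustrate how little the path processor needs to know, here is a small 
Python sketch. It assumes the data is held as nested dicts with an 
archetype_node_id key on every node; that layout, the SEGMENT pattern and 
the resolve function are hypothetical and not taken from any real engine.

```python
import re

# One path segment looks like: attribute_name[node_id]
SEGMENT = re.compile(r"(\w+)\[([^\]]+)\]")


def resolve(node, path):
    """Return the data nodes reached by following `path` from `node`.

    Each predicate is compared as a plain string with archetype_node_id,
    so at-codes and archetype ids are handled in exactly the same way.
    """
    matches = [node]
    for attr, node_id in SEGMENT.findall(path):
        matches = [
            child
            for parent in matches
            for child in parent.get(attr, [])
            if child["archetype_node_id"] == node_id
        ]
    return matches
```

The function never consults an archetype: whatever identifier appears in the 
query path is simply matched against what is stored in the data.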

> - Maybe technical not so important, but the property name does not 
> indicate what it contains, and it is bad programming practice to have 
> misleading names.
>
> I understand that having an archetype_id property creates redundant 
> information, because the information already is in the 
> archetype_details property, but the same also goes for storing the 
> archetype_id in the archetype_node_id. I think this redundancy is 
> ugly, and should not occur.  I think redundancy is a design error. The 
> reason is that the archetype_details contain other information besides 
> the archetype_id.
>
> The best way to do would be a separate archetype_id property, and 
> eventually archetype_details without archetype_id, or find another way 
> for the details, these details are also in archetype itself.

we'll certainly review this, and take the above into account.

>
>>
>>>
>>> Two:
>>> Imagine writing a AQL-engine on a database. As we know, the syntax 
>>> for an archetype_id is completely different from the syntax for an 
>>> archetype_node_id. But the writer of the engine needs to find these 
>>> completely different things in one property, with no indication 
>>> which is what, especially in slotted-instance-sets. I think that you 
>>> can see how difficult that is, he needs, as in the previous problem, 
>>> to check archetypes to know if the contents of that property is an 
>>> archetype_id, and interpret/create the ADL-path accordingly.
>>
>> I'm not sure where the difficulty lies. I don't believe any of the 
>> implementations of AQL have had any great difficulties in this area. 
>> Whatever path is provided in a query, the AQL engine just looks for 
>> it. It can easily do this in quite a dumb way.
>
> I am not sure what it means if there are two different paths possible 
> to one data-leaf. One path with the slot defined, and one path as if 
> there was no slot.
> A few weeks ago we both argued to William Goossens that the path is 
> the identifier for a datapoint, not the archetype_node_id.
> But now you seem to imply that there are more than one 
> paths-definitions possible.

we are talking about templates, which are (generally) made up of a 
composition of archetypes. So it's true that in a system containing 
PERSON objects, some archetyped by a composition of archetypes, and 
others archetyped by template-mimicking 'big' archetypes, then you can 
get more than one path to the same object instance node (say 
PERSON.contacts). But that happens anyway, all the time, simply due to 
the use of diverse archetypes. There might be 20 templates whose paths 
are all different, that point to PERSON.contacts objects in the data. 
That's the whole point - those semantically different paths - that's 
what querying runs on.

My suspicion from what you are saying is that you are not doing a 
pre-load of operational templates into your back-end system. If you had 
that, the query service can work very optimally.
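
A very rough Python sketch of such a pre-load, under the assumption that each 
operational template has already been reduced to a set of path strings. The 
TemplateRegistry class and its method names are invented for illustration.

```python
class TemplateRegistry:
    """Hypothetical in-memory store of pre-computed template path sets."""

    def __init__(self):
        self._paths = {}

    def preload(self, template_id, paths):
        # Done once, when a template is deployed, not at data entry
        # or at query time.
        self._paths[template_id] = frozenset(paths)

    def templates_with_path(self, path):
        """List the templates in which `path` is a valid path."""
        return sorted(t for t, p in self._paths.items() if path in p)
```

With something like this in place, checking a query path is a set lookup and 
no archetype has to be parsed while a query runs.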

- thomas
