Polishing node identifier (at-codes) use cases.

Thomas Beale Tue, 24 Sep 2013 18:54:08 +0100

On 24/09/2013 00:10, Bert Verhees wrote:
>
>>
>> For that reason I believe specifications should very carefully 
>> specify things. I'll give a very simple example. The openEHR 
>> specifications routinely specify which properties of a class are 
>> mandatory, optional, and which String fields have to be non-empty. 
>> Even those simple things help save time.
>
> What time do you save? Allowing developers to write sloppy code 
> because they don't need to check for a null-value?
> Do you think that professional programmers are not able to apply basic 
> programming rules, to check for a null value when retrieving data from 
> a database or external source?
>
> I don't know which quality of software-development you expected in the 
> OpenEHR community when writing this spec, but it does not seem that 
> you had much confidence in developers, at that time.


it's not developers like you or many of the other careful, thoughtful 
and professional people on these lists. But there are huge numbers of 
developers out there whose main job is implementing something else, but 
who have to quickly 'put something together' for this or that project, 
typically in a department of health, hospital or other provider site. 
These people have to write code in a rushed way, and will inevitably 
solve things as fast as possible without deep contemplation. And yet - 
those pieces of software routinely end up in real health data processing 
environments. So the aim of the specs is to reduce errors by this kind 
of development.

Like I said, particular choices in the specs to achieve that might be 
wrong, and the community here needs to help improve that.

>
>>
>> Now, the actual openEHR specs of course have some errors, and wrong 
>> decisions. The original specs that most people use today (but are 
>> about to be revised) probably have some wrong decisions made by me, 
>> as a best guess at the time of the best way to limit ambiguity.
>>
>> So what is really needed is for the communities around each 
>> development technology to build up common reference software 
>> components that become the one true way (for today) of doing X in 
>> Java, or Y in Python. If developers start saying 'X is a strange 
>> decision', and upon analysis, there is a better way to do X with no 
>> impact on data, quality, performance etc, we should do it. That's how 
>> we should progress.
>>
>> But I don't believe in 'leave it to the programmers' because I don't 
>> believe in 'programming', I only believe in 'design', carried out at 
>> different levels of granularity.
>
> It is inefficient to have an empty string instead of a null value, it 
> is a waste of processor-time. Now, programmers must check for the 
> contents of a string, if it is empty then it must be considered null.

I agree that's usually true. However sometimes there are reasons to 
never want a null field, which guarantees that software will always deal 
with the value safely rather than crashing unexpectedly. Occasionally it 
makes sense to do the same with Lists - ensure there is one, even if 
occasionally empty.
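
A minimal Java sketch of that idea, assuming a hypothetical `Party` class (not an openEHR reference model class): the constructor rejects an empty mandatory string and swaps a missing list for an empty one, so callers never need a null check on either field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NonNullFieldsSketch {

    /** Hypothetical class, used only to illustrate the guarantee. */
    static final class Party {
        private final String name;            // mandatory, never null or empty
        private final List<String> contacts;  // never null, possibly empty

        Party(String name, List<String> contacts) {
            if (name == null || name.isEmpty()) {
                throw new IllegalArgumentException("name must be non-empty");
            }
            this.name = name;
            // Replace a missing list with an empty one; copy otherwise.
            this.contacts = (contacts == null)
                    ? Collections.<String>emptyList()
                    : Collections.unmodifiableList(new ArrayList<>(contacts));
        }

        String name() { return name; }

        List<String> contacts() { return contacts; }
    }

    public static void main(String[] args) {
        Party p = new Party("Example Person", null);
        // Safe without a null check: the list always exists.
        System.out.println(p.contacts().size());
    }
}
```

The cost is one shared empty-list object; the benefit is that iteration and size checks cannot fail on a null reference.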

> Checking for a null-string (which does not exist in memory) is much 
> more efficient. No String calculations needed, no object creation, etc.
> It is basic code-optimization, never instantiate a variable if you 
> want it to be null. Your specs force software to be unnecessarily 
> inefficient.

like I said, in this case, it might make sense to change the spec.

>
> You are taking responsibility for errors bad or inexperienced 
> programmers could eventually make.
> It shows disdain for most developers. Ivory tower we call that in the 
> Netherlands.

you have to realise that specification authors (should) try to minimise 
ambiguity and therefore possible errors for all users of a standard. The 
unfortunate reality is that everyone programs these days, and many 
people (who might be surgeons or senior administrators!) do part-time 
programming, but probably not very well. That's the world today...


>
>> Well we thought about that a long time ago, and the view was that 
>> then you will have two fields in every LOCATABLE, one of which 
>> (hopefully) is null/void in each actual instance. This could easily 
>> lead to errors, and wastes a data property.
>
> I don't see any errors for having different properties for different 
> things.
> I see errors in having different things in the same property.
>
> A waste of a data-property?
>
> I do not understand what you are trying to say.  Do you mean that 
> there are occasions in which a specific property is useless?
> Because it is not used? Then I must say that OpenEHR has a lot of 
> waste, because there are many properties which are not used all the time.
> :)

sure - if you have a separate property to store the archetype id, it is 
empty in 95% of all object instances, and also you need a class 
invariant to prevent it being filled at the same time as the 
archetype_node_id (at-code) property.

If it's a single property, it always contains the archetype 'node id', 
which is either an internal node id (at-code) or an archetype root node 
id (the archetype id itself). I think that's pretty clear.

>
> Why is that a waste? Because of database-space?
>
> Maybe it is this: It must be because you don't want null-values and 
> want to put empty strings in the place.

not generally. You'll see that most string fields in openEHR that are 
optional can be null; most that are mandatory can't be empty, just as 
you said above.

> That is indeed a waste, I explained above, it is a waste of memory, 
> processor-time, database-usage.
> There, in that design-part, you justify a waste.
>
> Maybe it is time to give some responsibility of software-development  
> to software-developers and stop thinking about decisions as
> - using one property for two different things
> - using empty-strings to indicate a null value
>
> This is the big-data-society in which programmers are educated in 
> their profession. You should trust them more than you do now.

professional developers (over a certain age;-) may be. Numerous others 
who nevertheless build software are not. We need experienced 
professionals to help improve the specs.


> As you say, you thought about this a long time ago. That was also my 
> thought about this, and it would be good to change this.
>
>>
>>>
>>> He found this spec in common.pdf, section 3.1.2 where is stated:
>>> "The archetype_node_id is the standardised semantic code for a node 
>>> and comes
>>> from the corresponding node in the archetype used to create the 
>>> data. The only exception is at archetype
>>> root points in data, where archetype_node_id carries the archetype 
>>> identifier in string form rather
>>> than an interior node id from an archetype."
>>>
>>> This makes it difficult to implement, because, an implementer has to 
>>> test if the archetype_node_id contains an at-code or an 
>>> archetype_id. This can lead to ambiguities, for example if XML 
>>> contains the archetype-slots and the connected instances are 
>>> embedded, which is legal and can really speed up XPath-queries. This 
>>> possibility ambiguities is special the possible because it is not 
>>> really hard defined what an at-code looks at.
>>
>> We certainly need to make sure that the pathing in the XML expression 
>> of the specifications works as it should. I'm not sure if I 
>> understand your last statement though.
>
> Imagine an archetype-slot, for example, for having contacts in a PERSON.
> There are two ways of implementing it in object-instances or 
> XML-instances.

by XML-instances, do you mean 'by reference'?

> One way is:
> Having different instances, connected via a not in the specs defined 
> connection indicating that one instance should be placed inside the 
> property of another instance.
> Talking about errors, here is a situation in which the specs fail to 
> indicate how the connection must be made, and it is left to implementors.
>
> Seeing that the spec fail to specify this (and the specs want to 
> protect us against simple programming-errors), we must conclude that 
> the specs want us to really implement archetype-slotted instances to 
> be a materialized part of the containing instance.

If you are referring to what the data instance structure looks like, yes 
if the reference model says it is inline (i.e. included by value) then 
that's what it is. The corresponding archetype structure technically 
could be made of multiple archetypes, connected by slots, or by one 
large archetype acting as a template.

>
> I think this is a wise thing to do. Because, what do you want to do 
> with data?
> You want to query them, and do this as efficient as possible. You want 
> database-indexes to be used to find values for ADL-paths (which are 
> easily translated to object-instance-paths or XPaths)
> The whole OpenEHR ecosystem is built around ADL-paths: AQL, templates, 
> etc.
>
> Imagine you write a query which retrieves for you a PERSON (as an 
> object-instance or an XML-instance, or another instanced way), and in 
> that person are paths, ADL paths.
>
> Two difficulties arise:
> One:
> Now you write software to analyze that PERSON, and you see the 
> "contact"-property, and you don't know at that moment if that contact 
> is included via slots, or is included via a large PERSON-archetype.
> So in that case, you need to analyze the contents of archetype-node-id 
> of the contacts to detect if it is an archetype_id in it or an at-code.
> This is very hard, and maybe impossible to do this trustworthy. So the 
> programmer has to check the archetypes to check this.

well to check in the data if you have an archetype id or an at-code, 
it's just going to be something like:

if (archetype_details != null) {
     // archetype_node_id contains an archetype id
}
else {
     // archetype_node_id contains an at-code
}

the Common IM spec 
<http://www.openehr.org/releases/trunk/architecture/rm/common_im.pdf> 
says this - see p 22 - invariants:

Archetyped_valid: is_archetype_root xor archetype_details = Void
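
A minimal Java sketch of that check, assuming a cut-down `Locatable` with only the two fields in question. The class names, method names and example ids below are hypothetical and illustrative; they are not taken from any openEHR reference implementation. Because the root test is derived from `archetype_details`, the invariant above holds by construction.

```java
import java.util.Objects;

public class LocatableSketch {

    /** Hypothetical stand-in for ARCHETYPED; only the archetype id is modelled. */
    static final class Archetyped {
        final String archetypeId;

        Archetyped(String archetypeId) {
            this.archetypeId = Objects.requireNonNull(archetypeId);
        }
    }

    /** Cut-down LOCATABLE: archetype_node_id is mandatory, archetype_details optional. */
    static final class Locatable {
        final String archetypeNodeId;
        final Archetyped archetypeDetails; // null on interior nodes

        Locatable(String archetypeNodeId, Archetyped archetypeDetails) {
            if (archetypeNodeId == null || archetypeNodeId.isEmpty()) {
                throw new IllegalArgumentException("archetype_node_id must be non-empty");
            }
            this.archetypeNodeId = archetypeNodeId;
            this.archetypeDetails = archetypeDetails;
        }

        /** True when archetype_node_id holds an archetype id rather than an at-code. */
        boolean isArchetypeRoot() {
            return archetypeDetails != null;
        }
    }

    public static void main(String[] args) {
        String id = "openEHR-DEMOGRAPHIC-PERSON.person.v1"; // illustrative id
        Locatable root = new Locatable(id, new Archetyped(id));
        Locatable interior = new Locatable("at0003", null);
        System.out.println(root.isArchetypeRoot());     // prints true
        System.out.println(interior.isArchetypeRoot()); // prints false
    }
}
```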


> This is a big waste, unnecessary. A waste of a lot of processor-time, 
> thousands lines of code are involved to read the archetype and check 
> if a string in the "contact"-property is an at-code or an archetype_id.

I think it's only one line of code, as above.

If you want to check whether an archetype has a slot at the 'contact' 
path then that's easy as well, with something like:

if (my_archetype.definition.c_object_at_path (path_to_contact_property) 
        instanceof ArchetypeSlot) {
     // it's a slot
}
else {
     // something else
}
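
A self-contained Java sketch of the same slot test, assuming a toy archetype model. `CObject`, `ArchetypeSlot`, `CComplexObject`, `Definition` and the example paths are hypothetical stand-ins, and the map lookup merely plays the role of `c_object_at_path`; a real archetype library would walk the constraint tree instead.

```java
import java.util.HashMap;
import java.util.Map;

public class SlotCheckSketch {

    /** Hypothetical base type for archetype constraint nodes. */
    static class CObject { }

    /** Hypothetical slot node: the child is filled by another archetype. */
    static final class ArchetypeSlot extends CObject { }

    /** Hypothetical inline node: the child is defined in this archetype. */
    static final class CComplexObject extends CObject { }

    /** Toy definition: a path-to-constraint lookup standing in for c_object_at_path. */
    static final class Definition {
        private final Map<String, CObject> byPath = new HashMap<>();

        void put(String path, CObject node) {
            byPath.put(path, node);
        }

        CObject cObjectAtPath(String path) {
            return byPath.get(path); // null if the path is unknown
        }
    }

    /** True when the constraint at the given path is a slot. */
    static boolean isSlot(Definition definition, String path) {
        // instanceof is false for null, so unknown paths are "not a slot".
        return definition.cObjectAtPath(path) instanceof ArchetypeSlot;
    }

    public static void main(String[] args) {
        Definition def = new Definition();
        def.put("/contacts", new ArchetypeSlot());
        def.put("/details", new CComplexObject());
        System.out.println(isSlot(def, "/contacts")); // prints true
        System.out.println(isSlot(def, "/details"));  // prints false
    }
}
```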


>
> Two:
> Imagine writing a AQL-engine on a database. As we know, the syntax for 
> an archetype_id is completely different from the syntax for an 
> archetype_node_id. But the writer of the engine needs to find these 
> completely different things in one property, with no indication which 
> is what, especially in slotted-instance-sets. I think that you can see 
> how difficult that is, he needs, as in the previous problem, to check 
> archetypes to know if the contents of that property is an 
> archetype_id, and interpret/create the ADL-path accordingly.

I'm not sure where the difficulty lies. I don't believe any of the 
implementations of AQL have had any great difficulties in this area. 
Whatever path is provided in a query, the AQL engine just looks for it. 
It can easily do this in quite a dumb way.
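
To make the 'dumb' matching concrete, here is a hedged Java sketch, not drawn from any actual AQL implementation. It assumes a simplified path segment of the form `attribute[predicate]`, where the predicate is either an at-code or an archetype id, and compares it to the node's archetype_node_id as a plain string. Real AQL predicates can carry more than a node id; `PathPredicateSketch` and `matches` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathPredicateSketch {

    // An attribute name followed by an optional [predicate].
    private static final Pattern SEGMENT =
            Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)(?:\\[([^\\]]+)\\])?");

    /**
     * True when the path segment selects a node reached through attributeName
     * whose archetype_node_id equals the predicate (or when no predicate is given).
     */
    static boolean matches(String segment, String attributeName, String archetypeNodeId) {
        Matcher m = SEGMENT.matcher(segment);
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported path segment: " + segment);
        }
        if (!m.group(1).equals(attributeName)) {
            return false;
        }
        String predicate = m.group(2);
        // No need to know whether the predicate is an at-code or an
        // archetype id: it is compared as an opaque string.
        return predicate == null || predicate.equals(archetypeNodeId);
    }

    public static void main(String[] args) {
        String addressId = "openEHR-DEMOGRAPHIC-ADDRESS.address.v1"; // illustrative id
        System.out.println(matches("contacts[at0004]", "contacts", "at0004"));              // prints true
        System.out.println(matches("contacts[" + addressId + "]", "contacts", addressId));  // prints true
        System.out.println(matches("contacts[at0004]", "contacts", addressId));             // prints false
    }
}
```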

I can imagine that one day in the future we use Snomed-like codes for 
both at-code and archetype id, which would mean it's the same kind of 
code always in the property archetype_node_id in a Locatable, but that 
wouldn't make a lot of difference.

>
> This is not a wild example, we all need to create AQL-engines, to use 
> the OpenEHR ecosystem as meant in the specs. It is not very hard to 
> do, because, ADL is very similar to XPath, and I think that 
> object-database, also have object-path-queries. So it is easy to 
> translate, but we still need to do that, and create/interpret ADL-paths.
>
> The situation you have created, as you state, to avoid errors is 
> causing errors or unnecessary difficulties and causing thousands of 
> lines of code to be used (wasted processortime).
>
> I hope you agree that this is an error and I hope that you will take 
> care that these two things (the other also in this email) will be 
> changed in the specs.

I'm still not that clear on what the problem really is here. Yes, the 
archetype_node_id field can contain two different types of value, but 
they're easy to tell apart.

I may just be missing the point here, so feel free to elaborate. Also, 
if other developers have had problems with this, post your experiences.

- thomas
