Polishing node identifier (at-codes) use cases.

Thomas Beale Mon, 23 Sep 2013 13:21:55 +0100

On 23/09/2013 11:47, Bert Verhees wrote:
> On 09/23/2013 10:38 AM, Thomas Beale wrote:
>> On 20/09/2013 20:40, Bert Verhees wrote:
>>> On 20-9-2013 17:01, Thomas Beale wrote:
>>>> it's simpler than you think - we made that property mandatory so 
>>>> that programmers would never get a null exception.
>>> Must have been a long time ago; nowadays, programmers have no 
>>> problem handling a null property.
>>
>> actually, that's not quite true. It's probably the primary reason for 
>> exceptions in object-oriented software - method call on a void 
>> object. But I get what you are saying, and for this String field, 
>> being null would not pose a great problem. So we could change the 
>> spec to do that.
>
> Yes, it is very easy to catch a null-exception and then do something 
> with that information. Anyway, IMHO, specs should not solve technical 
> problems, and they mostly don't do that. I believe this is also 
> defined in UML.
>
> Technical problems are for implementers to solve.


Hi Bert,

I don't happen to believe in that philosophy. Here's why: if you leave 
too much open, for implementers to constantly decide, then the 1,000 
people (let's say) who download your specification will solve those 
problems individually. Some may talk on lists, but essentially (knowing 
developers as I do) they will mostly solve it on their own. Let's say 
each of those people takes an average of 2 hours to decide on and test a 
solution for a given problem. That's 2,000 hours gone. Many of these 
solutions will be different, and many will have bugs or even be wrong. 
Let's say a third (333) are buggy / wrong, and that fixing each of these 
problems takes an average of 10 hours of remedial work. That's 333 x 10 = 
another 3,330 hours gone.

That's 5,330 hours, or over 2 person years. It clearly makes sense to 
spend 10, 20 or even 50 hours centrally to find a definitive answer once 
and publish that, rather than waste 2.5 person years at the periphery, 
creating low-grade chaos!
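
The estimate above can be reproduced with a small Python calculation. The 
inputs are just the illustrative guesses from the previous paragraphs 
(1,000 implementers, 2 hours each, a buggy share of one third, i.e. the 
333 implementations, and 10 remedial hours each); the figure of 2,000 
working hours per person-year is an added assumption.

```python
# Back-of-envelope cost of leaving one design decision to every implementer.
# All inputs are illustrative guesses, not measured data.

HOURS_PER_PERSON_YEAR = 2000  # assumption: roughly 40 h/week x 50 weeks


def peripheral_hours(implementers: int, hours_each: float,
                     buggy_share: float, remedial_hours: float) -> float:
    """Hours spent when every implementer solves the problem alone,
    plus the remedial work on the implementations that got it wrong."""
    initial = implementers * hours_each
    buggy = round(implementers * buggy_share)  # 1,000 x 1/3 -> 333
    return initial + buggy * remedial_hours


total = peripheral_hours(1000, 2, 1 / 3, 10)
print(total)  # 5330
print(f"{total / HOURS_PER_PERSON_YEAR:.1f} person-years")  # 2.7 person-years
```

With a buggy share of 0.3 instead (300 implementations) the total is 
5,000 hours, i.e. 2.5 person-years; either way it is person-years of 
scattered effort.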

In addition, some of the original implementations (let's say 1%, i.e. 10 
of them) have not only bugs, but bugs that cause patient harm or economic 
damage (e.g. wrong query results, downtime etc.). Who knows what the cost 
of that will be.

Worse than all of this is the fact that many of the 1,000 solutions to 
the problem will be different, perhaps 100 flavours. That means we have 
100 flavours of solution to just that one tiny issue in the original 
specifications. On its own, that's a virtual guarantee that those 
solutions will not work interoperably without some small adjustment or 
remediation. The correction is probably small. However, if there are 100 
similar decisions / issues in the specifications, we are talking about a 
combinatorial explosion of millions of variants of what should be the 
same software component (or at least the same one within each 
programming language / technology), and that is a huge interoperability 
problem.
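
A quick way to see why this is a combinatorial explosion: if each open 
decision can be resolved in any of its flavours independently of the 
others (a simplifying assumption for illustration), the number of 
possible combinations is the number of flavours raised to the power of 
the number of decisions.

```python
def possible_variants(flavours_per_decision: int, decisions: int) -> int:
    """Number of distinct combinations when each open decision is
    resolved independently (a simplifying assumption)."""
    return flavours_per_decision ** decisions


print(possible_variants(100, 1))  # 100
print(possible_variants(100, 3))  # 1000000
# With 100 decisions the count has 201 decimal digits (100**100 == 10**200).
print(len(str(possible_variants(100, 100))))  # 201
```

Already with three such decisions there are a million possible 
combinations.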

My belief is that ambiguity is the enemy of good software and 
interoperability, and of efficiency in development.

For that reason I believe specifications should very carefully specify 
things. I'll give a very simple example. The openEHR specifications 
routinely specify which properties of a class are mandatory, optional, 
and which String fields have to be non-empty. Even those simple things 
help save time.
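
As an illustration of how such spec-level rules carry straight into 
code, here is a minimal Python sketch. The class is a hypothetical, 
heavily simplified stand-in (it is not the openEHR LOCATABLE class): one 
mandatory String that must be non-empty, and one optional property. The 
check is made once, when the object is built, so no caller ever needs a 
null or empty-string test on the mandatory field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocatableSketch:
    """Hypothetical, simplified stand-in; not the real openEHR RM class."""
    archetype_node_id: str     # mandatory: must be a non-empty string
    uid: Optional[str] = None  # optional: None is a legitimate value

    def __post_init__(self) -> None:
        # Enforce the invariant once, at construction time, so code that
        # receives a LocatableSketch can use archetype_node_id directly.
        if not isinstance(self.archetype_node_id, str) or not self.archetype_node_id.strip():
            raise ValueError("archetype_node_id is mandatory and must be non-empty")


node = LocatableSketch("at0001")
print(node.archetype_node_id.startswith("at"))  # True

try:
    LocatableSketch("")
except ValueError as err:
    print(err)  # archetype_node_id is mandatory and must be non-empty
```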

Now, the actual openEHR specs of course have some errors and wrong 
decisions. The original specs that most people use today (which are about 
to be revised) probably contain some wrong decisions made by me, as best 
guesses at the time about how to limit ambiguity.

So what is really needed is for the communities around each development 
technology to build up common reference software components that become 
the one true way (for today) of doing X in Java, or Y in Python. If 
developers start saying 'X is a strange decision', and upon analysis, 
there is a better way to do X with no impact on data, quality, 
performance etc, we should do it. That's how we should progress.

But I don't believe in 'leave it to the programmers' because I don't 
believe in 'programming', I only believe in 'design', carried out at 
different levels of granularity.

>
> That is why this is a strange decision.
>
>>
>>>
>>> I wonder what the idea is behind stuffing the archetype_id into the 
>>> archetype_node_id property.
>>> Here you make it harder for programmers, because the archetype_id has 
>>> a different syntax in archetype paths than the archetype_node_id has, 
>>> and the same goes for lots of other functions; a programmer has to 
>>> check the string layout to find out whether it is an archetype_id or 
>>> an archetype_node_id. It also rules out storing the "at"-code for the 
>>> root and checking the ontology for its contents.
>>
>> the idea is that there is only one field to look at to find archetype 
>> identifying information in data. It is either an archetype_id (string 
>> form) or an at-code, or (for systems that support it) it's empty / 
>> 'unknown' (which could be replaced by null/void). With the archetype 
>> id, you can always look up the archetype and find out the root code 
>> (at0000, or a matching pattern like at0000.1 or at0000.1.1). But if 
>> you can't look up the archetype, you are lost, and that's what the 
>> archetype_id is for.
>
> The point is, the archetype_id is stored in the archetype_node_id 
> property. Pablo implemented it like that in XML, and he found in the 
> specs that it should be that way. I think this is an unneeded 
> complication of the specs. It would have been better to assign a 
> separate property for the archetype_id, alongside the archetype_node_id.

Well, we thought about that a long time ago, and the view was that you 
would then have two fields in every LOCATABLE, one of which is (hopefully) 
null/void in each actual instance. This could easily lead to errors, and 
wastes a data property.
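
For comparison, here is a minimal Python sketch of what a consumer does 
with the single-field design: classify the archetype_node_id value by its 
shape. The regular expressions are simplified assumptions (at-codes such 
as at0000, at0000.1 or at0000.1.1; archetype ids such as 
openEHR-EHR-OBSERVATION.blood_pressure.v1), and treating None, an empty 
string or 'unknown' as "no identifier" follows the options mentioned 
earlier in this thread rather than any published rule.

```python
import re
from enum import Enum
from typing import Optional


class NodeIdKind(Enum):
    AT_CODE = "at-code"
    ARCHETYPE_ID = "archetype_id"
    UNKNOWN = "unknown"


# Simplified, assumed patterns; the real identifier grammars are richer.
AT_CODE_RE = re.compile(r"at\d{4}(\.\d+)*")  # at0000, at0000.1, at0000.1.1
ARCHETYPE_ID_RE = re.compile(
    r"[A-Za-z]\w*-[A-Za-z]\w*-[A-Za-z]\w*"  # originator-rm_name-rm_entity
    r"\.[A-Za-z][\w-]*"                     # .concept (may be specialised)
    r"\.v\d+(\.\d+)*"                       # .v1, .v1.0.2
)


def classify_node_id(value: Optional[str]) -> NodeIdKind:
    """Tell the three cases of the single archetype_node_id field apart."""
    if value is None or value.strip() in ("", "unknown"):
        return NodeIdKind.UNKNOWN
    if AT_CODE_RE.fullmatch(value):
        return NodeIdKind.AT_CODE
    if ARCHETYPE_ID_RE.fullmatch(value):
        return NodeIdKind.ARCHETYPE_ID
    # Matches neither assumed pattern: reject rather than guess.
    raise ValueError(f"unrecognised archetype_node_id: {value!r}")


print(classify_node_id("at0000.1"))  # NodeIdKind.AT_CODE
print(classify_node_id("openEHR-EHR-OBSERVATION.blood_pressure.v1"))  # NodeIdKind.ARCHETYPE_ID
```

The check is only as reliable as the patterns: a value that matches 
neither can only be rejected, as this sketch does.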

>
> He found this in common.pdf, section 3.1.2, where it states:
> "The archetype_node_id is the standardised semantic code for a node and 
> comes from the corresponding node in the archetype used to create the 
> data. The only exception is at archetype root points in data, where 
> archetype_node_id carries the archetype identifier in string form rather 
> than an interior node id from an archetype."
>
> This makes it difficult to implement, because an implementer has to 
> test whether the archetype_node_id contains an at-code or an 
> archetype_id. This can lead to ambiguities, for example if the XML 
> contains the archetype slots and the connected instances are embedded, 
> which is legal and can really speed up XPath queries. This possibility 
> of ambiguity is especially real because it is not strictly defined what 
> an at-code looks like.

We certainly need to make sure that the pathing in the XML expression of 
the specifications works as it should. I'm not sure if I understand your 
last statement though.

- thomas
