Re: openEHR draft Expression spec

Thomas Beale Tue, 24 May 2016 07:33:52 -0700


On 19/05/2016 16:20, Pieter Bos wrote:

Hello Thomas,


I had already noticed the expressions part and based my experimental 
implementation on that. This email got quite long, so let’s start with a 
summary:

Summary:
- The current spec is quite similar to XPath. We can keep it even closer by 
referencing the XPath specification in more places in our specification. This 
allows tool reuse and resolves ambiguities in the specification.

I'm assuming you think we should reference specific parts of the W3C XPath spec from specific parts of the openEHR Expressions spec? That sounds sensible to me. Do you have suggestions? If you report them in a PR, we can incorporate them in the spec.

- Some other problems/questions were found regarding the spec, including 
grammar ambiguities and how to handle them, and a question about node ids that 
exist in the AOM but not always in the RM.

I remember responding to some earlier issues. If there are new issues not yet reported, please report them, either here or as PRs.

I have not implemented the full expression language yet, so I might find more, 
for example when I implement functions.

XPath and the relation to the expressions language:

Before I note my issues, I would like to point out that the language is very 
similar to XPath. In fact, you can convert almost all of the expression 
language to valid XPath 2.0 expressions with a few simple steps:

   1.  Split into separate statements. For every statement:
   2.  Replace the Apath shorthand notation with XPath: [id1] becomes 
[@archetype_node_id = ‘id1’], etc.
   3.  Replace the symbolic form of operators with the textual form
   4.  Replace for_all … in … with ‘every $var in /path satisfies …’
   5.  Replace implies with ‘if (…) then … else true()’ (XPath 2.0 requires the 
else branch)
   6.  Replace exists(expression) with count(expression) > 0
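Most of the rewriting steps above are purely textual, so they can be sketched as a string transformation. The Python sketch below is illustrative only: the function name `statement_to_xpath`, the symbol table and the regular expressions are assumptions, not part of the draft spec. It covers steps 2, 3, 5 and 6 for one statement and leaves for_all out; because XPath 2.0's if-expression requires an else branch, implies is emitted as `if (A) then B else true()`.

```python
import re

# Assumed symbol set: the symbolic operators in the draft spec may differ.
SYMBOLS = {"∧": "and", "∨": "or", "⇒": "implies"}

def statement_to_xpath(statement: str) -> str:
    """Rewrite one expression statement into XPath 2.0 text (steps 2, 3, 5, 6)."""
    expr = statement
    # Step 2: Apath shorthand [idN] -> explicit archetype_node_id predicate
    expr = re.sub(r"\[(id\d+(?:\.\d+)*)\]", r"[@archetype_node_id = '\1']", expr)
    # Step 3: symbolic operators -> textual operators
    for symbol, word in SYMBOLS.items():
        expr = expr.replace(symbol, word)
    # Step 6: exists(x) -> count(x) > 0 (only handles arguments without parentheses)
    expr = re.sub(r"\bexists\s*\(([^()]*)\)", r"count(\1) > 0", expr)
    # Step 5: A implies B -> if (A) then B else true()
    match = re.fullmatch(r"(.+?)\s+implies\s+(.+)", expr)
    if match:
        expr = f"if ({match.group(1)}) then {match.group(2)} else true()"
    return expr


print(statement_to_xpath("exists(/data[id2]/events[id3])"))
# count(/data[@archetype_node_id = 'id2']/events[@archetype_node_id = 'id3']) > 0
print(statement_to_xpath("/a[id1] > 3 ⇒ /b[id2] < 5"))
# if (/a[@archetype_node_id = 'id1'] > 3) then /b[@archetype_node_id = 'id2'] < 5 else true()
```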

Then, get an XPath implementation that works on your reference model, or just 
convert the data to XML first. Then evaluate every assertion to a boolean. For 
every variable declaration, evaluate the expression to the type given in the 
declaration and store the result under the given name.
Finally, implement the standard functions and variables. Functions and variables 
are part of standard XPath, and so is defining your own.
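For the "just convert to XML first" route, a minimal sketch using Python's standard `xml.etree.ElementTree` could look like this. ElementTree supports only a small XPath subset (paths and simple attribute predicates, no comparisons or functions), so here the comparison is done in Python; a full XPath 2.0 engine would evaluate the whole expression. The `to_xml` helper, the dict layout, the variable name and the ids are all hypothetical.

```python
import xml.etree.ElementTree as ET

def to_xml(tag, value):
    """Hypothetical converter: dict -> element; '@'-prefixed keys become XML
    attributes, lists become repeated child elements, scalars become text."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if key.startswith("@"):
                elem.set(key[1:], str(child))
            else:
                for item in (child if isinstance(child, list) else [child]):
                    elem.append(to_xml(key, item))
    else:
        elem.text = str(value)
    return elem


# Toy RM-like instance; attribute names and ids are illustrative only.
composition = to_xml("composition", {
    "@archetype_node_id": "id1",
    "content": [
        {"@archetype_node_id": "id5", "value": 120},
        {"@archetype_node_id": "id6", "value": 80},
    ],
})

variables = {}
# Variable declaration such as $systolic:Integer ::= /content[id5]/value:
# evaluate the path, convert to the declared type, store under the name.
node = composition.find("./content[@archetype_node_id='id5']/value")
variables["systolic"] = int(node.text)
# Assertion such as $systolic > 100, evaluated to a boolean.
print(variables["systolic"] > 100)  # True
```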

Would it make sense to include an Appendix on 'XPath Adaption' or similar, with this logical algorithm included? If you are interested in writing content for such an appendix, that would be very welcome.

If you do this, you have implemented full assertion support with very little 
effort and code, and very little chance of mistakes!


... for an XML data context of course....

There needs to be an ODINpath and a JSONpath....

(If all you have is XPath 1.0, for_all and implies require manual handling. 
You might need to do a bit of extra work for some data types, especially 
terminology codes.)
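With only XPath 1.0, one way to do that manual handling is to rewrite both constructs into XPath 1.0 expressions: "A implies B" is logically `not(A) or B`, and a universal quantifier holds exactly when no node fails the condition. The helper names below are hypothetical, and the variable substitution assumes the variable is used only as `$var` or `$var/...` and that other paths in the condition are absolute.

```python
import re

def implies_xpath1(antecedent: str, consequent: str) -> str:
    # XPath 1.0 has no if-expression; "A implies B" is equivalent to "not(A) or B".
    return f"not({antecedent}) or ({consequent})"

def for_all_xpath1(var: str, path: str, condition: str) -> str:
    # XPath 1.0 has no quantified expressions. "for all $v in P: C" holds iff
    # no node in P fails C, i.e. not(P[not(C')]), where C' is C with the
    # context node "." substituted for $v.
    relative = re.sub(r"\$" + re.escape(var) + r"\b", ".", condition)
    return f"not({path}[not({relative})])"


print(for_all_xpath1("var", "/path", "/some/other/path > $var/subpath"))
# not(/path[not(/some/other/path > ./subpath)])
print(implies_xpath1("/a > 3", "/b < 5"))
# not(/a > 3) or (/b < 5)
```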

Having noticed this, I'm strongly in favour of keeping the syntax as close to 
XPath as possible. This means we can reuse tools. Or, if you have reasons to 
write your own (I do, unfortunately), at least you can validate your 
implementation easily by testing against a known implementation.

Even if not, just being able to reuse grammar or grammar design ideas is useful.

So I would argue strongly in favour of keeping the $var syntax, because it is 
the same as in the XPath standard.


OK, that seems like a good reason.

Some constructions in the expression language have a valid reason to differ 
from XPath; for example, the shorthand notation for archetype node ids really 
helps. I would say this could include the exists operator, because it expresses 
something that is often needed, and expressing it explicitly allows for some 
really nice features in user interfaces.

However, I think this does not apply to the for_all and implies statements. If 
they could be replaced with the corresponding XPath syntax, I think that would 
be a good idea.

On the one hand I prefer first-order predicate logic operators, because absolutely everyone understands them and the meaning is universal, but on the other hand I see the value of sticking closer to a well-known syntax. I'm inclined not to worry too much about the 'surface syntax' if it can be absolutely guaranteed that lossless (and easy) conversion can be performed. But others may have other ideas.

Problems in the specification

Pieter, can you put the issues you raise below in a new PR 'Expression language issues' or similar?


thanks

- thomas


Here are the problems I found in the spec so far:

Multiple-valued paths and type conversion:

   *   The spec does not say how to handle multiple-valued expressions outside 
for_all statements. We could just follow the XPath standard.
   *   The spec says nothing about type conversion. We could just follow the 
XPath standard.
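Following the XPath standard for multiple-valued paths would mean adopting its general comparison rule: a comparison between two sequences is true if some pair of items, one from each side, satisfies it. A simplified Python sketch of that rule (atomization and type conversion are omitted; `general_compare` is a hypothetical name):

```python
import operator

# XPath general comparison operators mapped to Python predicates.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def general_compare(left, right, op):
    """Simplified XPath 2.0 general comparison: true if SOME pair (a, b) from
    the two sequences satisfies op."""
    return any(OPS[op](a, b) for a in left for b in right)


print(general_compare([120, 80], [100], ">"))   # True: 120 > 100
print(general_compare([], [100], ">"))          # False: empty sequence
print(general_compare([1, 2], [1, 2], "!="))    # True: 1 != 2
```

The last case shows that under this rule != is not the negation of =, which is one of the behaviours the spec would inherit by following XPath here.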

Whitespace aware grammar

The current definition of the language needs a whitespace-aware grammar. 
Otherwise, the following is ambiguous:

$var:Integer ::= /path/to/value
/path/to/another/value > 3

There is no way to tell which part of /path/to/value/path/to/another/value 
belongs to the first statement and which to the second without considering 
whitespace in your parser. That's fine in a lexer, but harder to do in a 
parser, although still possible. Alternatively, it's easily solved by 
demarcating your statements, for example by requiring a ‘;’ after every 
assertion.
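With a terminator, statement splitting needs no whitespace information at all. A minimal Python sketch, assuming ';' is the terminator and never appears inside a string literal; `split_statements` is a hypothetical helper:

```python
from typing import List

def split_statements(source: str) -> List[str]:
    """Split ';'-terminated statements; whitespace plays no role in where
    one statement ends and the next begins."""
    return [part.strip() for part in source.split(";") if part.strip()]


print(split_statements("$var:Integer ::= /path/to/value;\n/path/to/another/value > 3;"))
# ['$var:Integer ::= /path/to/value', '/path/to/another/value > 3']

# Without a terminator, a whitespace-insensitive reader sees one run of tokens:
print("".join("$var:Integer ::= /path/to/value\n/path/to/another/value > 3".split()))
# $var:Integer::=/path/to/value/path/to/another/value>3
```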

The same problem happens in a second place:

for_all $var in /path /some/other/path > $var/subpath

This is actually even a bit hard for a human to read, because the space after 
/path is easily overlooked. Both the whitespace-awareness and the human 
readability problems could easily be solved by replacing for_all with XPath's 
every … in … satisfies syntax.
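A rewrite from for_all to XPath's every … in … satisfies form shows the whitespace dependence directly: the only thing that marks the end of the path is the whitespace after it. The regular expression and the `for_all_to_every` name are assumptions for illustration.

```python
import re

# for_all <$var> in <path> <body>; the path ends at the next whitespace.
FOR_ALL = re.compile(r"for_all\s+(\$\w+)\s+in\s+(\S+)\s+(.+)", re.DOTALL)

def for_all_to_every(statement: str) -> str:
    match = FOR_ALL.fullmatch(statement.strip())
    if match is None:
        return statement
    var, path, body = match.groups()
    return f"every {var} in {path} satisfies {body}"


print(for_all_to_every("for_all $var in /path /some/other/path > $var/subpath"))
# every $var in /path satisfies /some/other/path > $var/subpath
```

If the path itself contained whitespace (for example an expanded predicate such as [@archetype_node_id = 'id1']), this split would go wrong, which is exactly the ambiguity described above.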

Node ids in archetype/reference model objects

In archetypes, some nodes have node ids that have no counterpart in the 
corresponding reference model object. This is tricky, because a valid path to 
an archetype node, converted to XPath, is NOT a valid path to the corresponding 
reference model object. For example, the context attribute of a COMPOSITION is 
an EVENT_CONTEXT. This does not have an archetype node id in the RM, but it 
always has one in the ADL/AOM. So if you write the path /context[id2], you can 
convert it to XPath as /composition/context[@archetype_node_id = ‘id2’]. But 
this will result in an empty node set, because there is no matching attribute 
called archetype_node_id. Instead, you could just write /context, which works.
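The empty node set is easy to reproduce with Python's standard ElementTree. The XML below is a hypothetical serialisation in which the EVENT_CONTEXT element, not being LOCATABLE, carries no archetype_node_id attribute:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for a COMPOSITION; only LOCATABLE nodes carry archetype_node_id.
doc = ET.fromstring(
    "<composition archetype_node_id='id1'>"
    "<context><start_time><value>2016-05-24T07:33:52</value></start_time></context>"
    "</composition>"
)

with_id = doc.findall("./context[@archetype_node_id='id2']")  # converted /context[id2]
without_id = doc.findall("./context")                         # plain /context
print(len(with_id), len(without_id))  # 0 1
```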

So, there are several options to address this in the specification, for example:

   1.  Specify that paths to non-locatables should NOT have an [idx] predicate, 
even though the id is present in the archetype
   2.  Specify that paths to non-locatables can have an [idx] predicate, but that 
it should be ignored by implementations

Option 2 is harder to implement, because you can no longer convert from Apath 
to XPath without knowledge of the model. But as Apath expressions are not new, 
I expect some other people will have an opinion on this :)
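Option 2 needs model knowledge during conversion. A minimal sketch of that step, with a hypothetical two-entry model table (real RM metadata would be much larger and would resolve each type from its parent class rather than from the attribute name alone), could look like this; `strip_non_locatable_ids` is an illustrative name only.

```python
import re

# Toy model knowledge (hypothetical, heavily reduced): the RM type reached by
# each attribute, and which of those types are LOCATABLE.
ATTRIBUTE_TYPES = {"context": "EVENT_CONTEXT", "content": "CONTENT_ITEM"}
LOCATABLE_TYPES = {"COMPOSITION", "CONTENT_ITEM"}

STEP = re.compile(r"(\w+)(\[id\d+(?:\.\d+)*\])?")

def strip_non_locatable_ids(apath: str) -> str:
    """Drop [idN] predicates on steps whose RM type is known to be non-LOCATABLE.
    Assumes predicates contain no '/' characters."""
    steps = []
    for step in apath.strip("/").split("/"):
        match = STEP.fullmatch(step)
        if match and match.group(2):
            rm_type = ATTRIBUTE_TYPES.get(match.group(1))
            # Unknown attributes keep their predicate rather than guessing.
            if rm_type is not None and rm_type not in LOCATABLE_TYPES:
                step = match.group(1)
        steps.append(step)
    return "/" + "/".join(steps)


print(strip_non_locatable_ids("/context[id2]/other_context[id3]"))  # /context/other_context[id3]
print(strip_non_locatable_ids("/content[id4]"))                     # /content[id4]
```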

Regards,

Pieter Bos



_______________________________________________
openEHR-technical mailing list
[email protected]
http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org
