Re: Rules in archetypes - what are the requirements?

2019-02-02 Thread Thomas Beale


On 02/02/2019 16:21, Pieter Bos wrote:


From: openEHR-technical on behalf of Thomas Beale
Reply-To: For openEHR technical discussions
Date: Saturday, 2 February 2019 at 15:01
To: "openehr-technical@lists.openehr.org"
Subject: Re: Rules in archetypes - what are the requirements?

Assuming you meant to put 'id7' as the first one, I don't understand 
what this achieves beyond just:


/events[id2]/data/items/element[id7]/value/magnitude >
    /events[id2]/data/items/element[id4]/value/magnitude +
    /events[id2]/data/items/element[id5]/value/magnitude +
    /events[id2]/data/items/element[id6]/value/magnitude

which says the same thing, for all events in the runtime data that 
conform to the /events[id2] branch of the structure.


Since the occurrences of events[id2] can be more than one, 
/events[id2]/data/items/element[id7]/value/magnitude in the execution 
environment (actual data) maps to a List<Real>.


well it could be understood that way - that would be to literally 
interpret it as a runtime path. The way I think of it is to mean 'this 
condition xyz must hold true' for each instance to which it applies. 
This greatly simplifies things - otherwise you end up in a complicated 
place like you have described below.


That means you need to define operators such as +, > and = on a list 
of reals. Or to define somehow that a statement containing only path 
bindings within a single multiply-occurring structure will mean that 
it gets evaluated for each occurrence of such a structure. The second 
case is complicated if you need to include data outside /events[id2] 
in your expression. A real world use case would be data in /protocol, 
which can influence the interpretation of event data, but is outside 
of the event.
So we did the first in Archie, with a few tricks to make this work 
for assignments. For example, this is how the plus operator is 
interpreted in Archie:


+(Real r1, Real r2)
  return the sum of both numbers
...
Much of this complexity can be avoided by  not defining the operator 
on lists/sets, and requiring the for_all loop on lists or sets of data 
in the specification. This comes at a price, because the author of the 
expressions needs to understand more of the language and data 
structures. So we chose the second, since the previous draft 
specification did not specify at all how to handle these cases.


Undefined value handling is another subject; I have not yet checked 
whether the new proposal solves it. We defined some functions to handle 
this explicitly if the automatic handling doesn’t do it (for example, 
(input, alternative) -> return input unless input is undefined, 
otherwise alternative), as well as some rounding functions.


the Expression Language draft has the defined() predicate, which I 
think should do the job.



If we were to allow the expression for_all $event in /data/events[id3] 
then we need to be clear on what it means: it actually refers to an 
object of type List<EVENT>, but do the members consist of EVENT 
objects found in data at the moment the expression is evaluated? Or 
just the statically defined members in the archetype - which can also 
be multiple, e.g. see the Apgar archetype, it has 1 min, 2 min, 5 min 
events?


You would need to evaluate on actual data. If you define it on the 
archetype data, you would need some kind of rules to convert it to an 
evaluation on actual data with different multiplicities than the rules 
specify, for example if events[id2] has occurrences > 1. Might be 
possible, I have not tried to define that. Would probably include some 
extra for_all loops plus some kind of validation errors for cases that 
cannot be converted.
So I would say always the data found at the moment the 
expression is evaluated. You can still refer to separate statically 
defined members using their distinct node ids, and even those could 
have occurrences > 1 (not in the Apgar example, since those have 
occurrences {0..1} in the archetype).


Normally we want the processing of 'rules' expressions in archetypes 
to apply to the data when the archetype is being used in its normal 
role of creation / semantic validation at data commit time.


Agreed.

So it seems to me that if we want to support expressions like the 
above, we need to be able to do something like (don't worry about the 
concrete syntax too much, could easily be TS or java-flavoured):


*use_model*
    org.openehr.rm

*data_context*

    $xxx_events: List<EVENT>
    $item_aaa, $item_bbb, $item_ccc, $item_ddd: Real

*definition*

    check for_all event in $xxx_events:
        event/$item_aaa > event/$item_bbb + event/$item_ccc + event/$item_ddd


*data_bindings* -- pseudo-code for now

$xxx_events -> /events[id2]
    $item_aaa -> /data/items/element[id7]/value/magnitude
    $item_bbb -> /data/items/element[id4]/value/magnitude
    $item_ccc -> /data/items/element[id5]/value/magnitude
    $item_ddd -> 

Re: Rules in archetypes - what are the requirements?

2019-02-02 Thread Pieter Bos

From: openEHR-technical on behalf of Thomas Beale
Reply-To: For openEHR technical discussions

Date: Saturday, 2 February 2019 at 15:01
To: "openehr-technical@lists.openehr.org" 
Subject: Re: Rules in archetypes - what are the requirements?


Assuming you meant to put 'id7' as the first one, I don't understand what this 
achieves beyond just:

/events[id2]/data/items/element[id7]/value/magnitude >
/events[id2]/data/items/element[id4]/value/magnitude +
/events[id2]/data/items/element[id5]/value/magnitude +
/events[id2]/data/items/element[id6]/value/magnitude

which says the same thing, for all events in the runtime data that conform to 
the /events[id2] branch of the structure.

Since the occurrences of events[id2] can be more than one, 
/events[id2]/data/items/element[id7]/value/magnitude in the execution 
environment (actual data) maps to a List<Real>. That means you need to define 
operators such as +, > and = on a list of reals. Or to define somehow that a 
statement containing only path bindings within a single multiply-occurring 
structure will mean that it gets evaluated for each occurrence of such a 
structure. The second case is complicated if you need to include data outside 
/events[id2] in your expression. A real world use case would be data in 
/protocol, which can influence the interpretation of event data, but is outside 
of the event.
So we did the first in Archie, with a few tricks to make this work for 
assignments. For example, this is how the plus operator is interpreted in Archie:

+(Real r1, Real r2)
  return the sum of both numbers
+(Real r1, List<Real> r2)
  return a list the same size as r2, with every element result[i] = r1 + r2[i]
+(List<Real> r1, Real r2)
  return a list the same size as r1, with every element result[i] = r1[i] + r2
+(List<Real> r1, List<Real> r2)
  if r1 and r2 have different lengths, throw an exception. Otherwise, return a 
  list the same size as r1, with every element result[i] = r1[i] + r2[i]

And the > operator:

>(Real r1, Real r2)
  return true if r1 > r2, false otherwise
>(Real r1, List<Real> r2)
  return a list of Booleans, one for every element in r2, true if r1 > that element
>(List<Real> r1, Real r2)
  return a list of Booleans, one for every element in r1, true if that element > r2
>(List<Real> r1, List<Real> r2)
  if r1 and r2 have different lengths, throw an exception. Otherwise, return a 
  list of Booleans, with every element result[i] = r1[i] > r2[i]

This is a simplification; in reality there is null handling and integer-to-real 
conversion involved, which I think is also missing from the specification.
We defined assertions/checks to succeed only if every Boolean in such a list is 
true or null/undefined (so as not to fail on data which is optional). Additionally, 
in Archie every value returned contains every unique path in the data that was 
used to evaluate it, so you can see exactly for which data an assertion/check 
failed in the result, to notify the user where the problem occurs, or to apply 
an assignment to the correct output if the output path does not match a single 
field. This last bit is implementation-specific, of course.
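
To make the list-aware operators described above concrete, here is a minimal Python sketch. It is not Archie's code: the names lift, plus, greater and check are hypothetical, undefined values are modelled as None, and the path tracking mentioned above is left out.

```python
from typing import Callable, List, Optional, Union

Scalar = Optional[float]            # None stands in for an undefined value
Value = Union[Scalar, List[Scalar]]


def lift(op: Callable[[float, float], object]) -> Callable[[Value, Value], object]:
    """Lift a scalar binary operator so it also accepts lists (hypothetical helper)."""

    def scalar(x: Scalar, y: Scalar):
        # Assumption: an undefined operand makes the result undefined.
        return None if x is None or y is None else op(x, y)

    def lifted(a: Value, b: Value):
        a_is_list, b_is_list = isinstance(a, list), isinstance(b, list)
        if a_is_list and b_is_list:
            if len(a) != len(b):
                raise ValueError("lists have different lengths")
            return [scalar(x, y) for x, y in zip(a, b)]
        if a_is_list:
            return [scalar(x, b) for x in a]
        if b_is_list:
            return [scalar(a, y) for y in b]
        return scalar(a, b)

    return lifted


plus = lift(lambda x, y: x + y)
greater = lift(lambda x, y: x > y)


def check(result) -> bool:
    """A check passes only if every Boolean is True or undefined (None)."""
    values = result if isinstance(result, list) else [result]
    return all(v is None or v is True for v in values)
```

With this, check(greater(a, plus(b, c))) evaluates the comparison once per list position, which mirrors evaluating the rule once per event.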

Much of this complexity can be avoided by  not defining the operator on 
lists/sets, and requiring the for_all loop on lists or sets of data in the 
specification. This comes at a price, because the author of the expressions 
needs to understand more of the language and data structures. So we chose the 
second, since the previous draft specification did not specify at all how to 
handle these cases.

Undefined value handling is another subject; I have not yet checked whether the new 
proposal solves it. We defined some functions to handle this explicitly if the 
automatic handling doesn’t do it (for example, (input, alternative) -> return input 
unless input is undefined, otherwise alternative), as well as some rounding functions.
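
The (input, alternative) function mentioned above could look roughly like the following Python sketch. The name value_or is made up for illustration, and None is assumed to represent an undefined value.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def value_or(input_value: Optional[T], alternative: T) -> T:
    """Return input_value unless it is undefined (None), in which case return alternative."""
    return alternative if input_value is None else input_value


# Example: treat a missing score component as 0 when summing.
components = [2.0, None, 1.0]
total = sum(value_or(c, 0.0) for c in components)
```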

If we were to allow the expression for_all $event in /data/events[id3] then we 
need to be clear on what it means: it actually refers to an object of type 
List<EVENT>, but do the members consist of EVENT objects found in data at the 
moment the expression is evaluated? Or just the statically defined members in 
the archetype - which can also be multiple, e.g. see the Apgar archetype, it 
has 1 min, 2 min, 5 min events?

You would need to evaluate on actual data. If you define it on the archetype 
data, you would need some kind of rules to convert it to an evaluation on 
actual data with different multiplicities than the rules specify, for example 
if events[id2] has occurrences > 1. Might be possible, I have not tried to 
define that. Would probably include some extra for_all loops plus some kind of 
validation errors for cases that cannot be converted.
So I would say always the data found at the moment the expression is 
evaluated. You can still refer to separate statically defined members using 
their distinct node ids, and even those could have occurrences > 1 (not in the 
Apgar example, since those have occurrences {0..1} in the archetype).


Re: Rules in archetypes - what are the requirements?

2019-02-02 Thread Pieter Bos
But then you need a second language to define a function to calculate a simple 
score. Without it, it's possible to create tooling to help modelers write at least 
simple expressions. Basic math and some simple 'if x is between 3 and 10, fill 
this field with this value' is entirely doable without manually writing 
expressions, even in the proposed syntax. And the more technically inclined 
modellers can do a lot more if the tooling helps them with choosing the right 
paths. Also, functions will need to be reimplemented for every environment - 
doesn't sound like interoperability to me.

On the subject of the lists of values problem: instead of binding outside the 
expression language, you could assign the value of a path 
to variables with a readable name as statements in the actual expression 
language. In combination with a for loop that can contain multiple statements, 
or at least variable assignments and one statement, it would solve these issues 
completely and still be readable to humans. And you can still write tooling to 
visualize the rules to non-technical people, like we can now.

Also I consider the requirement to explicitly state the type of variables 
beforehand to be tricky. Tooling could solve it, but it's not something I can 
easily explain to less technical people.

An alternative could be to switch to something Turing complete, like JavaScript 
or TypeScript, to define a few included functions to look up data and to 
interact with the data, and a defined data model to ensure interoperability. 
Much easier to specify and implement, although it will be harder/no longer 
possible to write tooling to visualize the code and harder to write tools that 
allow less-technical people to write code.

Regards,

Pieter Bos

On 2 Feb 2019 at 14:16, Ian McNicoll wrote:
Hi Pieter,

"But why would I need a function to calculate a score that is just a sum of a 
number of values, instead of a few +-operators?"

It is an open question, but one advantage of using the function approach with 
simple values is that it can encapsulate the algorithm without too much 
dependency on understanding openEHR paths or path bindings. That should allow 
broader engagement with non-openEHR specialists.

My preference is to support use of openEHR datatypes within the expression 
(albeit perhaps in reduced format), as otherwise passing units etc as scalars 
starts to get cumbersome.

e.g


$apgar_heartrate, $apgar_breathing, $apgar_reflex, $apgar_muscle, $apgar_colour)

where each of these is actually an ordinal, rather than a scalar value.

Not such a good example but think of a BMI calc, where the units used for 
height and weight are critical
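
As an illustration of why units matter in the BMI case, here is a small Python sketch. The Quantity class is a simplified stand-in for an RM quantity type, and the function name and the supported unit strings are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Simplified stand-in for a quantity with magnitude and units."""
    magnitude: float
    units: str


# Conversion factors to the units the formula expects (illustrative subset).
_TO_KG = {"kg": 1.0, "g": 0.001}
_TO_M = {"m": 1.0, "cm": 0.01}


def bmi(weight: Quantity, height: Quantity) -> Quantity:
    """Body mass index in kg/m2, converting the inputs from their stated units."""
    if weight.units not in _TO_KG:
        raise ValueError(f"unsupported weight units: {weight.units}")
    if height.units not in _TO_M:
        raise ValueError(f"unsupported height units: {height.units}")
    kg = weight.magnitude * _TO_KG[weight.units]
    m = height.magnitude * _TO_M[height.units]
    return Quantity(kg / (m * m), "kg/m2")
```

Passing bare scalars would lose the distinction between 175 (cm) and 1.75 (m); passing the quantity lets the function convert or reject the input.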

We can learn a lot from GDL experience in this regard.

Ian

Dr Ian McNicoll
mobile +44 (0)775 209 7859
office +44 (0)1536 414994
skype: ianmcnicoll
email: i...@freshehr.com
twitter: @ianmcnicoll

Co-Chair, openEHR Foundation 
ian.mcnic...@openehr.org
Director, freshEHR Clinical Informatics Ltd.
Director, HANDIHealth CIC
Hon. Senior Research Associate, CHIME, UCL


On Fri, 1 Feb 2019 at 14:53, Pieter Bos wrote:
About the calculation:

Ah, I see, the assignment seems like a good solution. But why would I need a 
function to calculate a score that is just a sum of a number of values, instead 
of a few +-operators?


Multiplicities/data binding:

The 'there exists' case is clear. However, what if I have four events, all having 
four elements, each with dv_quantity as value? Say I want the magnitude of the 
last of these quantities to be larger than the sum of the first three. Before, I 
could write something like:

for_all $event in /data/events[id3]
 $event/data/items/element[id6]/value/magnitude >
$event/data/items/element[id4]/value/magnitude +
$event/data/items/element[id5]/value/magnitude +
$event/data/items/element[id6]/value/magnitude

(I omitted a few node ids here that are not important for the example)
Not the most readable -  but it does the job. With data binding, how do I 
express this? There no longer seems to be a path lookup outside of data 
binding, so I can’t write:

for_all $event in $events
 $event/data/items/element[id6]/value/magnitude >
$event/data/items/element[id4]/value/magnitude +
$event/data/items/element[id5]/value/magnitude +
$event/data/items/element[id6]/value/magnitude

And binding all the separate paths to variables doesn’t work either – they will 
be bound as lists, and there is no way to iterate over four lists of values at 
once.

Note that a path that points to a single typed dvquantity in an archetype can 
still point to many items in the RM if somewhere up the tree there is a list or 
a set, for example more than one observation. So if you really want them to be 

Re: Rules in archetypes - what are the requirements?

2019-02-02 Thread Thomas Beale


On 02/02/2019 13:15, Ian McNicoll wrote:

Hi Pieter,

"But why would I need a function to calculate a score that is just a 
sum of a number of values, instead of a few +-operators?"


It is an open question, but one advantage of using the function 
approach with simple values is that it can encapsulate the algorithm 
without too much dependency on understanding openEHR paths or path 
bindings. That should allow broader engagement with non-openEHR 
specialists.


My preference is to support use of openEHR datatypes within the 
expression (albeit perhaps in reduced format), as otherwise passing 
units etc as scalars starts to get cumbersome.


agree - that is the idea of this construct in the EL

*use_model*
    org.openehr.rm

then you can declare vars to be of RM types like DvQuantity, DvOrdinal etc.

- t


___
openEHR-technical mailing list
openEHR-technical@lists.openehr.org
http://lists.openehr.org/mailman/listinfo/openehr-technical_lists.openehr.org


Re: Rules in archetypes - what are the requirements?

2019-02-02 Thread Thomas Beale


On 01/02/2019 14:53, Pieter Bos wrote:


About the calculation:

Ah, I see, the assignment seems like a good solution. But why would I 
need a function to calculate a score that is just a sum of a number of 
values, instead of a few +-operators?


well you might want to re-use that function. More generally, some 
functions are more interesting, e.g. MAP calculation, and it is 
potentially better to a) name them, b) declare them in a more obvious 
place and c) make them re-usable.
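
As a sketch of what such a named, re-usable function might look like, assuming 'MAP' here means mean arterial pressure and using the common approximation diastolic + (systolic - diastolic) / 3 (Python, with a hypothetical function name):

```python
def mean_arterial_pressure(systolic: float, diastolic: float) -> float:
    """Approximate mean arterial pressure from systolic and diastolic pressure.

    Both inputs are assumed to be in the same units (e.g. mm[Hg]).
    """
    return diastolic + (systolic - diastolic) / 3.0
```

An expression could then call the function by name instead of repeating the arithmetic in every rule.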





Multiplicities/data binding:

The 'there exists' case is clear. However, what if I have four events, 
all having four elements, each with dv_quantity as value? Say I want 
the magnitude of the last of these quantities to be larger than the 
sum of the first three. Before, I could write something like:


for_all $event in /data/events[id3]
 $event/data/items/element[id6]/value/magnitude >
$event/data/items/element[id4]/value/magnitude +
$event/data/items/element[id5]/value/magnitude +
$event/data/items/element[id6]/value/magnitude

Assuming you meant to put 'id7' as the first one, I don't understand 
what this achieves beyond just:


/events[id2]/data/items/element[id7]/value/magnitude >
    /events[id2]/data/items/element[id4]/value/magnitude +
    /events[id2]/data/items/element[id5]/value/magnitude +
    /events[id2]/data/items/element[id6]/value/magnitude

which says the same thing, for all events in the runtime data that 
conform to the /events[id2] branch of the structure.


If we were to allow the expression for_all $event in /data/events[id3] 
then we need to be clear on what it means: it actually refers to an 
object of type List<EVENT>, but do the members consist of EVENT objects 
found in data at the moment the expression is evaluated? Or just the 
statically defined members in the archetype - which can also be 
multiple, e.g. see the Apgar archetype, it has 1 min, 2 min, 5 min events?


Normally we want the processing of 'rules' expressions in archetypes to 
apply to the data when the archetype is being used in its normal role of 
creation / semantic validation at data commit time. So it seems to me 
that if we want to support expressions like the above, we need to be 
able to do something like (don't worry about the concrete syntax too 
much, could easily be TS or java-flavoured):


*use_model*
    org.openehr.rm

*data_context*

    $xxx_events: List<EVENT>
    $item_aaa, $item_bbb, $item_ccc, $item_ddd: Real

*definition*

    check for_all event in $xxx_events:
        event/$item_aaa > event/$item_bbb + event/$item_ccc + event/$item_ddd


*data_bindings* -- pseudo-code for now

$xxx_events -> /events[id2]
$item_aaa -> /data/items/element[id7]/value/magnitude
$item_bbb -> /data/items/element[id4]/value/magnitude
$item_ccc -> /data/items/element[id5]/value/magnitude
$item_ddd -> /data/items/element[id6]/value/magnitude

I don't know what this archetype is, so assume that $xxx_events, 
$item_aaa etc are more meaningful names.
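
One way to read the pseudo-code above is that each item variable is bound to a path relative to an event, and the check runs once per event found in the data. The Python sketch below illustrates that reading only. Events are reduced to dictionaries keyed by relative path, and the names BINDINGS, resolve and failing_events are hypothetical.

```python
from typing import Dict, List

# Simplified stand-in for an EVENT: relative path -> magnitude.
Event = Dict[str, float]

# Variable names bound to paths relative to each event (from the bindings above).
BINDINGS = {
    "item_aaa": "/data/items/element[id7]/value/magnitude",
    "item_bbb": "/data/items/element[id4]/value/magnitude",
    "item_ccc": "/data/items/element[id5]/value/magnitude",
    "item_ddd": "/data/items/element[id6]/value/magnitude",
}


def resolve(event: Event, name: str) -> float:
    """Look up the value bound to a variable name within one event."""
    return event[BINDINGS[name]]


def failing_events(events: List[Event]) -> List[int]:
    """Return the indices of events for which the check does not hold."""
    failed = []
    for index, event in enumerate(events):
        total = (resolve(event, "item_bbb")
                 + resolve(event, "item_ccc")
                 + resolve(event, "item_ddd"))
        if not resolve(event, "item_aaa") > total:
            failed.append(index)
    return failed
```

Returning the failing indices rather than a single Boolean keeps track of which event broke the rule, so the user can be pointed at the offending data.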


The next problem you mentioned is:

PB: Note that a path that points to a single typed dvquantity in an 
archetype can still point to many items in the RM if somewhere up the 
tree there is a list or a set, for example more than one observation


So I think this implies an incorrect interpretation of this kind of code 
within an archetype. It can't be understood as simultaneously applying 
to multiple Observations if it is within an Observation archetype, only 
to one OBSERVATION instance at a time - usually one about to be committed.


You can still have Lists of things internal to the archetype, as shown 
above with the Events list, but to process the multiplicity, you would 
need to do as we have done and use for_all, or some other 
container-aware operator or function.


Anyway, does this get closer to the sense of what you would like to do? 
It's more than I had conceived of, so this is a useful challenge...


- thomas




Re: Rules in archetypes - what are the requirements?

2019-02-02 Thread Ian McNicoll
Hi Pieter,

"But why would I need a function to calculate a score that is just a sum of
a number of values, instead of a few +-operators?"

It is an open question, but one advantage of using the function approach
with simple values is that it can encapsulate the algorithm without too
much dependency on understanding openEHR paths or path bindings. That
should allow broader engagement with non-openEHR specialists.

My preference is to support use of openEHR datatypes within the expression
(albeit perhaps in reduced format), as otherwise passing units etc as
scalars starts to get cumbersome.

e.g

$apgar_heartrate, $apgar_breathing, $apgar_reflex, $apgar_muscle, $apgar_colour)

where each of these is actually an ordinal, rather than a scalar value.

Not such a good example but think of a BMI calc, where the units used for
height and weight are critical
We can learn a lot from GDL experience in this regard.

Ian

Dr Ian McNicoll
mobile +44 (0)775 209 7859
office +44 (0)1536 414994
skype: ianmcnicoll
email: i...@freshehr.com
twitter: @ianmcnicoll


Co-Chair, openEHR Foundation ian.mcnic...@openehr.org
Director, freshEHR Clinical Informatics Ltd.
Director, HANDIHealth CIC
Hon. Senior Research Associate, CHIME, UCL


On Fri, 1 Feb 2019 at 14:53, Pieter Bos  wrote:

> About the calculation:
>
>
>
> Ah, I see, the assignment seems like a good solution. But why would I need
> a function to calculate a score that is just a sum of a number of values,
> instead of a few +-operators?
>
>
> Multiplicities/data binding:
>
> The 'there exists' case is clear. However, what if I have four events, all
> having four elements, each with dv_quantity as value? Say I want the
> magnitude of the last of these quantities to be larger than the sum of the
> first three. Before, I could write something like:
>
> for_all $event in /data/events[id3]
>  $event/data/items/element[id6]/value/magnitude >
> $event/data/items/element[id4]/value/magnitude +
> $event/data/items/element[id5]/value/magnitude +
> $event/data/items/element[id6]/value/magnitude
>
>
>
> (I omitted a few node ids here that are not important for the example)
>
> Not the most readable -  but it does the job. With data binding, how do I
> express this? There no longer seems to be a path lookup outside of data
> binding, so I can’t write:
>
> for_all $event in $events
>  $event/data/items/element[id6]/value/magnitude >
> $event/data/items/element[id4]/value/magnitude +
> $event/data/items/element[id5]/value/magnitude +
> $event/data/items/element[id6]/value/magnitude
>
>
>
> And binding all the separate paths to variables doesn’t work either – they
> will be bound as lists, and there is no way to iterate over four lists of
> values at once.
>
>
>
> Note that a path that points to a single typed dvquantity in an archetype
> can still point to many items in the RM if somewhere up the tree there is a
> list or a set, for example more than one observation. So if you really want
> them to be typed at validation time, you need to check every attribute in
> the path to see if it can point to more than one value, then either make it
> a List<List<Real>> or define in which order to add it as a single list.
>
> I implemented it by determining the type at runtime, but it’s possible to do it
> otherwise. This means that very often you need a for_all statement, in
> which case data binding doesn’t really help. I defined some tricks with the
> basic operators also working on equally sized lists to make things a bit
> easier to understand for modelers. That’s why I asked about the execution
> rules. The tricks I did can be easily rewritten into for_all statements if
> we need to have them removed.
>
>
>
> This leads to more interesting cases when you flatten rules to an OPT 2
> template, to obtain a single artifact that can be used for many things,
> including rule evaluation. That is very doable right now by prepending some
> paths and adding some for_all statements. I’m not sure how to do that with
> data binding.
>
>
>
> Regards,
>
>
> Pieter Bos
>
>
>
> From: openEHR-technical on behalf of Thomas Beale
> Reply-To: For openEHR technical discussions <openehr-technical@lists.openehr.org>
> Date: Friday, 1 February 2019 at 14:16
> To: "openehr-technical@lists.openehr.org" <openehr-technical@lists.openehr.org>
> Subject: Re: Rules in archetypes - what are the requirements?
>
>
>
> Thanks Pieter,
>
> this is very useful.
>
> On 01/02/2019 12:54, Pieter Bos wrote:
>
> For us the main requirement of the rules is to calculate the value of
> some fields based on other fields. Checking assertions alone has
> relatively little added value for the use cases our customers encounter. I
> would find it very hard to explain to any users or modelers that they can
> write checks that do the actual score calculation, but that they cannot
> actually use the calculated value anywhere. So we ignore this limitation
> altogether.
>
> the obvious