Question about covering annotations in Ruta match semantics

Mario Juric Mon, 07 Oct 2019 01:22:06 -0700

Hi Peter,

I have a script that is executed without any seeders for performance reasons, 
and we don’t need the seeded annotations in that case. I have an issue 
involving annotation elements that partially cover the rule elements of 
interest, and I do not have a simple solution for it, so I have a question 
about the match semantics. Let me explain it using a simple example and the 
text ‘cat dog cat’.


Assume the following 4 annotation types and 2 rule statements:

DECLARE Covering;
DECLARE Cat;
DECLARE Dog;
DECLARE CHASE;
Cat Dog { -> MARK(CHASE)};
Dog Cat { -> MARK(CHASE)};

Assume that the following annotations, given with their begin and end offsets, 
exist before the script runs:

Cat[0,3[
Dog[4,7[
Cat[8,11[
Covering[0,8[
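
To make the offsets concrete, here is a minimal sketch in plain Python (not 
Ruta or the UIMA API; the names `spans` and `covered_text` are only 
illustrative) that prints the text each half-open [begin,end[ span covers. It 
shows that Covering[0,8[ includes the space after 'dog' and ends exactly where 
the second Cat begins, whereas [0,7[ would end together with Dog.

```python
# Illustrative only: plain Python, not Ruta or the UIMA API.
text = "cat dog cat"

# Half-open [begin, end[ offsets, exactly as listed above.
spans = [
    ("Cat", 0, 3),
    ("Dog", 4, 7),
    ("Cat", 8, 11),
    ("Covering", 0, 8),
]


def covered_text(begin, end):
    """Return the substring covered by the half-open span [begin, end[."""
    return text[begin:end]


for name, begin, end in spans:
    print(f"{name}[{begin},{end}[ -> {covered_text(begin, end)!r}")
```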

The Covering annotation is an example of the disturbing element that I 
observed, which has little or nothing to do with what I am trying to match. It 
just happens to be there for a reason unrelated to these rules, but it causes 
the second rule not to match when I expect it to. Only the first rule fires. 
The second rule does fire as well, though, once I change the Covering bounds 
to [0,7[.

The order in which elements are matched seems very different from how they are 
usually selected from the CAS index, where you would get ‘Covering Cat Dog 
Cat’, and with this order you would intuitively expect both rules to match. 
That picture is probably oversimplified though, since I would not be able to 
match adjacent covering annotations this way, so I believe matching is somehow 
based on edge detection. Still, I have difficulty understanding why that extra 
covered space makes a difference.
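
The index order mentioned above can be reproduced with a small Python sketch. 
It assumes the usual ordering of the default annotation index (begin 
ascending, then end descending) and ignores type priorities; `index_order` is 
a hypothetical helper, not a UIMA or Ruta function, and it says nothing about 
how Ruta itself matches rule elements.

```python
# Illustrative only: mimics the assumed default index ordering
# (begin ascending, then end descending); type priorities are ignored.
annotations = [
    ("Cat", 0, 3),
    ("Dog", 4, 7),
    ("Cat", 8, 11),
    ("Covering", 0, 8),
]


def index_order(annos):
    """Sort (name, begin, end) tuples by begin ascending, end descending."""
    return sorted(annos, key=lambda a: (a[1], -a[2]))


ordered_names = [name for name, _, _ in index_order(annotations)]
print(" ".join(ordered_names))
```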

I was hoping you could provide me with some details, and I would also like to 
know what workaround options I have. I was considering playing around with 
type filtering, but it would require adding and removing filtered types at 
several points in the script, so it didn’t seem like the simplest solution. 
Ensuring that Covering always aligns with the end of a token is another 
possibility in this particular case, but I still need to make the Ruta script 
generally robust against these scenarios. Any feedback is much appreciated, 
thanks :)

Cheers,
Mario
