Hi Peter,
I have a script that is executed without any seeders for performance reasons,
and we don’t need the seeded annotations in that case. I have an issue
involving annotation elements that partially cover the rule elements of
interest, and I do not have a simple solution for it, so I have a question
about the match semantics. Let me explain it using a simple example and the
text ‘cat dog cat’.
Assume the following 4 annotation types and 2 rule statements:
DECLARE Covering;
DECLARE Cat;
DECLARE Dog;
DECLARE CHASE;
Cat Dog { -> MARK(CHASE)};
Dog Cat { -> MARK(CHASE)};
Assume prior to script execution the following annotations with beginnings and
endings:
Cat[0,3[
Dog[4,7[
Cat[8,11[
Covering[0,8[
The Covering annotation is an example of the interfering element that I
observed; it has little or nothing to do with what I am trying to match. It
just happens to be there for a reason unrelated to these rules, but it causes
the second rule not to match when I expect it to. Only the first rule fires.
The second rule also fires, however, once I change the Covering bounds to [0,7[.
The order in which elements are matched seems very different from how they are
usually selected from the CAS index, where you would get ‘Covering Cat Dog
Cat’, and with this order you would intuitively expect both rules to match. This
is probably an oversimplification though, since I would not be able to match
adjacent covering annotations this way, so I believe matching is somehow based
on edge detection. Still, I have difficulty understanding why that one extra
covered character (the space at [7,8[) makes a difference.
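To make the index order I have in mind concrete, here is a small Python sketch
(not Ruta or UIMA code). It assumes the default UIMA annotation index ordering,
begin ascending and then end descending, and it ignores type priorities. The
Annotation tuple and the index_order helper are hypothetical names used only
for illustration.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """Hypothetical stand-in for a CAS annotation: type name plus offsets."""
    type_name: str
    begin: int
    end: int  # exclusive, as in the [begin,end[ notation above


def index_order(annotations: List[Annotation]) -> List[Annotation]:
    # Assumed ordering: begin ascending, then end descending, so a longer
    # annotation comes before a shorter one starting at the same offset.
    # Type priorities are deliberately ignored in this sketch.
    return sorted(annotations, key=lambda a: (a.begin, -a.end))


# The annotations from the example text 'cat dog cat'.
annotations = [
    Annotation("Cat", 0, 3),
    Annotation("Dog", 4, 7),
    Annotation("Cat", 8, 11),
    Annotation("Covering", 0, 8),
]

ordered_names = [a.type_name for a in index_order(annotations)]
print(" ".join(ordered_names))
```

Under these assumptions the order is the same whether Covering ends at 7 or
at 8, which is why I did not expect the bounds to affect the second rule.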
I was hoping you could provide me with some details, and I would also like to
know what workaround options I have. I was considering playing around with
type filtering, but it would require adding and removing filtered types at
several points during the script, so it did not seem like the simplest solution.
Ensuring that Covering always aligns with the end of a token is another
possibility in this particular case, but I still need to make the Ruta script
robust against these scenarios in general. Any feedback is much appreciated,
thanks :)
Cheers,
Mario