Hi Peter,

Thanks a lot for the answer.

I am still trying to wrap my head around this, and I understand the issues at 
play when dealing with a generic rule engine, since I am only looking at an 
isolated case. I was just thinking that in my particular case the covering 
annotation starts before the match on 'Dog Cat', so why would its ending right 
before Cat prevent the rule from firing? It doesn't follow Dog, so a rule like 
"Dog Covering { -> MARK(CHASE)};" wouldn't be matched either. I understand 
now, though, that the mere presence of something else in the area between the 
two rule elements is enough for the match to fail, and that, as you describe, 
with SPACE annotations present a rule like "Dog SPACE Cat { -> MARK(CHASE)};" 
would succeed in matching despite the presence of the covering annotation.
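
If I read that correctly, the fix works roughly like this (just a sketch, 
assuming the example types from my earlier mail below and that SPACE 
annotations have been seeded, e.g. by the default seeder):

// With a seeded SPACE annotation between Dog and Cat, the gap is an
// explicit rule element rather than an "empty" RutaBasic, so the
// Covering annotation ending inside it no longer blocks the match.
Dog SPACE Cat { -> MARK(CHASE)};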

Have you ever described the implementation of the matching in a paper or 
similar? I would be interested in having a look at it, but maybe it's better 
just to have a go at the code? I would certainly prefer reading a high-level 
abstract specification first, though :)

Generally I cannot just trim the annotations in the real application, since 
some of these whitespaces are included in the marking for various reasons. I 
therefore played around with type filtering, hoping that the type filter would 
let me match the rules while ignoring any occurrence of the filtered types. I 
was again surprised to find that filtering the Covering type while retaining 
Cat and Dog just prevented anything from being matched: filtering seems to 
make all those text parts invisible where the filtered types appear, no matter 
whether they cover any retained annotation types. So this didn't solve my 
problem either, although I could of course mark the areas I would otherwise 
consider trimming and either include them in the rules like a SPACE or filter 
on them, which I guess is what you suggested. It just becomes somewhat 
awkward, though, and it may be clearer to use RutaBasic with the rules instead.
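
For completeness, this is roughly what I tried, plus the workaround as I 
understand it (again just a sketch; Trimmed is a hypothetical type I would 
have to create myself):

// What I tried: filtering the Covering type directly. This hides every
// text position where Covering occurs, so Dog and the first Cat become
// invisible as well and the rule cannot match at all.
FILTERTYPE(Covering);
Dog Cat { -> MARK(CHASE)};
FILTERTYPE; // reset to the default filtering

// The workaround: mark the positions I would otherwise trim with a
// dedicated type and filter on that instead, so only those gaps become
// invisible to the sequential matching.
DECLARE Trimmed;
// ... annotate the offending whitespace as Trimmed here ...
FILTERTYPE(Trimmed);
Dog Cat { -> MARK(CHASE)};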


Cheers,
Mario

> On 9 Oct 2019, at 09:35, Peter Klügl <[email protected]> wrote:
> 
> Hi Mario,
> 
> 
> I need to take a closer look as this is not the usual scenario :-)
> 
> 
> However, without testing, I would assume that the second rule does not
> match because the space between dog and cat is not "empty".
> 
> 
> Normally, you have a complete partitioning provided by the seeding which
> creates the RutaBasic annotations. If there are only a few annotations,
> then there needs to be a decision whether a text position is visible or
> not (as you have no SPACE, BREAK and MARKUP annotations). You would expect
> that the space between the annotations is ignored, but there is actually
> no reason why Ruta should do that, as there is no information at all
> that it should be ignored (... generic system, you might want to write
> rules for whitespaces...). In order to avoid this problem in such
> situations there is the option to define empty RutaBasics as invisible.
> Those are text positions where no annotation begins or ends (and that are
> not covered by annotations), AFAIR, and where sequential matching could
> not match at all anyway. Thus, the first space is ignored, but not the
> second, because the Covering annotation ends there.
> 
> 
> Does that make sense?
> 
> 
> I think there are many options for how your rules can become more robust,
> but that depends on your complete system/pipeline. Is it an option to trim
> the annotations in order to avoid whitespace at the beginning or end? Is
> it easy to identify these positions? You could create an annotation there
> and filter on that type.
> 
> 
> 
> Best,
> 
> 
> Peter
> 
> 
> 
> Am 07.10.2019 um 10:21 schrieb Mario Juric:
>> Hi Peter,
>> 
>> I have a script that is executed without any seeders for performance 
>> reasons, and we don’t need the seeded annotations in that case. I have an 
>> issue involving annotation elements that partially cover the rule elements 
>> of interest, and I do not have a simple solution for it, so I have a 
>> question about the match semantics. Let me explain it using a simple example 
>> and the text ‘cat dog cat’.
>> 
>> Assume the following 4 annotation types and 2 rule statements:
>> 
>> DECLARE Covering;
>> DECLARE Cat;
>> DECLARE Dog;
>> DECLARE CHASE;
>> Cat Dog { -> MARK(CHASE)};
>> Dog Cat { -> MARK(CHASE)};
>> Assume that, prior to script execution, the following annotations exist 
>> with these begin and end offsets:
>> 
>> Cat[0,3[
>> Dog[4,7[
>> Cat[8,11[
>> Covering[0,8[
>> 
>> The Covering annotation is an example of the disturbing element that I 
>> observed, which has little or nothing to do with what I am trying to match. 
>> It just happens to be there for a reason unrelated to these rules, but it 
>> causes the second rule not to match when I expected it to. Only the first 
>> rule fires; the second will also fire when I change the Covering bounds to 
>> [0,7[, though.
>> 
>> The order in which elements are matched seems very different from how they 
>> are usually selected from the CAS index, where you would get 'Covering Cat 
>> Dog Cat', and with this order you would intuitively expect both rules to 
>> match. That view is probably too simplistic, though, since I would not be 
>> able to match adjacent covering annotations this way, so I believe matching 
>> is somehow based on edge detection. Still, I have difficulty understanding 
>> why that extra covering space makes a difference.
>> 
>> I was hoping you could provide me with some details, and I would also like 
>> to know what workaround options I have. I was considering playing around 
>> with type filtering, but it would require a bit of adding/removing of the 
>> filtered types during the script, so it didn't seem like the simplest 
>> solution. Ensuring that Covering always aligns with the end of a token is 
>> another possibility in this particular case, but I still need to make the 
>> Ruta script generally robust against these scenarios. Any feedback is much 
>> appreciated, thanks :)
>> 
>> Cheers,
>> Mario
>> 
> -- 
> Dr. Peter Klügl
> R&D Text Mining/Machine Learning
> 
> Averbis GmbH
> Salzstr. 15
> 79098 Freiburg
> Germany
> 
> Fon: +49 761 708 394 0
> Fax: +49 761 708 394 10
> Email: [email protected]
> Web: https://averbis.com
> 
> Headquarters: Freiburg im Breisgau
> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
> 
