Re: Question about covering annotations in Ruta match semantics

Mario Juric Wed, 23 Oct 2019 04:07:40 -0700

Here is the ticket:

https://issues.apache.org/jira/browse/UIMA-6137


Hope this suffices for a start.

Cheers,
Mario

> On 22 Oct 2019, at 09:01 , Peter Klügl <[email protected]> wrote:
> 
> Hi,
> 
> Am 21.10.2019 um 21:46 schrieb Mario Juric:
>> Thanks Peter,
>> 
>> No problem with the delay. I was on vacation myself, and sometimes it is 
>> just necessary to pull the plug :)
>> 
>> I am just happy that you take the time to answer my questions, and I think 
>> your answers help make sense of this. I now have some ideas that I can 
>> experiment with to see what works. It is possible to use RutaBasic when 
>> optional spaces are included in the rules, although it gets more awkward. I 
>> would still prefer to avoid this, and a type-based rule-logic feature 
>> would make sense in our case. Shall I create a feature request for this?
> 
> 
> Yes, please create a ticket. Even specifying what should be done helps,
> especially including more use cases than my own...
> 
> 
> Best,
> 
> 
> Peter
> 
> 
>> 
>> I wouldn’t expect you to do this any time soon, but let me know if there is 
>> something I could help out with when the time comes.
>> 
>> Cheers,
>> Mario
>> 
>>> On 18 Oct 2019, at 10:10 , Peter Klügl <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> 
>>> sorry for the delayed reply.
>>> 
>>> 
>>> comments below...
>>> 
>>> 
>>> Am 09.10.2019 um 22:19 schrieb Mario Juric:
>>>> Hi Peter,
>>>> 
>>>> Thanks a lot for the answer.
>>>> 
>>>> I am still trying to wrap my head around this, and I understand the issues 
>>>> at play when dealing with a generic rule engine, since I am looking at an 
>>>> isolated case only. I was just thinking that in my particular case the 
>>>> covering annotation starts before the matching ‘Dog Cat’, so why would its 
>>>> ending right before Cat prevent the rule from firing? It doesn’t follow 
>>>> Dog, so a rule like “Dog Covering {->MARK(CHASE)}” wouldn’t match 
>>>> either. But I understand now that the presence of something else in the 
>>>> area between the two rule elements is enough for the match to fail. 
>>>> However, as you describe, with SPACE annotations present, a rule like 
>>>> Dog SPACE Cat { -> MARK(CHASE)} would succeed in matching despite the 
>>>> presence of the covering annotation.
>>> 
>>> The main thing here is probably the requirement that the logic for
>>> applying the visibility concept should always be symmetric, meaning it
>>> should be the same regardless of whether the rule matches from left to
>>> right or from right to left (or inside out).
>>> 
>>> In your example, the rule matches from left to right (I assume), so the
>>> behavior that the last space is not skipped is not intuitive at all.
>>> However, if the rule matched from right to left for some reason,
>>> e.g., because of dynamic anchoring or a manual anchor, then the
>>> inference would detect a starting Covering annotation as the next
>>> possible position, which is not invisible (since nothing at all is
>>> invisible). So there would actually be something that could be matched,
>>> but it is not the correct type (Dog).
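
The symmetry argument can be illustrated with a toy model in Python. This
is a sketch under assumptions, not Ruta's matching code: Ann and blocker
are hypothetical names, and a gap segment is assumed to be visible as soon
as some annotation begins at its start or ends at its end.

```python
from typing import List, NamedTuple, Optional


class Ann(NamedTuple):
    type: str
    begin: int
    end: int


def blocker(left: Ann, right: Ann, anns: List[Ann],
            right_to_left: bool = False) -> Optional[Ann]:
    """Return the annotation that makes the gap between left and right visible.

    Assumption (not taken from the Ruta sources): a gap segment is visible,
    and therefore blocks a sequential match, if any annotation begins at its
    start or ends at its end. Returns None when the whole gap is skipped.
    """
    # Cut the gap at every annotation boundary that falls inside it.
    cuts = sorted({p for a in anns for p in (a.begin, a.end)
                   if left.end <= p <= right.begin})
    gap = list(zip(cuts, cuts[1:]))
    if right_to_left:
        gap.reverse()  # walk the same segments from the other side
    for seg_begin, seg_end in gap:
        for a in anns:
            if a.begin == seg_begin or a.end == seg_end:
                return a
    return None


cat1, dog, cat2 = Ann("Cat", 0, 3), Ann("Dog", 4, 7), Ann("Cat", 8, 11)
anns = [cat1, dog, cat2, Ann("Covering", 0, 8)]

for direction in (False, True):
    found = blocker(dog, cat2, anns, right_to_left=direction)
    print("right-to-left" if direction else "left-to-right", "->", found)
```

In this model both directions stop at the same Covering annotation between
Dog and Cat, while the gap between the first Cat and Dog has no blocker.
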
>>> 
>>> I do not know if this explanation makes sense... it's easier with a
>>> whiteboard ;-)
>>> 
>>> 
>>> 
>>>> Have you ever described the implementation of the matching in some paper 
>>>> or similar? I would be interested to have a look at it, but maybe it’s 
>>>> better just to have a go at the code? I would certainly prefer reading a 
>>>> high level abstract specification first though :)
>>> 
>>> The last paper is the NLE journal article, which contains some high
>>> level description of the algorithm. However, this is some really
>>> specific functionality for a specific scenario. So, if I write a new
>>> paper, it will most likely not cover this.
>>> 
>>> 
>>>> Generally I cannot just trim the annotations in the real application, 
>>>> since some of these whitespaces are included in the marking for various 
>>>> reasons. I therefore played around with type filtering, since I was hoping 
>>>> that the type filter would allow me to match the rules while ignoring any 
>>>> presence of filtered types. I was again surprised to find out that 
>>>> filtering the Covering type while retaining Cat and Dog would in this case 
>>>> just prevent anything from being matched, because it seems to make all 
>>>> those text parts invisible where the filtered types appear, no matter if 
>>>> they cover any retained annotation types. So this didn’t seem to solve my 
>>>> problem either, although I could of course try to mark those areas I 
>>>> otherwise would consider trimming and include those in the rules like a 
>>>> space or filter on them, which I guess is what you suggested. It suddenly 
>>>> just becomes somewhat awkward though, and it may just be more clear to use 
>>>> RutaBasic with the rules instead.
>>> 
>>> Yes, the visibility concept in Ruta is not type-based but type
>>> coverage-based (and I think that's really cool)
>>> 
>>> It is possible to extend the functionality to additionally support
>>> type-based logic, but I do not know when this would be ready.
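
The difference between the two kinds of filtering can be sketched in
Python. This is only a toy model with hypothetical names (Ann,
visible_coverage_based, visible_type_based). The first function assumes
that text covered by a filtered type becomes invisible together with
everything on it. The second shows what a purely type-based filter might
do; it illustrates the idea of the feature request and is not existing
Ruta behavior.

```python
from typing import List, NamedTuple, Set


class Ann(NamedTuple):
    type: str
    begin: int
    end: int


def overlaps(a: Ann, b: Ann) -> bool:
    """True if the half-open spans [begin, end) share at least one character."""
    return a.begin < b.end and b.begin < a.end


def visible_coverage_based(anns: List[Ann], filtered: Set[str]) -> List[Ann]:
    """Assumed current behavior: text covered by a filtered type is invisible,
    so every annotation that overlaps such text is hidden as well."""
    hidden = [f for f in anns if f.type in filtered]
    return [a for a in anns if not any(overlaps(a, f) for f in hidden)]


def visible_type_based(anns: List[Ann], filtered: Set[str]) -> List[Ann]:
    """Hypothetical alternative: hide only annotations of the filtered types."""
    return [a for a in anns if a.type not in filtered]


anns = [Ann("Cat", 0, 3), Ann("Dog", 4, 7), Ann("Cat", 8, 11),
        Ann("Covering", 0, 8)]
print(visible_coverage_based(anns, {"Covering"}))
print(visible_type_based(anns, {"Covering"}))
```

With Covering filtered, the coverage-based model leaves only Cat[8,11[
visible, so neither rule can match. The type-based variant keeps Cat, Dog
and Cat and only drops Covering itself.
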
>>> 
>>> I would not recommend using RutaBasic in the rules (I actually do not
>>> know right now if it would work), but if you do, then you should
>>> probably deactivate the "empty is invisible" option.
>>> 
>>> 
>>> Best,
>>> 
>>> 
>>> Peter
>>> 
>>> 
>>>> Cheers,
>>>> Mario
>>>> 
>>>>> On 9 Oct 2019, at 09:35 , Peter Klügl <[email protected]> wrote:
>>>>> 
>>>>> Hi Mario,
>>>>> 
>>>>> 
>>>>> I need to take a closer look as this is not the usual scenario :-)
>>>>> 
>>>>> 
>>>>> However, without testing, I would assume that the second rule does not
>>>>> match because the space between dog and cat is not "empty".
>>>>> 
>>>>> 
>>>>> Normally, you have a complete partitioning provided by the seeding, which
>>>>> leads to the RutaBasic annotations. If there are only a few annotations,
>>>>> then there needs to be a decision whether a text position is visible or
>>>>> not (as you have no SPACE, BREAK and MARKUP annotations). You would expect
>>>>> that the space between the annotations is ignored, but there is actually
>>>>> no reason why Ruta should do that, as there is no information at all
>>>>> that it should be ignored (... generic system, you might want to write
>>>>> rules for whitespaces...). In order to avoid this problem in such
>>>>> situations, there is the option to define empty RutaBasics as invisible.
>>>>> These are text positions where no annotation begins or ends (and which
>>>>> are not covered by annotations) AFAIR, and where sequential matching
>>>>> could not match at all anyway. Thus, the first space is ignored, but not
>>>>> the second, because the Covering annotation ends there.
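
The effect of the "empty is invisible" option on this example can be
reproduced with a small toy model. The Python sketch below is not Ruta's
implementation: the names Ann, segments, is_empty, matches and example are
made up for illustration. It assumes that a segment between two annotation
boundaries counts as empty when no annotation begins at its start and none
ends at its end. Coverage by a longer annotation is ignored here, since the
first space is reported as skipped although Covering spans it.

```python
from typing import List, NamedTuple, Tuple


class Ann(NamedTuple):
    type: str
    begin: int
    end: int


def segments(anns: List[Ann]) -> List[Tuple[int, int]]:
    """Split the text at every annotation boundary (stand-in for RutaBasic)."""
    cuts = sorted({a.begin for a in anns} | {a.end for a in anns})
    return list(zip(cuts, cuts[1:]))


def is_empty(seg: Tuple[int, int], anns: List[Ann]) -> bool:
    """Assumed rule: nothing begins at the segment start, nothing ends at its end."""
    begin, end = seg
    return not any(a.begin == begin or a.end == end for a in anns)


def next_visible_offset(offset: int, anns: List[Ann]) -> int:
    """Skip empty (invisible) segments to the right of offset; -1 if none is left."""
    for seg in segments(anns):
        if seg[0] >= offset and not is_empty(seg, anns):
            return seg[0]
    return -1


def matches(first: str, second: str, anns: List[Ann]) -> List[Tuple[int, int]]:
    """Spans where `first` is directly followed by `second`, matched left to right."""
    found = []
    for a in anns:
        if a.type != first:
            continue
        start = next_visible_offset(a.end, anns)
        for b in anns:
            if b.type == second and b.begin == start:
                found.append((a.begin, b.end))
    return found


def example(covering_end: int) -> List[Ann]:
    """Annotations from the original question, with a variable Covering end."""
    return [Ann("Cat", 0, 3), Ann("Dog", 4, 7), Ann("Cat", 8, 11),
            Ann("Covering", 0, covering_end)]


for covering_end in (8, 7):
    anns = example(covering_end)
    print(covering_end, matches("Cat", "Dog", anns), matches("Dog", "Cat", anns))
```

Under these assumptions only "Cat Dog" matches with Covering[0,8[, and
"Dog Cat" matches as well once Covering is shortened to [0,7[, which is
the behavior reported in the original question.
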
>>>>> 
>>>>> 
>>>>> Does that make sense?
>>>>> 
>>>>> 
>>>>> I think there are many options for making your rules more robust, but
>>>>> that depends on your complete system/pipeline. Is it an option to trim
>>>>> annotations in order to avoid whitespaces at the beginning or ending? Is
>>>>> it easy to identify these positions? You could create an annotation
>>>>> there and filter its type.
>>>>> 
>>>>> 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> 
>>>>> Peter
>>>>> 
>>>>> 
>>>>> 
>>>>> Am 07.10.2019 um 10:21 schrieb Mario Juric:
>>>>>> Hi Peter,
>>>>>> 
>>>>>> I have a script that is executed without any seeders for performance 
>>>>>> reasons, and we don’t need the seeded annotations in that case. I have 
>>>>>> an issue involving annotation elements that partially cover the rule 
>>>>>> elements of interest, and I do not have a simple solution for it, so I 
>>>>>> have a question about the match semantics. Let me explain it using a 
>>>>>> simple example and the text ‘cat dog cat’.
>>>>>> 
>>>>>> Assume the following 4 annotation types and 2 rule statements:
>>>>>> 
>>>>>> DECLARE Covering;
>>>>>> DECLARE Cat;
>>>>>> DECLARE Dog;
>>>>>> DECLARE CHASE;
>>>>>> Cat Dog { -> MARK(CHASE)};
>>>>>> Dog Cat { -> MARK(CHASE)};
>>>>>> 
>>>>>> Assume prior to script execution the following annotations with 
>>>>>> beginnings and endings:
>>>>>> 
>>>>>> Cat[0,3[
>>>>>> Dog[4,7[
>>>>>> Cat[8,11[
>>>>>> Covering[0,8[
>>>>>> 
>>>>>> The Covering annotation is an example of the disturbing element that I 
>>>>>> observed, which has nothing or little to do with what I am trying to 
>>>>>> match. It just happens to be there for a reason unrelated to these 
>>>>>> rules, but it causes the second rule not to match when I expected it to. 
>>>>>> Only the first rule fires, but the second will also fire when I change 
>>>>>> Covering bounds to [0,7[ though.
>>>>>> 
>>>>>> The order in which elements are matched seems very different from how 
>>>>>> they are usually selected from the CAS index, where you would get 
>>>>>> ‘Covering Cat Dog Cat’, and with this order you would intuitively expect 
>>>>>> both rules to match. This would probably be overly simplified though, 
>>>>>> since I would not be able to match adjacent covering annotations this 
>>>>>> way, so I believe matching is somehow based on edge detection. Still, I 
>>>>>> have difficulty understanding why that extra covered space makes a 
>>>>>> difference.
>>>>>> 
>>>>>> I was hoping you could provide me with some details, and I would also 
>>>>>> like to know what possible workaround options I have. I was considering 
>>>>>> playing around with type filtering, but it would require a bit of 
>>>>>> adding/removing types to be filtered during the script, so it didn’t 
>>>>>> seem like the simplest solution. Ensuring that covering always aligns 
>>>>>> with the end of a token is another possibility in this particular case, 
>>>>>> but I still need to add general robustness to the Ruta script against 
>>>>>> these scenarios. Any feedback is much appreciated, thanks :)
>>>>>> 
>>>>>> Cheers,
>>>>>> Mario
>>>>>> 
>>>>> -- 
>>>>> Dr. Peter Klügl
>>>>> R&D Text Mining/Machine Learning
>>>>> 
>>>>> Averbis GmbH
>>>>> Salzstr. 15
>>>>> 79098 Freiburg
>>>>> Germany
>>>>> 
>>>>> Fon: +49 761 708 394 0
>>>>> Fax: +49 761 708 394 10
>>>>> Email: [email protected]
>>>>> Web: https://averbis.com
>>>>> 
>>>>> Headquarters: Freiburg im Breisgau
>>>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
