Here is the ticket: https://issues.apache.org/jira/browse/UIMA-6137
Hope this suffices for a start.

Cheers,
Mario

> On 22 Oct 2019, at 09:01, Peter Klügl <[email protected]> wrote:
>
> Hi,
>
> On 21.10.2019 at 21:46, Mario Juric wrote:
>> Thanks Peter,
>>
>> No problem with the delay. I was on vacation myself, and sometimes it is
>> just necessary to pull the plug :)
>>
>> I am just happy that you take the time to answer my questions, and I think
>> your answers help make sense of this. I now have some ideas that I can
>> experiment with to see what works, but it’s possible to use RutaBasic when
>> optional spaces are included in the rules, although it gets more awkward. I
>> would still prefer to avoid this, and having a type-based rule-logic feature
>> would make sense in our case. Shall I create a feature request for this?
>
> Yes, please create a ticket. Even specifying what should be done helps,
> especially including more use cases than my own...
>
> Best,
>
> Peter
>
>> I wouldn’t expect you to do this any time soon, but let me know if there is
>> something I could help out with when the time comes.
>>
>> Cheers,
>> Mario
>>
>>> On 18 Oct 2019, at 10:10, Peter Klügl <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> sorry for the delayed reply.
>>>
>>> comments below...
>>>
>>> On 09.10.2019 at 22:19, Mario Juric wrote:
>>>> Hi Peter,
>>>>
>>>> Thanks a lot for the answer.
>>>>
>>>> I am still trying to wrap my head around this, and I understand the issues
>>>> at play when dealing with a generic rule engine, since I am looking at an
>>>> isolated case only. I was just thinking that in my particular case the
>>>> covering annotation starts before matching 'Dog Cat’, so why would its
>>>> ending right before Cat prevent the rule from firing?
>>>> It doesn’t follow
>>>> Dog, and a rule like “Dog Covering {->MARK(CHASE)}” wouldn’t therefore be
>>>> matched either, but I understand now that something else being present
>>>> in this area between the two rule elements is enough for the
>>>> match to fail. However, as you describe, the presence of SPACE annotations
>>>> and a rule like Dog SPACE Cat { -> MARK(CHASE)} would succeed in matching
>>>> despite the presence of the covering annotation.
>>>
>>> The main thing here is probably the requirement that the logic for
>>> applying the visibility concept should always be symmetric, meaning it
>>> should be the same regardless of whether the rule matches from left to
>>> right or from right to left (or inside out).
>>>
>>> In your example, the rule matches from left to right (I assume), so the
>>> behavior that the last space is not skipped is not intuitive at all.
>>> However, if the rule matched from right to left for some reason,
>>> e.g., because of dynamic anchoring or a manual anchor, then the
>>> inference would detect a starting Covering annotation as the next
>>> possible position, which is not invisible (since there is nothing at all
>>> invisible). So there would actually be something that could be matched,
>>> but it is not the correct type (Dog).
>>>
>>> I do not know if this explanation makes sense... it's easier with a
>>> whiteboard ;-)
>>>
>>>> Have you ever described the implementation of the matching in some paper
>>>> or similar? I would be interested to have a look at it, but maybe it’s
>>>> better just to have a go at the code? I would certainly prefer reading a
>>>> high level abstract specification first though :)
>>>
>>> The last paper is the NLE journal article, which contains a high-level
>>> description of the algorithm. However, this is some really
>>> specific functionality for a specific scenario. So, if I write a new
>>> paper, it will most likely not cover this.
>>>
>>>> Generally I cannot just trim the annotations in the real application,
>>>> since some of these whitespaces are included in the marking for various
>>>> reasons. I therefore played around with type filtering, since I was hoping
>>>> that the type filter would allow me to match the rules while ignoring any
>>>> presence of filtered types. I was again surprised to find out that
>>>> filtering the Covering type while retaining Cat and Dog would in this case
>>>> just prevent anything from being matched, because it seems to make all
>>>> those text parts invisible where the filtered types appear, no matter if
>>>> they cover any retained annotation types. So this didn’t seem to solve my
>>>> problem either, although I could of course try to mark those areas I
>>>> otherwise would consider trimming and include those in the rules like a
>>>> space or filter on them, which I guess is what you suggested. It suddenly
>>>> just becomes somewhat awkward though, and it may just be clearer to use
>>>> RutaBasic with the rules instead.
>>>
>>> Yes, the visibility concept in Ruta is not type-based but type
>>> coverage-based (and I think that's really cool).
>>>
>>> It is possible to extend the functionality to additionally support
>>> type-based logic, but I do not know when this would be ready.
>>>
>>> I would not recommend using RutaBasic in the rules (I actually do not
>>> know right now if it would work), but if you do, then you should
>>> probably deactivate the "empty is invisible" option.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>>> Cheers,
>>>> Mario
>>>>
>>>>> On 9 Oct 2019, at 09:35, Peter Klügl <[email protected]> wrote:
>>>>>
>>>>> Hi Mario,
>>>>>
>>>>> I need to take a closer look as this is not the usual scenario :-)
>>>>>
>>>>> However, without testing, I would assume that the second rule does not
>>>>> match because the space between dog and cat is not "empty".
>>>>>
>>>>> Normally, you have a complete partitioning provided by the seeding which
>>>>> produces the RutaBasic annotations. If there are only a few annotations,
>>>>> then there needs to be a decision if a text position is visible or not
>>>>> (as you have no SPACE, BREAK and MARKUP annotation). You would expect
>>>>> that the space between the annotations is ignored, but there is actually
>>>>> no reason why Ruta should do that, as there is no information at all
>>>>> that it should be ignored (... generic system, you might want to write
>>>>> rules for whitespaces...). In order to avoid this problem in such
>>>>> situations there is the option to define empty RutaBasics as invisible.
>>>>> Those are text positions where no annotation begins or ends (and which
>>>>> are not covered by annotations), AFAIR, and where sequential matching
>>>>> could not match at all anyway. Thus, the first space is ignored, but not
>>>>> the second, because the Covering annotation ends there.
>>>>>
>>>>> Does that make sense?
>>>>>
>>>>> I think there are many options for how your rules can become more robust,
>>>>> but that depends on your complete system/pipeline. Is it an option to trim
>>>>> annotations in order to avoid whitespaces at the beginning or ending? Is
>>>>> it easy to identify these positions? You could create an annotation
>>>>> there and filter that type.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 07.10.2019 at 10:21, Mario Juric wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> I have a script that is executed without any seeders for performance
>>>>>> reasons, and we don’t need the seeded annotations in that case. I have
>>>>>> an issue involving annotation elements that partially cover the rule
>>>>>> elements of interest, and I do not have a simple solution for it, so I
>>>>>> have a question about the match semantics. Let me explain it using a
>>>>>> simple example and the text ‘cat dog cat’.
>>>>>>
>>>>>> Assume the following 4 annotation types and 2 rule statements:
>>>>>>
>>>>>> DECLARE Covering;
>>>>>> DECLARE Cat;
>>>>>> DECLARE Dog;
>>>>>> DECLARE CHASE;
>>>>>> Cat Dog { -> MARK(CHASE)};
>>>>>> Dog Cat { -> MARK(CHASE)};
>>>>>>
>>>>>> Assume prior to script execution the following annotations with
>>>>>> beginnings and endings:
>>>>>>
>>>>>> Cat[0,3[
>>>>>> Dog[4,7[
>>>>>> Cat[8,11[
>>>>>> Covering[0,8[
>>>>>>
>>>>>> The Covering annotation is an example of the disturbing element that I
>>>>>> observed, which has nothing or little to do with what I am trying to
>>>>>> match. It just happens to be there for a reason unrelated to these
>>>>>> rules, but it causes the second rule not to match when I expected it.
>>>>>> Only the first rule fires, but the second will also fire when I change
>>>>>> Covering bounds to [0,7[ though.
>>>>>>
>>>>>> The order in which elements are matched seems very different from how
>>>>>> they are usually selected from the CAS index, where you would get
>>>>>> 'Covering Cat Dog Cat’, and with this order you would intuitively expect
>>>>>> both rules to match. This would probably be overly simplified though,
>>>>>> since I would not be able to match adjacent covering annotations this
>>>>>> way, so I believe matching is somehow based on edge detection. Still, I
>>>>>> have difficulty understanding why that extra covering space makes a
>>>>>> difference.
>>>>>>
>>>>>> I was hoping you could provide me with some details, and I would also
>>>>>> like to know what possible workaround options I have. I was considering
>>>>>> playing around with type filtering, but it would require a bit of
>>>>>> adding/removing types to be filtered during the script, so it didn’t
>>>>>> seem like the simplest solution. Ensuring that covering always aligns with
>>>>>> the end of a token is another possibility in this particular case, but I
>>>>>> still need to add general robustness to the Ruta script against these
>>>>>> scenarios.
>>>>>> Any feedback is much appreciated, thanks :)
>>>>>>
>>>>>> Cheers,
>>>>>> Mario
>>>>>>
>>>>> --
>>>>> Dr. Peter Klügl
>>>>> R&D Text Mining/Machine Learning
>>>>>
>>>>> Averbis GmbH
>>>>> Salzstr. 15
>>>>> 79098 Freiburg
>>>>> Germany
>>>>>
>>>>> Fon: +49 761 708 394 0
>>>>> Fax: +49 761 708 394 10
>>>>> Email: [email protected]
>>>>> Web: https://averbis.com
>>>>>
>>>>> Headquarters: Freiburg im Breisgau
>>>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
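
For readers who want to play with the 'cat dog cat' example outside of Ruta, here is a minimal Python sketch of the "empty is invisible" idea discussed in this thread. It is a toy model, not the Ruta implementation. It assumes that the text is partitioned at every annotation boundary (a stand-in for RutaBasic), and that a segment counts as "empty" when no annotation begins at its start or ends at its end. That definition is an assumption chosen because it reproduces the behavior reported above. The names Ann, segments, is_empty, follows and matches are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Ann(NamedTuple):
    type: str
    begin: int
    end: int  # exclusive, as in Cat[0,3[


def segments(anns: List[Ann]) -> List[Tuple[int, int]]:
    """Partition the text at every annotation boundary (toy RutaBasic)."""
    cuts = sorted({a.begin for a in anns} | {a.end for a in anns})
    return list(zip(cuts, cuts[1:]))


def is_empty(seg: Tuple[int, int], anns: List[Ann]) -> bool:
    """Assumed rule: nothing begins at the segment start or ends at its end."""
    begin, end = seg
    return not any(a.begin == begin or a.end == end for a in anns)


def follows(left: Ann, right: Ann, anns: List[Ann]) -> bool:
    """True if right comes directly after left once empty segments are skipped."""
    if right.begin < left.end:
        return False
    gap = [s for s in segments(anns) if left.end <= s[0] and s[1] <= right.begin]
    return all(is_empty(s, anns) for s in gap)


def matches(anns: List[Ann], first: str, second: str) -> List[Tuple[int, int]]:
    """Spans a two-element rule 'first second' would mark in this toy model."""
    return [
        (l.begin, r.end)
        for l in anns if l.type == first
        for r in anns if r.type == second and follows(l, r, anns)
    ]


if __name__ == "__main__":
    base = [Ann("Cat", 0, 3), Ann("Dog", 4, 7), Ann("Cat", 8, 11)]
    for covering_end in (8, 7):
        anns = base + [Ann("Covering", 0, covering_end)]
        print(covering_end, matches(anns, "Cat", "Dog"), matches(anns, "Dog", "Cat"))
```

With Covering[0,8[ the segment [7,8[ has an annotation ending at its end, so it is not skipped and only 'Cat Dog' matches. With Covering[0,7[ both gaps are empty and both rules match, which is the behavior described in the original question.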
