Hi,

On 21.05.2013 13:52, Peter Klügl wrote:
> Hi,
>
> On 21.05.2013 13:49, [email protected] wrote:
>> Hi Peter,
>>
>> I think that the rule doesn't matter. But I tried to find calendar dates. To
>> find out what was going wrong I reduced the original, more complex rule to
>>
>> DECLARE Date;
>> Document{->RETAINTYPE(BREAK, SPACE)};
>> NUM{REGEXP("\\d\\d")->MARK(Date, 1, 2)} PERIOD;
>>
>> on the input text
>>
>> 12. Mai 1803
>>
>> I didn't use SourceDocumentInformation in the rule. It was just in the way.
>> The first token with RutaBasic was NUM(0, 2), that is "12", and the second
>> token was SourceDocumentInformation(0, 0). So the rule failed, which is
>> correct.
> Ah, thanks. I will try to reproduce it and fix it ASAP.
Thanks for reporting this. I should improve the explanation component. This is
a really nasty missed match, with no explanation at all.

It's fixed in the trunk.

Best,

Peter

> Best,
>
> Peter
>
>
>> Cheers,
>> Armin
>>
>> -----Original Message-----
>> From: Peter Klügl [mailto:[email protected]]
>> Sent: Tuesday, 21 May 2013 13:38
>> To: [email protected]
>> Subject: Re: Ruta - Token Order
>>
>> Hi,
>>
>> On 21.05.2013 12:47, [email protected] wrote:
>>> Hi,
>>>
>>> In Ruta 2.0.2-SNAPSHOT a token with begin offset 0 and end offset 2 comes
>>> before a token with begin offset 0 and end offset 0. The token order is not
>>> as I expected. Thus in my case, SourceDocumentAnnotation was the second
>>> token in the token sequence and the rule didn't match. It took me some time
>>> to find that out. The end offset of SourceDocumentAnnotation should better
>>> be the length of the text. How is the token ordering defined?
>> Annotations of the length 0 can be problematic in UIMA Ruta due to the
>> inference mechanism and should be avoided. The reason for this is the
>> complete disjoint partition of the document represented by the RutaBasic
>> annotations. If they have the length 0, then the match can be ambiguous.
>>
>> The token order should be almost identical to the normal UIMA order and
>> there should only be a difference for specific types.
>> The type priorities are:
>> RutaFrame
>> Annotation
>> RutaBasic
>>
>> I will take a look at the situation you described. It sounds like a bug for
>> annotations of the length 0 and should not occur at all.
>>
>> May I ask with which rule you tried to match a token and the SDA?
>>
>> Best,
>>
>> Peter
>>
>>> Cheers,
>>> Armin
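For anyone reading this thread later: the order reported above (the span 0 to 2 before the span 0 to 0) is what the normal UIMA annotation order gives, since it sorts by ascending begin offset and then by descending end offset, with type priorities only as a further tie-breaker. Below is a minimal sketch of that comparison in plain Java. It has no UIMA dependency, it leaves out type priorities, and the names `TokenOrderSketch`, `Span`, `UIMA_LIKE_ORDER` and `sorted` are made up for illustration; it is not how Ruta implements its RutaBasic sequence internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TokenOrderSketch {

    /** Hypothetical stand-in for an annotation: a type name plus begin/end offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "(" + begin + ", " + end + ")";
        }
    }

    /** Ascending begin, then descending end (longer spans first). Type priorities are omitted. */
    static final Comparator<Span> UIMA_LIKE_ORDER = new Comparator<Span>() {
        @Override
        public int compare(Span a, Span b) {
            if (a.begin != b.begin) {
                return Integer.compare(a.begin, b.begin);
            }
            return Integer.compare(b.end, a.end);
        }
    };

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<Span>(spans);
        copy.sort(UIMA_LIKE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // Offsets taken from the input text "12. Mai 1803" discussed in the thread.
        List<Span> spans = new ArrayList<Span>();
        spans.add(new Span("SourceDocumentInformation", 0, 0));
        spans.add(new Span("PERIOD", 2, 3));
        spans.add(new Span("NUM", 0, 2));

        for (Span s : sorted(spans)) {
            System.out.println(s);
        }
    }
}
```

With this ordering the zero-length annotation ends up between NUM and PERIOD, which is why a rule of the form `NUM{...} PERIOD` could miss the match before the fix.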
