Hi,

On 21.05.2013 13:49, [email protected] wrote:
> Hi Peter,
>
> I think that the rule doesn't matter. But I tried to find calendar dates. To
> find out what was going wrong I reduced the original, more complex rule to
>
> DECLARE Date;
> Document{->RETAINTYPE(BREAK, SPACE)};
> NUM{REGEXP("\\d\\d")->MARK(Date, 1, 2)} PERIOD;
>
> on the input text
>
> 12. Mai 1803
>
> I didn't use SourceDocumentInformation in the rule. It was just in the way.
> The first token with RutaBasic was NUM(0, 2), that is "12", and the second
> token was SourceDocumentInformation(0, 0). So the rule failed, which is
> correct.
Ah, thanks. I will try to reproduce it and fix it ASAP.

Best,

Peter

> Cheers,
> Armin
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Tuesday, 21 May 2013 13:38
> To: [email protected]
> Subject: Re: Ruta - Token Order
>
> Hi,
>
> On 21.05.2013 12:47, [email protected] wrote:
>> Hi,
>>
>> In Ruta 2.0.2-SNAPSHOT a token with begin offset 0 and end offset 2 comes
>> before a token with begin offset 0 and end offset 0. The token order is not
>> what I expected. Thus in my case, SourceDocumentAnnotation was the second
>> token in the token sequence and the rule didn't match. It took me some time
>> to find that out. The end offset of SourceDocumentAnnotation should rather
>> be the length of the text. How is the token ordering defined?
> Annotations of length 0 can be problematic in UIMA Ruta due to the
> inference mechanism and should be avoided. The reason for this is the
> complete disjoint partition of the document represented by the RutaBasic
> annotations. If they have length 0, then the match can be ambiguous.
>
> The token order should be almost identical to the normal UIMA order, and there
> should only be a difference for specific types.
> The type priorities are:
> RutaFrame > Annotation > RutaBasic
>
> I will take a look at the situation you described. It sounds like a bug for
> annotations of length 0 and should not occur at all.
>
> May I ask with which rule you tried to match a token and the SDA?
>
> Best,
>
> Peter
>
>> Cheers,
>> Armin
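
For readers following the thread, here is a minimal sketch of why a (0, 2) token can sort before a (0, 0) one. It assumes the default UIMA annotation order (begin ascending, end descending, then type priority) and uses the priority list quoted in the thread. The names `Span`, `sort_key` and `ordered` are hypothetical, and ranking unlisted types last is an assumption made only for this illustration; it is not Ruta's actual implementation.

```python
from typing import NamedTuple, Sequence


class Span(NamedTuple):
    """Hypothetical stand-in for an annotation: type name plus offsets."""
    type_name: str
    begin: int
    end: int


# Priority list as quoted in the thread; earlier entries sort first on ties.
PRIORITIES = ("RutaFrame", "Annotation", "RutaBasic")


def sort_key(span: Span, priorities: Sequence[str] = PRIORITIES):
    # Assumption: types missing from the list get the lowest priority.
    if span.type_name in priorities:
        rank = priorities.index(span.type_name)
    else:
        rank = len(priorities)
    # begin ascending, end descending (longer spans first), then priority
    return (span.begin, -span.end, rank)


def ordered(spans: Sequence[Span], priorities: Sequence[str] = PRIORITIES):
    return sorted(spans, key=lambda s: sort_key(s, priorities))


# Offsets for the start of the input text "12. Mai 1803"
spans = [
    Span("SourceDocumentInformation", 0, 0),  # zero-length annotation
    Span("RutaBasic", 2, 3),                  # "."
    Span("RutaBasic", 0, 2),                  # "12"
    Span("Annotation", 0, 2),                 # "12"
]

for span in ordered(spans):
    print(span)
```

Under these assumptions the zero-length annotation lands between the token for "12" and the token for ".", which matches the sequence Armin observed and is why a rule expecting NUM directly followed by PERIOD does not match.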
