Hi,

On 21.05.2013 13:52, Peter Klügl wrote:
> Hi,
>
> On 21.05.2013 13:49, [email protected] wrote:
>> Hi Peter,
>>
>> I think the rule doesn't matter, but I was trying to find calendar dates. To
>> find out what was going wrong, I reduced the original, more complex rule to
>>
>> DECLARE Date;
>> Document{->RETAINTYPE(BREAK, SPACE)};
>> NUM{REGEXP("\\d\\d")->MARK(Date, 1, 2)} PERIOD;
>>
>> on the input text
>>
>> 12. Mai 1803
>>
>> I didn't use SourceDocumentInformation in the rule. It was just in the way.
>> The first token with RutaBasic was NUM(0, 2), that is "12", and the second
>> token was SourceDocumentInformation(0, 0). So the rule failed, which is
>> correct for that token order.
> Ah, thanks. I will try to reproduce it and fix it ASAP.

Thanks for reporting this. I should improve the explanation component.
This was a really nasty missed match with no explanation at all.

It's fixed in the trunk.

Best,

Peter


> Best,
>
> Peter
>
>
>> Cheers,
>> Armin
>>
>> -----Original Message-----
>> From: Peter Klügl [mailto:[email protected]]
>> Sent: Tuesday, 21 May 2013 13:38
>> To: [email protected]
>> Subject: Re: Ruta - Token Order
>>
>> Hi,
>>
>> On 21.05.2013 12:47, [email protected] wrote:
>>> Hi,
>>>
>>> In Ruta 2.0.2-SNAPSHOT, a token with begin offset 0 and end offset 2 comes
>>> before a token with begin offset 0 and end offset 0. That token order is
>>> not what I expected. As a result, in my case SourceDocumentInformation was
>>> the second token in the token sequence and the rule didn't match. It took
>>> me some time to find that out. It would be better if the end offset of
>>> SourceDocumentInformation were the length of the text. How is the token
>>> ordering defined?
>> Annotations of length 0 can be problematic in UIMA Ruta because of the
>> inference mechanism and should be avoided. The reason is that the RutaBasic
>> annotations form a complete, disjoint partition of the document. If an
>> annotation has length 0, the match can be ambiguous.
>>
>> The token order should be almost identical to the normal UIMA order and 
>> there should only be a difference for specific types.
>> The type priorities are:
>> RutaFrame
>> Annotation
>> RutaBasic
>>
>> I will take a look at the situation you described. It sounds like a bug for
>> annotations of length 0, and it should not occur at all.
>>
>> May I ask which rule you used to match a token and the SourceDocumentInformation annotation?
>>
>> Best,
>>
>> Peter
>>
>>> Cheers,
>>> Armin
