Re: Ruta - Token Order

Thilo Goetz Tue, 21 May 2013 07:16:18 -0700

On 05/21/2013 01:37 PM, Peter Klügl wrote:

Hi,


On 21.05.2013 12:47, [email protected] wrote:

Hi,

In Ruta 2.0.2-SNAPSHOT a token with begin offset 0 and end offset 2 comes 
before a token with begin offset 0 and end offset 0. The token order is not as 
I expected. Thus in my case, SourceDocumentAnnotation was the second token in 
the token sequence and the rule didn't match. It took me some time to find that 
out. It would be better if the end offset of SourceDocumentAnnotation were the
length of the text. How is the token order defined?


Annotations of length 0 can be problematic in UIMA Ruta due to the
inference mechanism and should be avoided. The reason for this is the
complete, disjoint partition of the document represented by the RutaBasic
annotations. If an annotation has length 0, then the match can be ambiguous.

The token order should be almost identical to the normal UIMA order and
there should only be a difference for specific types.

That is the normal order. Longer annotations that start at the same
position come before shorter ones. Whether you agree with this
decision is another matter ;-)
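
As a rough illustration of that default order (a sketch only, not UIMA's actual implementation), the comparison can be modelled as a sort on ascending begin offset and, for equal begins, descending end offset. The `Span` tuple and the `uima_like_order` helper are hypothetical names made up for this example; the offsets are the ones from the report above.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an annotation: a name plus offsets."""
    name: str
    begin: int
    end: int


def uima_like_order(spans):
    # Ascending begin; for equal begins, the larger end (longer span) first.
    return sorted(spans, key=lambda s: (s.begin, -s.end))


spans = [
    Span("SourceDocumentAnnotation", 0, 0),  # zero-length annotation
    Span("Token", 0, 2),
    Span("Token", 3, 5),
]
ordered = uima_like_order(spans)
for s in ordered:
    print(s.name, s.begin, s.end)
```

Under this model the token covering 0 to 2 sorts before the zero-length SourceDocumentAnnotation at 0 to 0, which matches the behaviour described in the original question.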

The type priorities are:
RutaFrame
Annotation
RutaBasic
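
In UIMA, type priorities act as a tie-breaker when two annotations have the same begin and end offsets: the type listed earlier is returned first. A minimal sketch of that idea, assuming the list above is in priority order and ignoring subtype handling; the tuples and the `sort_key` function are hypothetical and only serve the example.

```python
# Priority list as given above; earlier entries sort first on a tie.
PRIORITY = ["RutaFrame", "Annotation", "RutaBasic"]


def sort_key(ann):
    type_name, begin, end = ann
    # Offsets decide first; the type priority only breaks exact ties.
    return (begin, -end, PRIORITY.index(type_name))


annotations = [
    ("RutaBasic", 0, 2),
    ("Annotation", 0, 2),
    ("RutaFrame", 0, 2),
    ("Annotation", 0, 5),
]
ordered = sorted(annotations, key=sort_key)
for ann in ordered:
    print(ann)
```

The longer annotation still comes first regardless of its type; the priority list only decides the order among the three annotations that span exactly 0 to 2.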

I will take a look at the situation you described. It sounds like a bug
for annotations of length 0 and should not occur at all.

May I ask with which rule you tried to match a token and the SDA?

Best,

Peter

Cheers,
Armin
