Re: Ruta - MARKFAST

Peter Klügl Mon, 30 Jun 2014 03:51:27 -0700

Hi,

On 30.06.2014 11:32, [email protected] wrote:
> Hello!
>
> On which annotation type does MARKFAST work?


It is applied to the annotations matched by the rule element that
contains the action.

Document{-> MARKFAST(...)};
... causes a dictionary lookup on the complete document.

Sentence{CONTAINS(...) -> MARKFAST(...)};
... causes a separate dictionary lookup on each of the matched sentences
(i.e., no annotations that span sentence boundaries).
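
With concrete arguments, the two variants could look roughly like the
sketch below. The type Organization, the word list OrgList and the file
name orgs.txt are made up for illustration, and Sentence is assumed to
be provided by another component:

WORDLIST OrgList = 'orgs.txt';   // hypothetical dictionary file
DECLARE Organization;            // hypothetical result type
// one lookup over the complete document
Document{-> MARKFAST(Organization, OrgList)};
// one lookup per sentence, so no match can cross a sentence boundary
Sentence{-> MARKFAST(Organization, OrgList)};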


> Can I restrict MARKFAST to a single annotation Type, say my own token type?

No, but there is an issue that includes this functionality.

UIMA-3775: Fast multi token dictionary matching on feature values

The idea is to apply the dictionary lookup to sequences of feature values
(e.g., lemmas). If the feature represents the covered text, then this
would also support your use case. The issue is not top priority right
now, but if you want, then I can try to include it in the next release
(August).

> It would be nice to restrict a ruta script to a set of annotations by giving
> that set of annotations explicitly, like
>
> Document{-> INPUT(Token, Organization, Location)};

UIMA Ruta follows a different strategy, e.g., compared to JAPE and its
input specification. The availability and visibility of annotations is
not type-based but coverage-based. This makes complex patterns easy to
specify, but sometimes also complicates things. If one type is set to
invisible (FILTERTYPE), then all annotations of this type and all
annotations of other types that they cover become invisible.

The MARKFAST action operates on the RutaStream, and thus its lookup is
sensitive to the filtering settings. For example, the lookup ignores
whitespace, line breaks and markup with the default settings. By
extending the set of filtered types, you can also change the behavior of
the dictionary lookup. However, keep in mind that annotations covered by
one of the filtered types are also not accessible to the dictionary
lookup.
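
A small sketch of that caveat, assuming a hypothetical type Header that
covers headlines and the same made-up Organization/OrgList names (none
of these come from your setup):

WORDLIST OrgList = 'orgs.txt';   // hypothetical dictionary file
DECLARE Organization;            // hypothetical result type
// hide Header annotations; everything covered by a Header becomes
// invisible as well
Document{-> FILTERTYPE(Header)};
// dictionary entries that occur inside a headline are not found here
Document{-> MARKFAST(Organization, OrgList)};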

>
> All other annotations should be ignored. Is there a way to do this in
> Ruta? Can this be done with FILTERTYPE and RETAINTYPE? How?

Yes, but it depends on the actual occurrences of types in your document.
The easiest way is to filter the types of the annotations that cover the
positions that should be skipped. It's not easy to give a generic
solution for this.

An example:
Your tokenizer creates annotations for words and numbers, but not for
punctuation marks, and you want to apply the dictionary lookup only to
sequences of token annotations, skipping punctuation marks.

Document{-> FILTERTYPE(PM)};
Document{-> MARKFAST(...)};


There are plans to extend and modify the concept of accessibility and
visibility in UIMA Ruta sometime (>= 3.0.0). Any wishes and opinions are
welcome :-)



Best,

Peter


>
>
> Cheers,
> Armin
>
