Re: RUTA and shared resources

Peter Klügl Thu, 22 Jan 2015 02:26:12 -0800

Hi,

On 22.01.2015 at 09:20, [email protected] wrote:
> Hello!
>
> This is a very short and simple gazetteer using RUTA.
>
> Document{->GREEDYANCHORING(true)};
> %s*{->MARKFAST(%s,'%s')};


First of all, I am sorry that I was not yet able to implement the greedy
matching for the gazetteers/wordlists. I have not forgotten it.
Just curious: does the rule perform as you expect/intend? I mean the
combination of greedy anchoring and the windowed stream caused by the
matching condition.


>
> where the first %s is replaced using String.format() by the name of
> the source type, the second %s is replaced by the target type name, and
> the third %s is replaced by the URL of a word list. Doing so, it's a
> little bit more flexible. This is done once in
> CasAnnotator_ImplBase.initialize().
>
> Then the script is executed with Ruta.apply(cas, script) in process().
> But that means that the word list is read again for every CAS processed.
> Is there any way to have RUTA use the word list as a
> SharedResourceObject, so that it is read once only?

The problem is that Ruta.apply() creates a new descriptor and a new
analysis engine on every call. You could instead hold the Ruta analysis
engine as a field in your own analysis engine, create it once in
initialize(), and call its process() from your process() method. Then
the word lists should not be reloaded for each process().
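
To illustrate the difference, here is a minimal, library-free sketch of
the pattern. It does not use the UIMA or Ruta API: ScriptEngine is a
hypothetical stand-in for the Ruta analysis engine, its constructor
simulates reading the word list, and the type names, the file name and
the list entries are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ReuseEngineSketch {

    // Counts how often the (simulated) word list is read.
    static final AtomicInteger wordListLoads = new AtomicInteger();

    // Hypothetical stand-in for the Ruta analysis engine; NOT the Ruta API.
    static final class ScriptEngine {
        final String script;
        private final List<String> wordList;

        ScriptEngine(String script) {
            this.script = script;
            // Expensive step: a real engine would read the word list here.
            wordListLoads.incrementAndGet();
            this.wordList = Arrays.asList("Berlin", "Paris"); // made-up entries
        }

        // Returns how many word list entries occur in the text.
        int process(String text) {
            int hits = 0;
            for (String word : wordList) {
                if (text.contains(word)) {
                    hits++;
                }
            }
            return hits;
        }
    }

    // Fills the rule template once, as described in the quoted mail.
    static String buildScript(String sourceType, String targetType, String wordListUrl) {
        return String.format(
                "Document{->GREEDYANCHORING(true)};\n%s*{->MARKFAST(%s,'%s')};",
                sourceType, targetType, wordListUrl);
    }

    // Variant A: a new engine per document, analogous to Ruta.apply() in process().
    static final class PerCallAnnotator {
        private final String script;

        PerCallAnnotator(String script) {
            this.script = script;
        }

        int process(String text) {
            return new ScriptEngine(script).process(text);
        }
    }

    // Variant B: the engine is a field, created once in initialize().
    static final class FieldAnnotator {
        private ScriptEngine engine;

        void initialize(String script) {
            engine = new ScriptEngine(script);
        }

        int process(String text) {
            return engine.process(text);
        }
    }

    public static void main(String[] args) {
        String script = buildScript("Token", "City", "cities.txt");
        String[] documents = {"Berlin is large.", "Paris too.", "No match here."};

        PerCallAnnotator perCall = new PerCallAnnotator(script);
        int before = wordListLoads.get();
        for (String doc : documents) {
            perCall.process(doc);
        }
        System.out.println("loads, engine per call: " + (wordListLoads.get() - before));

        FieldAnnotator withField = new FieldAnnotator();
        before = wordListLoads.get();
        withField.initialize(script);
        for (String doc : documents) {
            withField.process(doc);
        }
        System.out.println("loads, engine as field: " + (wordListLoads.get() - before));
    }
}
```

In this sketch the per-call variant reads the simulated word list once
per document, while the field variant reads it once in total. With the
real Ruta engine, the field would hold the analysis engine instead of
the stand-in class.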

As for SharedResourceObject: This should be done, but it was never at
the top of my todo list. I hope I will find the time sometime.

You may want to take a look at UIMA-4062 and UIMA-4074, especially
Silvestre's comment on UIMA-4062 (29/Oct/14 19:12), where he loads a
table using external resources. That might work for you as well. Maybe
Silvestre can share his experiences?

Best,

Peter

>
> Regards,
> Armin
