Hi,

On 22.01.2015 at 09:20, [email protected] wrote:
> Hello!
>
> This is a very short and simple gazetteer using RUTA.
>
> Document{->GREEDYANCHORING(true)};
> %s*{->MARKFAST(%s,'%s')};
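
The two quoted rules are a java.lang.String.format() template. Filling it in can be sketched as follows. The class name GazetteerScript, the build() method and the example arguments are hypothetical; the argument order (source type, target type, word-list URL) follows the description of the three %s placeholders later in the message.

```java
/** Hypothetical helper that fills the gazetteer template quoted above. */
public class GazetteerScript {

    // Placeholders, in order: source type, target type, word-list URL.
    static final String TEMPLATE =
            "Document{->GREEDYANCHORING(true)};\n"
            + "%s*{->MARKFAST(%s,'%s')};";

    static String build(String sourceType, String targetType, String wordListUrl) {
        return String.format(TEMPLATE, sourceType, targetType, wordListUrl);
    }

    public static void main(String[] args) {
        // Made-up example arguments, for illustration only.
        System.out.println(build("W", "my.Location", "file:/tmp/cities.txt"));
    }
}
```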

First of all, I am sorry that I have not yet been able to implement greedy matching for the gazetteers/word lists. I have not forgotten it.

Just curious: does the rule perform as you expect/intend? I mean the combination of greedy anchoring and the windowed stream caused by the matching condition.

>
> where the first %s is replaced using String.format() by the name of the source type, the second %s by the name of the target type, and the third %s by the URL of a word list. This makes it a little more flexible. This is done once in CasAnnotator_ImplBase.initialize().
>
> Then the script is executed with Ruta.apply(cas, script) in process(). But that means that the word list is read again for every CAS processed. Is there any way to have RUTA use the word list as a SharedResourceObject, so that it is read only once?

The problem is that Ruta.apply() creates a new descriptor and a new analysis engine on every call. You could instead integrate the Ruta analysis engine into your own analysis engine, e.g. as a field set up in your initialize() method, and call its process() in your process() method. Then the word lists should not be reloaded for each process().

As for SharedResourceObject: this should be done, but it was never at the top of my to-do list. I hope I will find the time at some point.

You may want to take a look at UIMA-4062 and UIMA-4074, especially Silvestre's comment on UIMA-4062 (29/Oct/14 19:12), where he loads a table using external resources. That could also work for you. Maybe Silvestre can share his experiences?

Best,
Peter

>
> Regards,
> Armin
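
The suggestion to keep the Ruta engine as a field can be sketched with uimaFIT. This is a sketch under stated assumptions, not an official Ruta recipe: it assumes uimaFIT and ruta-core on the classpath, a Ruta version whose RutaEngine offers the PARAM_RULES parameter (script passed as text rather than as a script file), and a CAS whose type system already contains the types Ruta needs and the types used in the script. The class name GazetteerAnnotator and the configuration parameter names sourceType, targetType and wordList are hypothetical.

```java
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_component.CasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.ruta.engine.RutaEngine;

/**
 * Hypothetical annotator: builds the Ruta script once and keeps one Ruta engine,
 * instead of calling Ruta.apply(cas, script), which creates a new engine per call.
 */
public class GazetteerAnnotator extends CasAnnotator_ImplBase {

    private static final String TEMPLATE =
            "Document{->GREEDYANCHORING(true)};\n"
            + "%s*{->MARKFAST(%s,'%s')};";

    private AnalysisEngine rutaEngine;

    @Override
    public void initialize(UimaContext context) throws ResourceInitializationException {
        super.initialize(context);
        // Hypothetical parameter names; they must be declared in this annotator's descriptor.
        String sourceType = (String) context.getConfigParameterValue("sourceType");
        String targetType = (String) context.getConfigParameterValue("targetType");
        String wordList = (String) context.getConfigParameterValue("wordList");
        String script = String.format(TEMPLATE, sourceType, targetType, wordList);

        // Assumption: RutaEngine.PARAM_RULES exists in the Ruta version in use.
        rutaEngine = AnalysisEngineFactory.createEngine(RutaEngine.class,
                RutaEngine.PARAM_RULES, script);
    }

    @Override
    public void process(CAS cas) throws AnalysisEngineProcessException {
        // Reuses the same engine for every CAS; per the reply above,
        // the word list should then not be reloaded for each process().
        rutaEngine.process(cas);
    }

    @Override
    public void destroy() {
        if (rutaEngine != null) {
            rutaEngine.destroy();
        }
        super.destroy();
    }
}
```

Here the script is built and the engine created once per annotator instance, matching the initialize() setup described in the question. Sharing one word list across several annotator instances would still need the SharedResourceObject support discussed in the reply.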
