Hi Daniel,

The dictionary annotator is definitely faster than Concept Mapper, but it has much less functionality. It supports only a first-match strategy.
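To make the difference between the strategies concrete, here is a minimal Python sketch of "first match", "match all" and "skip tokens" over whitespace tokens. It is not code from the dictionary annotator or from Concept Mapper: the function names and the `max_skip` parameter are hypothetical, and "first match" is assumed to mean keeping one entry (the longest) at the leftmost position and resuming after it.

```python
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[str, ...]
Match = Tuple[int, int, Entry]  # (start index, exclusive end index, dictionary entry)


def _match_end(tokens: Sequence[str], pos: int, rest: Entry, max_skip: int) -> Optional[int]:
    """Return the exclusive end index if `rest` can be matched from `pos`, else None.

    Before each entry token, up to `max_skip` input tokens may be skipped.
    """
    if not rest:
        return pos
    for i in range(pos, min(pos + max_skip + 1, len(tokens))):
        if tokens[i] == rest[0]:
            end = _match_end(tokens, i + 1, rest[1:], max_skip)
            if end is not None:
                return end
    return None


def match_all(tokens: Sequence[str], entries: Sequence[Entry], max_skip: int = 0) -> List[Match]:
    """Report every entry that matches at every start position (overlaps allowed)."""
    matches: List[Match] = []
    for start in range(len(tokens)):
        for entry in entries:
            # The first entry token must sit exactly at `start`; skips apply only after it.
            if entry and tokens[start] == entry[0]:
                end = _match_end(tokens, start + 1, entry[1:], max_skip)
                if end is not None:
                    matches.append((start, end, entry))
    return matches


def match_first(tokens: Sequence[str], entries: Sequence[Entry], max_skip: int = 0) -> List[Match]:
    """Assumed 'first match' reading: leftmost, longest, non-overlapping matches only."""
    kept: List[Match] = []
    next_free = 0
    for start, end, entry in sorted(match_all(tokens, entries, max_skip),
                                    key=lambda m: (m[0], -m[1])):
        if start >= next_free:
            kept.append((start, end, entry))
            next_free = end
    return kept


if __name__ == "__main__":
    entries = [("a", "b", "c"), ("a", "b"), ("b", "c")]
    tokens = "a b c".split()
    print(match_first(tokens, entries))  # only the "a b c" entry
    print(match_all(tokens, entries))    # "a b c", "a b" and "b c"
    # Skip tokens: "a c d" matches "a b c d" once one token may be skipped.
    print(match_all("a b c d".split(), [("a", "c", "d")], max_skip=1))
```

With `max_skip=0` the entry "a c d" does not match the input "a b c d"; with `max_skip=1` it does. The sketch loops over all entries at every position, so it says nothing about how either annotator scales.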
Regards,
Donatas

On Wed, May 10, 2017 at 12:19 AM Daniel Heinze <[email protected]> wrote:

> Hi... I just pulled and compiled the dictionaryannotator and am looking
> through the code. I'm looking for something that is faster than UIMA
> Concept-Mapper. I don't need all the functionality of Concept-Mapper, but
> do need the following:
> * match all, e.g. if dict entries are "a b c", "a b" and "b c" and input
>   is "a b c", I need to match "a b c", "a b" and "b c"
> * skip tokens, e.g. if dict entry is "a c d", it should match on input
>   "a b c d"
> Can someone familiar with the new dictionary annotator save me some time
> and say if it supports these matching strategies?
> Also, any sense of how the system scales?
> Thanks / Dan
>
> -----Original Message-----
> From: Peter Klügl [mailto:[email protected]]
> Sent: Tuesday, March 14, 2017 12:52 AM
> To: [email protected]
> Subject: Re: New dictionary annotator
>
> Hi,
>
> it's now March and I have not yet found the time to compare the different
> annotators in your benchmark.
>
> I just wanted to mention that I did not forget about this and that this is
> still on my todo list. However, it could easily be April before I find the
> time.
>
> Best,
>
> Peter
>
> On 08.12.2016 at 10:43, Donatas Remeika wrote:
> > Hi,
> >
> > Peter, I ran a benchmark on the 20 newsgroups texts. The results can be
> > found here: https://github.com/tokenmill/dictionary-annotator
> > I didn't measure memory usage, just compared how fast the different
> > annotators do the job.
> >
> > Best regards,
> > Donatas
> >
> > On Mon, Dec 5, 2016 at 2:35 PM Peter Klügl <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> for the UIMA Ruta paper, I used the enron email dataset [1], but it
> >> is probably not optimal here.
> >>
> >> I think we can find a standard scenario (data+terminology), maybe
> >> something like Genia with MeSH or wikipedia with geonames. Just a
> >> quick guess. I can help with setting something up, but probably not
> >> before February.
> >>
> >> Best,
> >>
> >> Peter
> >>
> >> [1] https://www.cs.cmu.edu/~enron/
> >>
> >> On 05.12.2016 at 12:56, Donatas Remeika wrote:
> >>> Hi,
> >>>
> >>> Thanks for the feedback.
> >>> Yes, it would be interesting to see benchmark results. Maybe you
> >>> know where I could find examples and data for doing benchmarks in
> >>> UIMA?
> >>>
> >>> Best regards,
> >>> Donatas
> >>>
> >>> On Mon, Dec 5, 2016 at 10:52 AM Peter Klügl <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> a very nice annotator, thank you.
> >>>>
> >>>> Do you have figures on how the annotator compares to the others
> >>>> with respect to speed and memory usage?
> >>>>
> >>>> Storing the complete tokens may pose challenges in scenarios with
> >>>> parallelization if the dictionary is not shared between annotators.
> >>>>
> >>>> Would you be interested in setting up a benchmark?
> >>>>
> >>>> Because of the limitations of the dictionaries in ruta, I also
> >>>> created a new simple dictionary annotator, but it lives now in our
> >>>> own components repository. Maybe I'll contribute it to ruta
> >>>> sometime, since it provides exactly the functionality the ruta
> >>>> dictionaries miss.
> >>>>
> >>>> Best,
> >>>>
> >>>> Peter
> >>>>
> >>>> On 30.11.2016 at 15:38, Donatas Remeika wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Just wanted to let you know that we created a new (probably one
> >>>>> more) dictionary annotator.
> >>>>>
> >>>>> Reasons for creating it were:
> >>>>> - Quite often we used Ruta in our pipelines only because of its
> >>>>>   MARKTABLE action, which is able to set several features on an
> >>>>>   annotation
> >>>>> - Sometimes dictionaries contain duplicate entries with different
> >>>>>   features and we need to create annotations for each entry
> >>>>> - Possibility to use a custom tokenizer for dictionary entries
> >>>>>   (default is a whitespace tokenizer)
> >>>>>
> >>>>> It was inspired by both DKPro dictionary-annotator and Ruta
> >>>>> MARKTABLE. Big thanks to their developers!
> >>>>>
> >>>>> Code with examples can be found at
> >>>>> https://github.com/tokenmill/dictionary-annotator
> >>>>>
> >>>>> BTW, maybe someone knows a Concept Mapper alternative which is
> >>>>> more uimaFIT friendly?
> >>>>>
> >>>>> Best regards,
> >>>>> Donatas
