Hi,
it's now March and I have not yet found the time to compare the different
annotators in your benchmark. I just wanted to mention that I have not
forgotten about this and that it is still on my todo list. However, it could
easily be April before I find the time.

Best,

Peter

On 08.12.2016 at 10:43, Donatas Remeika wrote:
> Hi,
>
> Peter, I ran a benchmark on the 20 newsgroups texts. The results can be
> found here: https://github.com/tokenmill/dictionary-annotator
> I didn't measure memory usage, just compared how fast the different
> annotators do the job.
>
> Best regards,
> Donatas
>
> On Mon, Dec 5, 2016 at 2:35 PM Peter Klügl <[email protected]> wrote:
>
>> Hi,
>>
>> for the UIMA Ruta paper, I used the enron email dataset [1], but it is
>> probably not optimal here.
>>
>> I think we can find a standard scenario (data + terminology), maybe
>> something like Genia with MeSH or wikipedia with geonames. Just a quick
>> guess. I can help set something up, but probably not before February.
>>
>> Best,
>>
>> Peter
>>
>> [1] https://www.cs.cmu.edu/~enron/
>>
>> On 05.12.2016 at 12:56, Donatas Remeika wrote:
>>> Hi,
>>>
>>> Thanks for the feedback.
>>> Yes, it would be interesting to see benchmark results. Maybe you know
>>> where I could find examples and data for doing benchmarks in UIMA?
>>>
>>> Best regards,
>>> Donatas
>>>
>>> On Mon, Dec 5, 2016 at 10:52 AM Peter Klügl <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> a very nice annotator, thank you.
>>>>
>>>> Do you have figures on how the annotator compares to the others with
>>>> respect to speed and memory usage?
>>>>
>>>> Storing the complete tokens may pose challenges in scenarios with
>>>> parallelization if the dictionary is not shared between annotators.
>>>>
>>>> Would you be interested in setting up a benchmark?
>>>>
>>>> Because of the limitations of the dictionaries in ruta, I also created
>>>> a new simple dictionary annotator, but it now lives in our own
>>>> components repository. Maybe I'll contribute it to ruta sometime, since
>>>> it provides exactly the functionality the ruta dictionaries miss.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> On 30.11.2016 at 15:38, Donatas Remeika wrote:
>>>>> Hi,
>>>>>
>>>>> Just wanted to let you know that we created a new (probably one more)
>>>>> dictionary annotator.
>>>>>
>>>>> The reasons for creating it were:
>>>>> - Quite often we used Ruta in our pipelines only because of its
>>>>>   MARKTABLE action, which is able to set several features on an
>>>>>   annotation
>>>>> - Sometimes dictionaries contain duplicate entries with different
>>>>>   features and we need to create annotations for each entry
>>>>> - Possibility to use a custom tokenizer for dictionary entries (the
>>>>>   default is a whitespace tokenizer)
>>>>>
>>>>> It was inspired by both the DKPro dictionary-annotator and Ruta
>>>>> MARKTABLE. Big thanks to their developers!
>>>>>
>>>>> Code with examples can be found at
>>>>> https://github.com/tokenmill/dictionary-annotator
>>>>>
>>>>> BTW, maybe someone knows a Concept Mapper alternative which is more
>>>>> uimaFIT friendly?
>>>>>
>>>>> Best regards,
>>>>> Donatas
