Hi,
For the UIMA Ruta paper, I used the Enron email dataset [1], but it is probably not optimal here. I think we can find a standard scenario (data + terminology), maybe something like GENIA with MeSH, or Wikipedia with GeoNames. Just a quick guess. I can help set something up, but probably not before February.

Best,
Peter

[1] https://www.cs.cmu.edu/~enron/

On 05.12.2016 at 12:56, Donatas Remeika wrote:
> Hi,
>
> Thanks for the feedback.
> Yes, it would be interesting to see benchmark results. Do you know where
> I could find examples and data for running benchmarks in UIMA?
>
> Best regards,
> Donatas
>
>
> On Mon, Dec 5, 2016 at 10:52 AM Peter Klügl wrote:
>
>> Hi,
>>
>> a very nice annotator, thank you.
>>
>> Do you have figures on how the annotator compares to the others with
>> respect to speed and memory usage?
>>
>> Storing the complete tokens may pose challenges in scenarios with
>> parallelization if the dictionary is not shared between annotators.
>>
>> Would you be interested in setting up a benchmark?
>>
>> Because of the limitations of the dictionaries in Ruta, I also created a
>> new simple dictionary annotator, but it now lives in our own components
>> repository. Maybe I'll contribute it to Ruta at some point, since it
>> provides exactly the functionality the Ruta dictionaries lack.
>>
>> Best,
>>
>> Peter
>>
>> On 30.11.2016 at 15:38, Donatas Remeika wrote:
>>> Hi,
>>>
>>> Just wanted to let you know that we created a new (probably one more)
>>> dictionary annotator.
>>>
>>> The reasons for creating it were:
>>> - Quite often we used Ruta in our pipelines only because of its MARKTABLE
>>>   action, which is able to set several features on an annotation
>>> - Sometimes dictionaries contain duplicate entries with different features,
>>>   and we need to create an annotation for each entry
>>> - The possibility to use a custom tokenizer for dictionary entries (the
>>>   default is a whitespace tokenizer)
>>>
>>> It was inspired by both the DKPro dictionary-annotator and Ruta MARKTABLE.
>>> Big thanks to their developers!
>>>
>>> Code with examples can be found at
>>> https://github.com/tokenmill/dictionary-annotator
>>>
>>> BTW, does anyone know of a ConceptMapper alternative that is more
>>> uimaFIT-friendly?
>>>
>>> Best regards,
>>> Donatas
>>>
>>
