Hi,
Peter, I ran some benchmarks on the 20 Newsgroups texts. The results can be
found here: https://github.com/tokenmill/dictionary-annotator
I didn't measure memory usage; I only compared how fast the different
annotators do the job.
Best regards,
Donatas
On Mon, Dec 5, 2016 at 2:35 PM Peter Klügl wrote:
> Hi,
>
>
> for the UIMA Ruta paper, I used the Enron email dataset [1], but it is
> probably not optimal here.
>
>
> I think we can find a standard scenario (data + terminology), maybe
> something like GENIA with MeSH or Wikipedia with GeoNames. Just a quick
> guess. I can help set something up, but probably not before February.
>
>
> Best,
>
>
> Peter
>
>
> [1] https://www.cs.cmu.edu/~enron/
>
> On 05.12.2016 at 12:56, Donatas Remeika wrote:
> > Hi,
> >
> > Thanks for the feedback.
> > Yes, it would be interesting to see benchmark results. Do you know where
> > I could find examples and data for doing benchmarks in UIMA?
> >
> > Best regards,
> > Donatas
> >
> >
> > On Mon, Dec 5, 2016 at 10:52 AM Peter Klügl wrote:
> >
> >> Hi,
> >>
> >>
> >> a very nice annotator, thank you.
> >>
> >>
> >> Do you have figures on how the annotator compares to the others with
> >> respect to speed and memory usage?
> >>
> >> Storing the complete tokens may pose challenges in parallelized
> >> scenarios if the dictionary is not shared between annotators.
> >>
> >> Would you be interested in setting up a benchmark?
> >>
> >>
> >> Because of the limitations of the dictionaries in Ruta, I also created a
> >> new simple dictionary annotator, but it now lives in our own components
> >> repository. Maybe I'll contribute it to Ruta sometime, since it provides
> >> exactly the functionality the Ruta dictionaries lack.
> >>
> >>
> >> Best,
> >>
> >>
> >> Peter
> >>
> >>
> >> On 30.11.2016 at 15:38, Donatas Remeika wrote:
> >>> Hi,
> >>>
> >>> Just wanted to let you know that we created a new (yet another)
> >>> dictionary annotator.
> >>>
> >>> The reasons for creating it were:
> >>> - Quite often we used Ruta in our pipelines only because of its
> >>> MARKTABLE action, which can set several features on an annotation
> >>> - Sometimes dictionaries contain duplicate entries with different
> >>> features, and we need to create an annotation for each entry
> >>> - We wanted the option to use a custom tokenizer for dictionary
> >>> entries (the default is a whitespace tokenizer)
> >>>
> >>> It was inspired by both the DKPro dictionary-annotator and Ruta
> >>> MARKTABLE. Big thanks to their developers!
> >>>
> >>> Code with examples can be found at
> >>> https://github.com/tokenmill/dictionary-annotator
> >>>
> >>> BTW, does anyone know of a ConceptMapper alternative that is more
> >>> uimaFIT-friendly?
> >>>
> >>> Best regards,
> >>> Donatas
> >>>
> >>
>
>