Re: New dictionary annotator

Peter Klügl Tue, 14 Mar 2017 00:52:16 -0700

Hi,


it is now March and I have not yet found the time to compare the different
annotators in your benchmark.


I just wanted to mention that I have not forgotten about this and that it
is still on my todo list. However, it could easily be April before I
find the time.


Best,


Peter


On 08.12.2016 at 10:43, Donatas Remeika wrote:
> Hi,
>
> Peter, I ran a benchmark on the 20 Newsgroups texts. The results can be
> found here: https://github.com/tokenmill/dictionary-annotator
> I didn't measure memory usage; I only compared how fast the different
> annotators do the job.
>
> Best regards,
> Donatas
>
> On Mon, Dec 5, 2016 at 2:35 PM Peter Klügl <[email protected]> wrote:
>
>> Hi,
>>
>>
>> for the UIMA Ruta paper, I used the Enron email dataset [1], but it is
>> probably not optimal here.
>>
>>
>> I think we can find a standard scenario (data + terminology), maybe
>> something like Genia with MeSH or Wikipedia with GeoNames. Just a quick
>> guess. I can help set something up, but probably not before February.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> [1] https://www.cs.cmu.edu/~enron/
>>
>> On 05.12.2016 at 12:56, Donatas Remeika wrote:
>>> Hi,
>>>
>>> Thanks for the feedback.
>>> Yes, it would be interesting to see benchmark results. Maybe you know
>>> where I could find examples and data for doing benchmarks in UIMA?
>>>
>>> Best regards,
>>> Donatas
>>>
>>>
>>> On Mon, Dec 5, 2016 at 10:52 AM Peter Klügl <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> a very nice annotator, thank you.
>>>>
>>>>
>>>> Do you have figures on how the annotator compares to the others with
>>>> respect to speed and memory usage?
>>>>
>>>> Storing the complete tokens may pose challenges in parallelized
>>>> scenarios if the dictionary is not shared between annotators.
>>>>
>>>> Would you be interested in setting up a benchmark?
>>>>
>>>>
>>>> Because of the limitations of the dictionaries in Ruta, I also created a
>>>> new simple dictionary annotator, but it currently lives in our own
>>>> components repository. Maybe I'll contribute it to Ruta sometime, since
>>>> it provides exactly the functionality the Ruta dictionaries lack.
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>> On 30.11.2016 at 15:38, Donatas Remeika wrote:
>>>>> Hi,
>>>>>
>>>>> Just wanted to let you know that we created a new (probably yet another)
>>>>> dictionary annotator.
>>>>>
>>>>> The reasons for creating it were:
>>>>>  - Quite often we used Ruta in our pipelines only because of its
>>>>>    MARKTABLE action, which is able to set several features on an
>>>>>    annotation
>>>>>  - Sometimes dictionaries contain duplicate entries with different
>>>>>    features, and we need to create an annotation for each entry
>>>>>  - The possibility to use a custom tokenizer for dictionary entries
>>>>>    (the default is a whitespace tokenizer)
>>>>>
>>>>> It was inspired by both the DKPro dictionary-annotator and Ruta
>>>>> MARKTABLE. Big thanks to their developers!
>>>>>
>>>>> Code with examples can be found at
>>>>> https://github.com/tokenmill/dictionary-annotator
>>>>>
>>>>> BTW, does anyone know of a Concept Mapper alternative that is more
>>>>> uimaFIT-friendly?
>>>>>
>>>>> Best regards,
>>>>> Donatas
>>>>>
>>
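
For illustration, the behaviour listed in the original announcement (several
features per annotation, one annotation per duplicate dictionary entry, and a
pluggable tokenizer defaulting to whitespace) can be sketched roughly as below.
This is a minimal Python sketch under stated assumptions, not the code or API
of the tokenmill annotator, Ruta MARKTABLE, or DKPro; the names
whitespace_tokenize, build_index, and annotate and the sample entries are
hypothetical.

```python
import re
from collections import defaultdict


def whitespace_tokenize(text):
    """Return (token, begin, end) triples for whitespace-separated tokens."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def build_index(entries, tokenize=whitespace_tokenize):
    """Map each tokenized phrase to the feature dicts of ALL entries sharing it.

    Duplicate phrases are kept on purpose, so each entry can later yield
    its own annotation.
    """
    index = defaultdict(list)
    for phrase, features in entries:
        key = tuple(tok for tok, _, _ in tokenize(phrase))
        index[key].append(features)
    return index


def annotate(text, index, tokenize=whitespace_tokenize):
    """Emit one annotation (offsets plus features) per matching entry."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in index), default=0)
    annotations = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            for features in index.get(key, []):
                annotations.append(
                    {"begin": tokens[i][1], "end": tokens[i + n - 1][2], **features}
                )
    return annotations


# Hypothetical dictionary: "New York" appears twice with different features.
entries = [
    ("New York", {"type": "city"}),
    ("New York", {"type": "state"}),
    ("Vilnius", {"type": "city"}),
]
text = "She moved from Vilnius to New York"
for annotation in annotate(text, build_index(entries)):
    print(annotation)
```

Passing a different tokenize function to both build_index and annotate stands
in for the custom-tokenizer option; the two must agree, otherwise dictionary
keys and text tokens will not line up.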
