Re: New dictionary annotator

Hugues de Mazancourt Fri, 02 Dec 2016 01:03:08 -0800

Thanks for this contribution.

Do you have any plans to make the lookup accent-insensitive? Or do you know of 
a component that would do the job?
I’m currently using ConceptMapper outside of Ruta and MARKTABLE from within 
Ruta, but neither handles accents correctly (btw, ConceptMapper is *very* 
slow at loading resources, which can be a problem).


My point is: I have lists containing elements like « événement » and I would 
like text such as « EVENEMENT » or even « évènement » to match that list. 
Lowercasing the text is not a solution, because case mapping only relates 
« é » to « É » (as in the French locale); neither is ever mapped to plain 
« e ». I guess you have the same problem with Latvian.
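
To make the requirement concrete, here is a minimal sketch of accent- and case-insensitive matching. It is written in plain Python with the standard unicodedata module and is not a UIMA, Ruta, or ConceptMapper component. The names fold, entries, index, and lookup are hypothetical, chosen only for illustration. The idea is to reduce both the dictionary entries and the text tokens to the same folded key before comparing them.

```python
import unicodedata


def fold(text: str) -> str:
    """Return a case- and accent-insensitive lookup key for text."""
    # NFD decomposition splits "é" into "e" + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (category Mn) and keep the base letters.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # casefold() is a locale-independent, more aggressive form of lower().
    return stripped.casefold()


# Hypothetical dictionary: folded key -> original entry.
entries = ["événement", "élève"]
index = {fold(entry): entry for entry in entries}


def lookup(token: str):
    """Return the original dictionary entry matching token, or None."""
    return index.get(fold(token))
```

With this folding, lookup("EVENEMENT") and lookup("évènement") both resolve to the entry « événement ». The trade-off is that words distinguished only by their accents collapse onto the same key, so this suits lists where such collisions are acceptable.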

Best,


Hugues de Mazancourt
http://about.me/mazancourt




> On 30 Nov 2016, at 15:38, Donatas Remeika <[email protected]> wrote:
> 
> Hi,
> 
> Just wanted to let you know that we have created a new (probably yet another)
> dictionary annotator.
> 
> The reasons for creating it were:
> - Quite often we used Ruta in our pipelines only because of its MARKTABLE
> action, which can set several features on an annotation
> - Sometimes dictionaries contain duplicate entries with different features,
> and we need to create an annotation for each entry
> - The ability to use a custom tokenizer for dictionary entries (the default
> is a whitespace tokenizer)
> 
> It was inspired by both DKPro dictionary-annotator and Ruta MARKTABLE. Big
> thanks to their developers!
> 
> Code with examples can be found at
> https://github.com/tokenmill/dictionary-annotator
> 
> BTW, does anyone know of a Concept Mapper alternative that is more
> uimaFIT-friendly?
> 
> Best regards,
> Donatas
