Thanks for this contribution. Do you have any plans to make the lookup accent-insensitive? Or do you know of a component that would do the job? I'm currently using ConceptMapper outside of Ruta and MARKTABLE from within Ruta, but neither handles accents correctly (BTW, ConceptMapper is *very* slow at loading resources, which can be a problem).
My point is: I have lists containing elements like « événement », and I would like text such as « EVENEMENT » or even « évènement » to match them. Lowercasing the text is not a solution: in the French locale « é » corresponds to the uppercase « É », not to « E », so lowercasing « EVENEMENT » gives « evenement », which still does not match « événement ». I guess you have the same problem with Latvian.

Best,
Hugues de Mazancourt
http://about.me/mazancourt

> On 30 Nov 2016, at 15:38, Donatas Remeika <[email protected]> wrote:
>
> Hi,
>
> Just wanted to let you know that we created a new (probably one more)
> dictionary annotator.
>
> The reasons for creating it were:
> - Quite often we used Ruta in our pipelines only because of its MARKTABLE
>   action, which is able to set several features on an annotation
> - Sometimes dictionaries contain duplicate entries with different features,
>   and we need to create an annotation for each entry
> - The possibility to use a custom tokenizer for dictionary entries (the
>   default is a whitespace tokenizer)
>
> It was inspired by both the DKPro dictionary-annotator and Ruta MARKTABLE.
> Big thanks to their developers!
>
> Code with examples can be found at
> https://github.com/tokenmill/dictionary-annotator
>
> BTW, does anyone know of a ConceptMapper alternative that is more uimaFIT
> friendly?
>
> Best regards,
> Donatas
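For the « événement » / « EVENEMENT » / « évènement » case described above, one common approach is to fold both dictionary entries and tokens to an accent-free, lowercased key before lookup. Below is a minimal sketch in plain Java, assuming the standard `java.text.Normalizer` is acceptable: decompose to NFD, drop combining marks (`\p{M}`), then lowercase with `Locale.ROOT`. This is not an existing feature of dictionary-annotator, Ruta or ConceptMapper; the class `AccentFoldingLookup` and its methods are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical accent-insensitive dictionary, for illustration only.
public class AccentFoldingLookup {

    // Fold a string to a lookup key: NFD splits "é" into "e" + U+0301,
    // \p{M} removes the combining marks, and Locale.ROOT keeps lowercasing
    // independent of the JVM's default locale.
    public static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    // Folded key -> original entries; a list per key keeps duplicate entries.
    private final Map<String, List<String>> index = new HashMap<>();

    public void add(String entry) {
        index.computeIfAbsent(fold(entry), k -> new ArrayList<>()).add(entry);
    }

    // Fold each token the same way before looking it up.
    public List<String> lookup(String token) {
        return index.getOrDefault(fold(token), List.of());
    }

    public static void main(String[] args) {
        AccentFoldingLookup dict = new AccentFoldingLookup();
        dict.add("événement");
        System.out.println(dict.lookup("EVENEMENT")); // [événement]
        System.out.println(dict.lookup("évènement")); // [événement]
    }
}
```

Folding per token, rather than normalizing the whole document text, keeps annotation offsets pointing at the original text. The same folding should cover Latvian letters such as « ā » or « š », which decompose under NFD; characters without a canonical decomposition, such as « œ », are left unchanged.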
