Hi Hugues,

Thanks for the feedback. Accent-insensitive matching is indeed a needed feature. We will implement it in the near future.

Best regards,
Donatas Remeika

On Fri, Dec 2, 2016 at 11:02 AM Hugues de Mazancourt wrote:

> Thanks for this contribution.
>
> Do you have any plans to make the lookup accent-insensitive? Or do you
> know of a component that would do the job?
> I’m currently using ConceptMapper outside of Ruta and MARKTABLE from
> within Ruta, but neither handles accents correctly (btw, ConceptMapper
> is *very* slow at resource loading, which can be a problem).
>
> My point is: I have lists containing elements like « événement » and I
> would like text like « EVENEMENT » or even « évènement » to match that
> list. Lowercasing the text is not a solution, as « é » is mapped to
> uppercase « É » in the French locale, which has nothing to do with « e ».
> I guess you have the same problem with Latvian.
>
> Best,
>
> Hugues de Mazancourt
> http://about.me/mazancourt
>
> On 30 Nov 2016, at 15:38, Donatas Remeika wrote:
> >
> > Hi,
> >
> > Just wanted to let you know that we created a new (probably yet another)
> > dictionary annotator.
> >
> > The reasons for creating it were:
> > - Quite often we used Ruta in our pipelines only because of its MARKTABLE
> >   action, which can set several features on an annotation
> > - Sometimes dictionaries contain duplicate entries with different
> >   features, and we need to create an annotation for each entry
> > - The possibility of using a custom tokenizer for dictionary entries
> >   (the default is a whitespace tokenizer)
> >
> > It was inspired by both the DKPro dictionary-annotator and Ruta MARKTABLE.
> > Big thanks to their developers!
> >
> > Code with examples can be found at
> > https://github.com/tokenmill/dictionary-annotator
> >
> > BTW, does anyone know of a Concept Mapper alternative that is more
> > uimaFIT-friendly?
> >
> > Best regards,
> > Donatas
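
For reference, here is a minimal sketch of one common way to get the accent-insensitive comparison asked about in this thread. It is not taken from dictionary-annotator and does not describe how the feature will be implemented there. The class name `AccentFolding` and the method `fold` are hypothetical; only `java.text.Normalizer` and the regex class `\p{M}` come from the standard Java library. The idea is to decompose each string (NFD), drop the combining marks, and only then lowercase, so that dictionary entries and text are compared on the same folded key.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper for illustration; not part of dictionary-annotator.
final class AccentFolding {

    private AccentFolding() {}

    // Builds a comparison key: canonical decomposition (NFD), removal of
    // combining marks (Unicode category M), then locale-independent lowercasing.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Dictionary entry: "evenement" written with two e-acute characters.
        String entry = "\u00e9v\u00e9nement";
        // Variants from the thread: unaccented uppercase, and e-acute + e-grave.
        String upper = "EVENEMENT";
        String variant = "\u00e9v\u00e8nement";

        System.out.println(fold(entry).equals(fold(upper)));   // prints true
        System.out.println(fold(entry).equals(fold(variant))); // prints true
    }
}
```

Two caveats apply to this sketch. Letters that have no canonical decomposition (for example « ø » or « ł ») are left unchanged, so they would need an explicit mapping. Also, if the input is already in decomposed form, the folded key is shorter than the original text, so an annotator would have to map match offsets back to the original string.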
