Hello again,

On Fri, 13 Jan 2023 at 12:41, Alexandre Rademaker <aradema...@gmail.com> wrote:
>
> OpenNLP is mainly machine learning based, but we have the
> DictionaryLemmatizer with the ability to pass a dictionary of word forms. See
> https://opennlp.apache.org/docs/2.1.0/manual/opennlp.html#tools.lemmatizer.tagging.api.
> So you can use the http://github.com/LR-POR/MorphoBr that I mentioned before
> to prepare the input file for the DictionaryLemmatizer.
>

The main motivation for providing a DictionaryLemmatizer was to be able to
post-process (correct) the errors of the statistical model. Note that
dictionaries suffer from low coverage, even large ones such as those from
Freeling, so the dictionary-based lemmatizer is limited to the entries
contained in the dictionary.

> The statistical lemmatizer is also available, and that would require a model
> to run. You can train yourself or use one already available from the link
> provided by Leszek.

Our experiments at the time showed that, in terms of performance, Perceptron
was a better choice for lemmatization. It is quite fast and cheap to train a
lemmatizer with UD data.

Best regards,

Rodrigo