Re: Portuguese lemmatization model?

Rodrigo Agerri Fri, 13 Jan 2023 04:01:38 -0800

Hello again,

On Fri, 13 Jan 2023 at 12:41, Alexandre Rademaker <aradema...@gmail.com> wrote:
>
>
> OpenNLP is mainly machine learning based, but we have the 
> DictionaryLemmatizer with the ability to pass a dictionary of word forms. See 
> https://opennlp.apache.org/docs/2.1.0/manual/opennlp.html#tools.lemmatizer.tagging.api.
>  So you can use the http://github.com/LR-POR/MorphoBr that I mentioned before 
> to prepare the input file for the DictionaryLemmatizer.
>


The main motivation for providing the DictionaryLemmatizer was to be
able to post-process (correct) the errors of the statistical model.
Note that dictionaries suffer from low coverage, even large ones such
as those from Freeling, so the dictionary-based lemmatizer is limited
to the entries the dictionary contains.
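
To make the post-processing idea concrete, here is a minimal sketch. It
is plain Python, not the OpenNLP API: the function name, the dictionary
keyed on (word form, POS tag), and the sample entries are all
hypothetical and only illustrate the logic of letting a dictionary
entry override the statistical prediction when one exists.

```python
def correct_lemmas(tokens, tags, predicted, dictionary):
    """Return lemmas where dictionary entries override statistical predictions.

    `dictionary` maps (lowercased word form, POS tag) to a lemma. Tokens
    missing from the dictionary keep the lemma predicted by the model,
    which is how the low coverage of the dictionary is worked around.
    """
    corrected = []
    for token, tag, lemma in zip(tokens, tags, predicted):
        corrected.append(dictionary.get((token.lower(), tag), lemma))
    return corrected


# Hypothetical data: the model output contains one wrong lemma ("forar").
dictionary = {("casas", "NOUN"): "casa", ("foram", "AUX"): "ser"}
tokens = ["Casas", "foram", "vendidas"]
tags = ["NOUN", "AUX", "VERB"]
predicted = ["casa", "forar", "vender"]

result = correct_lemmas(tokens, tags, predicted, dictionary)
print(result)
```

The dictionary fixes "foram", while "vendidas", which is not in the
dictionary, keeps the lemma from the statistical model.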

> The statistical lemmatizer is also available, and that would require a model 
> to run. You can train yourself or use one already available from the link 
> provided by Leszek.

Our experiments at the time showed that, in terms of performance, the
Perceptron was the better choice for lemmatization. It is also quite
fast and cheap to train a lemmatizer with UD (Universal Dependencies)
data.
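
As a rough sketch of the data preparation step, the snippet below turns
CoNLL-U text (the UD file format) into one token per line. It assumes
the lemmatizer trainer expects word, POS tag and lemma separated by
tabs, with a blank line between sentences; please check the OpenNLP
manual for the exact training format. The function name and the sample
sentence are hypothetical.

```python
def conllu_to_training_lines(conllu_text):
    """Convert CoNLL-U text to 'word<TAB>postag<TAB>lemma' lines.

    Comment lines, multiword-token ranges (IDs such as "2-3") and empty
    nodes (IDs such as "1.1") are skipped. Sentences are separated by a
    single empty string in the returned list.
    """
    lines = []
    for raw in conllu_text.splitlines():
        line = raw.strip()
        if not line:
            # Sentence boundary: emit one separator, never two in a row.
            if lines and lines[-1] != "":
                lines.append("")
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        token_id, form, lemma, upos = cols[0], cols[1], cols[2], cols[3]
        if "-" in token_id or "." in token_id:
            continue
        lines.append(f"{form}\t{upos}\t{lemma}")
    while lines and lines[-1] == "":
        lines.pop()
    return lines


# Hypothetical sample sentence: "Fomos ao mercado."
rows = [
    ("1", "Fomos", "ir", "VERB", "_", "_", "0", "root", "_", "_"),
    ("2-3", "ao", "_", "_", "_", "_", "_", "_", "_", "_"),
    ("2", "a", "a", "ADP", "_", "_", "4", "case", "_", "_"),
    ("3", "o", "o", "DET", "_", "_", "4", "det", "_", "_"),
    ("4", "mercado", "mercado", "NOUN", "_", "_", "1", "obl", "_", "_"),
    ("5", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"),
]
sample = "# text = Fomos ao mercado.\n" + "\n".join("\t".join(r) for r in rows) + "\n\n"

training_lines = conllu_to_training_lines(sample)
print("\n".join(training_lines))
```

The contraction "ao" is dropped in favour of its syntactic words "a"
and "o"; whether that is the right choice depends on how the text is
tokenized at prediction time.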

Best regards,

Rodrigo
