Re: Portuguese lemmatization model?

Alexandre Rademaker Fri, 13 Jan 2023 03:41:33 -0800


OpenNLP is mainly machine-learning based, but it also provides the 
DictionaryLemmatizer, which takes a dictionary of word forms. See 
https://opennlp.apache.org/docs/2.1.0/manual/opennlp.html#tools.lemmatizer.tagging.api.
So you can use MorphoBr (http://github.com/LR-POR/MorphoBr), which I mentioned 
before, to prepare the input file for the DictionaryLemmatizer.
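
As a rough sketch of that preparation step: the DictionaryLemmatizer reads a 
tab-separated text file with one "word, POS tag, lemma" row per line. The code 
below assumes each MorphoBr line has the shape "form<TAB>lemma+TAG+features" 
(please check this against the actual MorphoBr files), and the class and 
method names are hypothetical helpers, not part of OpenNLP or MorphoBr. It 
also copies the MorphoBr tag as-is; in practice the tags would have to be 
mapped to the tagset of whatever POS tagger feeds the lemmatizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of OpenNLP or MorphoBr.
class MorphoBrToLemmaDict {

    // Assumed input line:  form<TAB>lemma+TAG+feature+...
    // Produced output row: form<TAB>TAG<TAB>lemma
    static String convertLine(String line) {
        String[] cols = line.split("\t");
        if (cols.length != 2) {
            throw new IllegalArgumentException("Expected two tab-separated columns: " + line);
        }
        String[] analysis = cols[1].split("\\+");
        if (analysis.length < 2) {
            throw new IllegalArgumentException("Expected lemma+TAG in second column: " + line);
        }
        String form = cols[0];
        String lemma = analysis[0];
        String tag = analysis[1]; // assumption: the first feature after the lemma is the POS
        return form + "\t" + tag + "\t" + lemma;
    }

    // Converts all non-empty lines, preserving their order.
    static List<String> convert(List<String> lines) {
        List<String> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                rows.add(convertLine(trimmed));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative entries only, written in the assumed MorphoBr shape.
        List<String> sample = List.of("gatos\tgato+N+M+PL", "casas\tcasa+N+F+PL");
        for (String row : convert(sample)) {
            System.out.println(row);
        }
    }
}
```

The resulting rows could then be written to a file and handed to the 
DictionaryLemmatizer as described in the manual section linked above.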

The statistical lemmatizer is also available, but it requires a model to 
run. You can train one yourself or use one already available from the link 
provided by Leszek.

Rodrigo Agerri made a strong claim, saying that a supervised lemmatizer works 
better. I don’t want to go into that discussion, but I believe the choice 
between an ML-based (supervised or not) and a rule-based approach should be 
based on many more criteria than performance on a single dataset.

Best,
Alexandre 

> On 13 Jan 2023, at 01:48, T. Kuro Kurosaka <k...@bhlab.com> wrote:
> 
> I wrote "model" just because I did not know OpenNLP supports a rule-based 
> approach.
> Are there rule files that I can try for Portuguese and other major languages?
> 
> Kuro
>
