Hello,

What I said is not only about Portuguese and not only about one dataset;
it holds for most languages for which we have evaluation data (e.g. the
SIGMORPHON 2019 data, in which around 50 languages are evaluated). The
best results nowadays for lemmatization in NLP are obtained by
supervised approaches, as is the case for the large majority of NLP
tasks (except for very specific applications served by an ad-hoc,
fine-tuned model). If you check the state of the art for NLP
tasks, that is the current trend in the field, not my personal claim.

Best regards,

Rodrigo


On Fri, 13 Jan 2023 at 12:41, Alexandre Rademaker <aradema...@gmail.com> wrote:
>
>
> OpenNLP is mainly machine-learning based, but we also have the
> DictionaryLemmatizer, which accepts a dictionary of word forms. See
> https://opennlp.apache.org/docs/2.1.0/manual/opennlp.html#tools.lemmatizer.tagging.api.
> So you can use MorphoBr (http://github.com/LR-POR/MorphoBr), which I mentioned
> before, to prepare the input file for the DictionaryLemmatizer.
>
> The statistical lemmatizer is also available, but it requires a model
> to run. You can train one yourself or use one already available from the link
> provided by Leszek.
>
> Rodrigo Agerri made a strong claim, saying that supervised lemmatizers work
> better. I don’t want to go into that discussion, but I believe the choice
> between an ML-based (supervised or not) and a rule-based approach should be based
> on many more criteria than the performance on a single dataset.
>
> Best,
> Alexandre
>
>
> > On 13 Jan 2023, at 01:48, T. Kuro Kurosaka <k...@bhlab.com> wrote:
> >
> > I wrote "model" just because I did not know OpenNLP supports a rule-based
> > approach.
> > Are there rule files that I can try for Portuguese and other major
> > languages?
> >
> > Kuro
> >
>
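
To make the dictionary-based route from the quoted message concrete, here is a minimal sketch of what a lookup lemmatizer does with a dictionary of word forms. It is plain Python and an illustration of the idea only, not OpenNLP's DictionaryLemmatizer or its API. The tab-separated "form, tag, lemma" layout, the tag labels, the sample entries, and the "O" marker for unknown words are all assumptions made for this example; a resource such as MorphoBr would first have to be converted into whatever format the real tool expects.

```python
import io


def load_dictionary(lines):
    """Build a {(form, tag): lemma} table from tab-separated lines.

    Assumed (hypothetical) layout per line: form<TAB>tag<TAB>lemma.
    Blank lines and lines starting with '#' are skipped.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        form, tag, lemma = line.split("\t")
        # Lowercase the form so that lookup is case-insensitive.
        table[(form.lower(), tag)] = lemma
    return table


def lemmatize(tokens, tags, table, unknown="O"):
    """Look up each (token, tag) pair; return `unknown` when there is no entry."""
    return [table.get((tok.lower(), tag), unknown) for tok, tag in zip(tokens, tags)]


# Made-up sample entries; the same form can map to different lemmas per tag.
SAMPLE = "casas\tNOUN\tcasa\ncasas\tVERB\tcasar\nmeninos\tNOUN\tmenino\n"

table = load_dictionary(io.StringIO(SAMPLE))
lemmas = lemmatize(["Casas", "meninos", "xyz"], ["NOUN", "NOUN", "NOUN"], table)
print(lemmas)
```

The lookup is keyed on both the form and its tag, which is why a dictionary lemmatizer of this kind needs tagged input, and why forms missing from the dictionary get no lemma at all. That coverage gap is one of the criteria to weigh against a trained model.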
