Hi

You have to run these 4 phases in order, because lemmatization needs the
tokens and their part-of-speech tags as input:

sentence-detection
tokenization
pos-tagging
lemmatization

In theory you could do only the lemmatization with an OpenNLP model and
handle the other phases some other way (a hard-coded algorithm?), but I think
the simplest approach is to use all 4 OpenNLP models if they are already
pre-trained.
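
To illustrate the ordering, here is a minimal sketch rather than real
OpenNLP code. The interfaces, the LemmaPipeline class and the toy() stand-ins
are all hypothetical names made up for this example. The method shapes are
modeled on what I recall of OpenNLP's SentenceDetectorME.sentDetect,
TokenizerME.tokenize, POSTaggerME.tag and LemmatizerME.lemmatize, so please
check them against the version you use. The point is only that each stage
consumes the output of the previous one, and the lemmatizer takes both the
tokens and the tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stage interfaces (not OpenNLP types); in a real setup each one
// would be backed by the corresponding model-based OpenNLP component.
interface SentenceSplitter { String[] sentDetect(String text); }
interface WordTokenizer { String[] tokenize(String sentence); }
interface PosTagger { String[] tag(String[] tokens); }
interface Lemmatizer { String[] lemmatize(String[] tokens, String[] tags); }

final class LemmaPipeline {
    private final SentenceSplitter splitter;
    private final WordTokenizer tokenizer;
    private final PosTagger tagger;
    private final Lemmatizer lemmatizer;

    LemmaPipeline(SentenceSplitter s, WordTokenizer t, PosTagger p, Lemmatizer l) {
        this.splitter = s;
        this.tokenizer = t;
        this.tagger = p;
        this.lemmatizer = l;
    }

    // Runs the 4 phases in order and returns one lemma array per sentence.
    List<String[]> run(String text) {
        List<String[]> result = new ArrayList<>();
        for (String sentence : splitter.sentDetect(text)) {     // 1. sentence detection
            String[] tokens = tokenizer.tokenize(sentence);     // 2. tokenization
            String[] tags = tagger.tag(tokens);                 // 3. POS tagging
            result.add(lemmatizer.lemmatize(tokens, tags));     // 4. lemmatization
        }
        return result;
    }

    // Toy hard-coded stand-ins, only to make the sketch runnable without models.
    static LemmaPipeline toy() {
        Map<String, String> tagOf = Map.of(
                "Os", "DET", "carros", "NOUN", "azuis", "ADJ", ".", "PUNCT");
        Map<String, String> lemmaOf = Map.of(
                "carros|NOUN", "carro", "azuis|ADJ", "azul");
        return new LemmaPipeline(
                // Split after sentence-final punctuation followed by whitespace.
                text -> text.split("(?<=[.!?])\\s+"),
                // Detach final punctuation, then split on whitespace.
                sentence -> sentence.replaceAll("([.!?])", " $1").trim().split("\\s+"),
                tokens -> {
                    String[] tags = new String[tokens.length];
                    for (int i = 0; i < tokens.length; i++) {
                        tags[i] = tagOf.getOrDefault(tokens[i], "X");
                    }
                    return tags;
                },
                (tokens, tags) -> {
                    String[] lemmas = new String[tokens.length];
                    for (int i = 0; i < tokens.length; i++) {
                        // The lookup key uses BOTH the token and its tag;
                        // unknown pairs fall back to the token itself.
                        String key = tokens[i].toLowerCase() + "|" + tags[i];
                        lemmas[i] = lemmaOf.getOrDefault(key, tokens[i]);
                    }
                    return lemmas;
                });
    }
}
```

Calling LemmaPipeline.toy().run(...) on a short text returns one array of
lemmas per sentence. With real models you would replace each lambda with a
call into the matching OpenNLP component loaded from its model file.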

Regards
Leszek 

From: "T. Kuro Kurosaka" <k...@bhlab.com>
To: users@opennlp.apache.org; lesze...@interia.eu; 
Sent: 20:53 Tuesday 2023-01-10
Subject: Re: Portuguese lemmatization model?

> Thank you, Leszek!
> It looks promising. It did lemmatize "azuis" -> "azul".
> Are these 4 char filters absolutely required to run the lemmatizers
> correctly?
> 
> Kuro
> 
> On 1/10/23 12:45 AM, lesze...@interia.eu wrote:
> > Hi
> >
> > As far as I know there is no Portuguese lemmatizer on the official
> > OpenNLP site.
> > In general such models are not easily available, at least for less
> > popular languages.
> >
> > I developed an application to automatically compute
> > sentence-detector, tokenizer, pos-tagger and lemmatizer models from
> > Universal Dependencies language files.
> > For now models are generated for 19 languages (including Portuguese).
> >
> > Main app: https://github.com/abzif/babzel
> > Pre-trained models: https://abzif.github.io/babzel/models.html
> >
> > Enjoy!
> > Leszek Piotrowicz
> >
> > From: "T. Kuro Kurosaka" 
> > To: users@opennlp.apache.org;
> > Sent: 2:29 Tuesday 2023-01-10
> > Subject: Portuguese lemmatization model?
> >
> >> Is there a pre-trained lemmatization model for Portuguese
> >> and other popular languages?
> >>
> >> -- 
> >> T. "Kuro" Kurosaka, Orinda, California, USA
> >>
> >>
> >
> >
> 
> -- 
> T. "Kuro" Kurosaka, Orinda, California, USA
> 
> 