Hi Kuro, 

Your first message asked for a lemmatization ‘model’ for Portuguese. I don’t have 
numbers at hand to support my claim, but I feel that lemmatization 
(morphosyntactic analysis) is best done with a rule-based approach, 
finite-state in particular, possibly supported by a lexical resource. 
For Portuguese, I maintain MorphoBr:

https://github.com/LR-POR/MorphoBr

We are also expanding the rules implemented in Foma 
(http://fomafst.github.io) to compact the full-form dictionary and better 
integrate it with the HPSG grammar we are developing. 
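
To make the idea concrete, here is a minimal Python sketch of lexicon-backed 
lemmatization with a rule-based fallback. It is not the MorphoBr or Foma 
implementation: a plain dictionary lookup and a short suffix list stand in 
for a real finite-state transducer. The entry shape (form, TAB, lemma+TAGS), 
the sample entries, the tag names, and the suffix rules are all assumptions 
made for illustration.

```python
from typing import Dict, List, Tuple

# Toy full-form lexicon, one "form<TAB>lemma+TAGS" entry per line.
# Entries and tag names are illustrative, not copied from MorphoBr.
SAMPLE_ENTRIES = (
    "azuis\tazul+A+PL\n"
    "azul\tazul+A+SG\n"
    "casas\tcasa+N+F+PL\n"
    "casas\tcasar+V+PRS+2+SG\n"
)

# Hypothetical fallback rules: (surface suffix, lemma suffix), tried in order.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ões", "ão"),
    ("uis", "ul"),
    ("s", ""),
]


def load_lexicon(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Parse entries into a map from surface form to (lemma, tags) pairs."""
    lexicon: Dict[str, List[Tuple[str, str]]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        form, analysis = line.split("\t", 1)
        lemma, _, tags = analysis.partition("+")
        lexicon.setdefault(form, []).append((lemma, tags))
    return lexicon


def lemmatize(word: str, lexicon: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Return candidate lemmas: lexicon first, suffix rules as a fallback."""
    form = word.lower()
    if form in lexicon:
        # Keep every distinct lemma; ambiguity is left to a later component.
        return list(dict.fromkeys(lemma for lemma, _ in lexicon[form]))
    for suffix, replacement in SUFFIX_RULES:
        if form.endswith(suffix) and len(form) > len(suffix):
            return [form[: len(form) - len(suffix)] + replacement]
    return [form]


if __name__ == "__main__":
    lex = load_lexicon(SAMPLE_ENTRIES)
    for w in ["azuis", "casas", "leões"]:
        print(w, "->", lemmatize(w, lex))
```

Note that an ambiguous form such as "casas" yields more than one lemma; 
choosing between them needs context (for example a POS tag), which this 
sketch deliberately leaves out.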

A similar approach was adopted by FreeLing 
(https://github.com/TALP-UPC/FreeLing). I have collaborated with Lluís Padró 
(FreeLing's author) to expand its Portuguese support. 

Comments are welcome.

Best,
Alexandre


> On 10 Jan 2023, at 19:43, T. Kuro Kurosaka <k...@bhlab.com> wrote:
> 
> Thank you, Leszek!
> It looks promising. It did lemmatize "azuis" -> "azul".
> Are these 4 char filters absolutely required to run the lemmatizers correctly?
> 
> Kuro
> 
> On 1/10/23 12:45 AM, lesze...@interia.eu wrote:
>> Hi
>> 
>> As far as I know, there is no Portuguese lemmatizer on the official OpenNLP site.
>> In general, such models are not easily available, at least for less popular 
>> languages.
>> 
>> I developed an application that automatically trains sentence-detector, 
>> tokenizer, POS-tagger and lemmatizer models from Universal Dependencies 
>> language files.
>> For now, models are generated for 19 languages (including Portuguese).
>> 
>> Main app: https://github.com/abzif/babzel
>> Pre-trained models: https://abzif.github.io/babzel/models.html
>> 
>> Enjoy!
>> Leszek Piotrowicz
>> 
>> From: "T. Kuro Kurosaka" <k...@bhlab.com>
>> To: users@opennlp.apache.org;
>> Sent: 2:29 Tuesday 2023-01-10
>> Subject: Portuguese lemmatization model?
>> 
>>> Is there a pre-trained lemmatization model for Portuguese and other popular
>>> languages?
>>> 
>>> -- 
>>> T. "Kuro" Kurosaka, Orinda, California, USA
>>> 
