Hi Kuro,
Your first message asked for a lemmatization ‘model’ for Portuguese. I don’t have numbers at hand to support this claim, but I feel that lemmatization (morphosyntactic analysis) is best done with a rule-based approach, finite-state in particular, possibly supported by a lexical resource.

For Portuguese, I maintain MorphoBr: https://github.com/LR-POR/MorphoBr

We are also expanding the rules we implemented in foma (http://fomafst.github.io) to compact the full-form dictionary and to integrate it better with the HPSG grammar we are developing.

FreeLing (https://github.com/TALP-UPC/FreeLing) adopted a similar approach. I have collaborated with Lluís Padró, FreeLing's author, to expand its Portuguese support.

Comments are welcome.

Best,
Alexandre

> On 10 Jan 2023, at 19:43, T. Kuro Kurosaka <k...@bhlab.com> wrote:
>
> Thank you, Leszek!
> It looks promising. It did lemmatize "azuis" -> "azul".
> Are these 4 char filters absolutely required to run the lemmatizers correctly?
>
> Kuro
>
> On 1/10/23 12:45 AM, lesze...@interia.eu wrote:
>> Hi
>>
>> As far as I know there is no Portuguese lemmatizer on the official OpenNLP site.
>> In general such models are not easily available, at least for less popular
>> languages.
>>
>> I developed an application to automatically compute sentence-detector,
>> tokenizer, pos-tagger and lemmatizer models from Universal Dependencies language
>> files.
>> For now models are generated for 19 languages (including Portuguese).
>>
>> Main app: https://github.com/abzif/babzel
>> Pre-trained models: https://abzif.github.io/babzel/models.html
>>
>> Enjoy!
>> Leszek Piotrowicz
>>
>> From: "T. Kuro Kurosaka" <k...@bhlab.com>
>> To: users@opennlp.apache.org;
>> Sent: 2:29 Tuesday 2023-01-10
>> Subject: Portuguese lemmatization model?
>>
>>> Is there a pre-trained lemmatization model for Portuguese and other popular
>>> languages?
>>>
>>> --
>>> T. "Kuro" Kurosaka, Orinda, California, USA
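To make the rule-based idea above concrete (a full-form dictionary backed by rewrite rules, checked against a lexical resource), here is a minimal Python sketch. It is an illustrative assumption only: the tiny lexicons and suffix rules are hypothetical and are not taken from MorphoBr, foma, or FreeLing, and plain dictionaries stand in for the finite-state transducer a tool like foma would compile. Irregular forms come from the full-form table; regular plurals such as "azuis" -> "azul" are handled by suffix rules whose output must appear in a lemma list.

```python
# Hypothetical sketch: full-form lookup plus suffix rules validated by a lexicon.
# All data below is a toy assumption, not MorphoBr content.

# Hypothetical full-form dictionary for irregular forms: form -> lemma.
FULL_FORMS = {
    "fui": "ir",
    "foram": "ir",
}

# Hypothetical plural-to-singular suffix rewrite rules.
SUFFIX_RULES = [
    ("ões", "ão"),  # corações -> coração
    ("ais", "al"),  # animais -> animal
    ("uis", "ul"),  # azuis -> azul
    ("res", "r"),   # flores -> flor
    ("s", ""),      # casas -> casa
]

# Hypothetical lemma list playing the role of the lexical resource.
KNOWN_LEMMAS = {"ir", "azul", "animal", "coração", "flor", "casa"}


def lemmatize(form: str) -> str:
    """Return a lemma for `form`, or the lowercased form if no rule applies."""
    word = form.lower()
    # 1. Irregular forms are looked up directly in the full-form dictionary.
    if word in FULL_FORMS:
        return FULL_FORMS[word]
    # 2. Try the longest matching suffix first, so "animais" uses "ais", not "s".
    for suffix, replacement in sorted(SUFFIX_RULES, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # 3. Accept a candidate only if the lexicon knows it as a lemma.
            if candidate in KNOWN_LEMMAS:
                return candidate
    # 4. No validated analysis: leave the word unchanged.
    return word


if __name__ == "__main__":
    for w in ["azuis", "Foram", "animais", "corações", "flores", "casas", "mesas"]:
        print(w, "->", lemmatize(w))
```

Validating rule output against the lemma list is what keeps the rules from overgenerating: in this toy setup "mesas" stays unchanged simply because "mesa" is not in the hypothetical lexicon, which shows why coverage of the lexical resource matters as much as the rules themselves.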