Hello,

Lemmatization works best with a supervised approach (currently, neural models), unless you have an extremely sophisticated approach for a specific language.
https://aclanthology.org/W19-4226/

In OpenNLP we implemented Chrupala's approach (it performs better than FreeLing for Spanish and Catalan):

https://doras.dcu.ie/15272/
https://doras.dcu.ie/550/

as interpreted in ixa-pipe-pos:

https://github.com/ixa-ehu/ixa-pipe-pos

Best regards,

Rodrigo

On Wed, 11 Jan 2023 at 12:04, Alexandre Rademaker <aradema...@gmail.com> wrote:
>
> Hi Kuro,
>
> Your first message asked for a lemmatization ‘model’ for Portuguese. I don't
> have numbers right now to support my claim, but I feel that lemmatization
> (morphosyntactic analysis) is best done with a rule-based approach,
> finite-state in particular, possibly supported by a lexical resource.
> For Portuguese, I maintain MorphoBr:
>
> https://github.com/LR-POR/MorphoBr
>
> We are also expanding the rules implemented in foma (http://fomafst.github.io)
> to compact the full-form dictionary and better integrate it with the HPSG
> grammar we are developing.
>
> A similar approach was adopted by FreeLing
> (https://github.com/TALP-UPC/FreeLing). I have collaborated with Lluís Padró
> (FreeLing's author) to expand its Portuguese support.
>
> Comments are welcome.
>
> Best,
> Alexandre
>
> > On 10 Jan 2023, at 19:43, T. Kuro Kurosaka <k...@bhlab.com> wrote:
> >
> > Thank you, Leszek!
> > It looks promising. It did lemmatize "azuis" -> "azul".
> > Are these 4 char filters absolutely required to run the lemmatizers
> > correctly?
> >
> > Kuro
> >
> > On 1/10/23 12:45 AM, lesze...@interia.eu wrote:
> >> Hi,
> >>
> >> As far as I know, there is no Portuguese lemmatizer on the official OpenNLP site.
> >> In general, such models are not easily available, at least for less popular
> >> languages.
> >>
> >> I developed an application that automatically builds sentence-detector,
> >> tokenizer, POS-tagger and lemmatizer models from Universal Dependencies
> >> language files.
> >> For now, models are generated for 19 languages (including Portuguese).
> >>
> >> Main app: https://github.com/abzif/babzel
> >> Pre-trained models: https://abzif.github.io/babzel/models.html
> >>
> >> Enjoy!
> >> Leszek Piotrowicz
> >>
> >> From: "T. Kuro Kurosaka" <k...@bhlab.com>
> >> To: users@opennlp.apache.org;
> >> Sent: 2:29 Tuesday 2023-01-10
> >> Subject: Portuguese lemmatization model?
> >>
> >>> Is there a pre-trained lemmatization model for Portuguese and other
> >>> popular languages?
> >>>
> >>> --
> >>> T. "Kuro" Kurosaka, Orinda, California, USA
> >>>
>
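The edit-script idea behind the Chrupala approach that Rodrigo mentions can be sketched as follows: lemmatization is recast as classification, where each training pair (form, lemma) becomes a class label describing how to rewrite the form into its lemma, and a supervised model predicts that label for unseen words. This is only a minimal sketch under stated assumptions. It uses a simplified suffix-only script (strip the form's tail after the longest common prefix, append the lemma's tail) rather than a full shortest edit script. The "classifier" is a toy lookup keyed on the last three characters, whereas a real system would use a statistical or neural model with context and POS features. All function names are hypothetical and are not OpenNLP or ixa-pipe-pos APIs.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical encoding: (characters to strip from the end, suffix to append).
EditScript = Tuple[int, str]


def derive_script(form: str, lemma: str) -> EditScript:
    """Simplified suffix-only edit script turning `form` into `lemma`."""
    i = 0
    # Advance i to the length of the longest common prefix of form and lemma.
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return (len(form) - i, lemma[i:])


def apply_script(form: str, script: EditScript) -> str:
    """Rewrite `form` with `script`; return `form` unchanged if it is too short."""
    strip, append = script
    if strip > len(form):
        return form
    return form[: len(form) - strip] + append


def train_suffix_model(pairs: Iterable[Tuple[str, str]],
                       suffix_len: int = 3) -> Dict[str, EditScript]:
    """Toy stand-in for a classifier: most frequent script per form suffix."""
    counts: Dict[str, Counter] = {}
    for form, lemma in pairs:
        key = form[-suffix_len:]
        counts.setdefault(key, Counter())[derive_script(form, lemma)] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def lemmatize(form: str, model: Dict[str, EditScript], suffix_len: int = 3) -> str:
    """Predict a script from the form's suffix (no-op script if the suffix is unseen)."""
    return apply_script(form, model.get(form[-suffix_len:], (0, "")))


pairs = [("azuis", "azul"), ("animais", "animal"), ("papéis", "papel"),
         ("casas", "casa"), ("gatos", "gato")]
model = train_suffix_model(pairs)
print(derive_script("azuis", "azul"))  # (2, 'l')
print([lemmatize(w, model) for w in ["canais", "pratos", "mar"]])
# ['canal', 'prato', 'mar']
```

With these five toy pairs, "canais" receives the script learned from "animais" -> "animal" and comes out as "canal"; a suffix never seen in training falls back to the empty script and the form is returned unchanged. Because labels are rewrite operations rather than whole lemmas, one label generalizes to every word sharing the same inflection pattern, which is what makes the classification framing attractive.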