Hello,

Lemmatization currently works best with a supervised approach (neural
models at the moment), unless you have an extremely sophisticated
approach for a specific language.

https://aclanthology.org/W19-4226/

In OpenNLP we implemented Chrupala's approach (it performs better than
Freeling for Spanish and Catalan):

https://doras.dcu.ie/15272/

https://doras.dcu.ie/550/

as interpreted here:

https://github.com/ixa-ehu/ixa-pipe-pos
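
To make the idea concrete: Chrupala's approach treats lemmatization as
classification, where each class is an edit that turns a word form into
its lemma. The Python sketch below is only an illustration and uses a
simplified suffix-replacement rule instead of the shortest edit script
from the papers. The function names induce_rule and apply_rule are made
up for this example and are not part of OpenNLP or ixa-pipe-pos.

```python
import os
from typing import Tuple

# A rule is (number of characters to strip from the end, suffix to append).
# Assumption: this is a simplified stand-in for a shortest edit script.
Rule = Tuple[int, str]


def induce_rule(form: str, lemma: str) -> Rule:
    """Derive the suffix-replacement rule that maps form to lemma."""
    prefix_len = len(os.path.commonprefix([form, lemma]))
    return len(form) - prefix_len, lemma[prefix_len:]


def apply_rule(form: str, rule: Rule) -> str:
    """Apply a rule to a word form; return the form unchanged if it is too short."""
    strip, suffix = rule
    if strip > len(form):
        return form
    return form[: len(form) - strip] + suffix


# The rule learned from one pair generalizes to other forms of the same class.
rule = induce_rule("azuis", "azul")
print(rule, apply_rule("animais", rule))
```

In a supervised setup, a classifier would predict such a rule for each
token from the word form and its context (for example the POS tag), and
the lemma is obtained by applying the predicted rule.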

Best regards,

Rodrigo


On Wed, 11 Jan 2023 at 12:04, Alexandre Rademaker <aradema...@gmail.com> wrote:
>
>
> Hi Kuro,
>
> Your first message asked for a lemmatization ‘model’ for Portuguese. I don’t
> have numbers right now to support my claim, but I feel that lemmatization
> (morphosyntactic analysis) is best done with a rule-based approach,
> finite-state in particular, possibly supported by a lexical resource.
> For Portuguese, I maintain MorphoBr:
>
> https://github.com/LR-POR/MorphoBr
>
> We are also expanding the rules implemented in Foma
> (http://fomafst.github.io) to compact the full-form dictionary and better
> integrate it with the HPSG grammar we are developing.
>
> A similar approach was adopted by Freeling
> (https://github.com/TALP-UPC/FreeLing). I have collaborated with Lluís Padró
> (Freeling's author) to expand its Portuguese support.
>
> Comments are welcome.
>
> Best,
> Alexandre
>
>
> > On 10 Jan 2023, at 19:43, T. Kuro Kurosaka <k...@bhlab.com> wrote:
> >
> > Thank you, Leszek!
> > It looks promising. It did lemmatize "azuis" -> "azul".
> > Are these four char filters absolutely required to run the lemmatizers
> > correctly?
> >
> > Kuro
> >
> > On 1/10/23 12:45 AM, lesze...@interia.eu wrote:
> >> Hi
> >>
> >> As far as I know, there is no Portuguese lemmatizer on the official
> >> OpenNLP site. In general, such models are not easily available, at least
> >> for less popular languages.
> >>
> >> I developed an application that automatically trains sentence-detector,
> >> tokenizer, POS-tagger and lemmatizer models from Universal Dependencies
> >> language files.
> >> For now, models are generated for 19 languages (including Portuguese).
> >>
> >> Main app: https://github.com/abzif/babzel
> >> Pre-trained models: https://abzif.github.io/babzel/models.html
> >>
> >> Enjoy!
> >> Leszek Piotrowicz
> >>
> >> From: "T. Kuro Kurosaka" <k...@bhlab.com>
> >> To: users@opennlp.apache.org;
> >> Sent: 2:29 Tuesday 2023-01-10
> >> Subject: Portuguese lemmatization model?
> >>
> >>> Is there a pre-trained lemmatization model for Portuguese and other 
> >>> popular
> >>> languages?
> >>>
> >>> --
> >>> T. "Kuro" Kurosaka, Orinda, California, USA
> >>>
>
