Hi

Let me share my experience. In the 'babzel' project, models for 19 languages
have been computed so far.
I also tried to compute an Arabic language model. Computation of the
sentence detector, tokenizer, and POS tagger was successful.
Lemmatizer training took incredibly long (a few hours) and failed with an
exception like "Serialization error, string too long".
I don't remember the exact exception message, but the computed model could not
be written to a file.
So in the end it was not possible for me to publish this model.
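(For reference, the failing step was OpenNLP lemmatizer training followed by
model serialization. I don't have the exact babzel code at hand, so this is
only a minimal sketch of the standard opennlp-tools API, with placeholder
file names:

    import java.io.*;
    import java.nio.charset.StandardCharsets;
    import opennlp.tools.lemmatizer.*;
    import opennlp.tools.util.*;

    public class TrainArabicLemmatizer {
        public static void main(String[] args) throws IOException {
            // tab-separated training data: word, POS tag, lemma per line
            InputStreamFactory in =
                new MarkableFileInputStreamFactory(new File("ar-lemmatizer.train"));
            ObjectStream<LemmaSample> samples = new LemmaSampleStream(
                new PlainTextByLineStream(in, StandardCharsets.UTF_8));

            // the training itself ran for a few hours in my case
            LemmatizerModel model = LemmatizerME.train("ar", samples,
                TrainingParameters.defaultParams(), new LemmatizerFactory());

            // writing the model to disk is where the error was reported
            try (OutputStream out = new BufferedOutputStream(
                    new FileOutputStream("ar-lemmatizer.bin"))) {
                model.serialize(out);
            }
        }
    }
)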

Regards
Leszek

Od: "T. Kuro Kurosaka" <k...@bhlab.com>
Do: users@opennlp.apache.org; 
Wysłane: 21:35 Niedziela 2023-02-12
Temat: Re: Portuguese lemmatization model?

> Thank you for the responses to my earlier question.
> So far I'm using the models published in the babzel project, but it doesn't
> have one for Arabic.
> Are there any pre-trained lemmatization models of reasonable accuracy
> (95+%?) available?
> 
> 
> On 1/9/23 5:23 PM, T. Kuro Kurosaka wrote:
> > Is there a pre-trained lemmatization model for Portuguese and other
> > popular languages?
> >
> Kuro
> 
> 

