Hi,

Maybe I can share my experience. In the 'babzel' project, models for 19 languages have been computed so far. I also tried to compute an Arabic language model. Training the sentence detector, tokenizer, and POS tagger was successful. Lemmatizer training, however, took incredibly long (a few hours) and then failed with an exception along the lines of "Serialization error, string too long". I don't remember the exact exception message, but the computed model could not be written to a file, so in the end I was not able to publish that model. A rough sketch of the training step I used is below.
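Roughly, the training and serialization looked like this (a minimal sketch with the standard OpenNLP lemmatizer API; the file names are just placeholders for my actual Arabic training data and output model):

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.lemmatizer.LemmaSample;
import opennlp.tools.lemmatizer.LemmaSampleStream;
import opennlp.tools.lemmatizer.LemmatizerFactory;
import opennlp.tools.lemmatizer.LemmatizerME;
import opennlp.tools.lemmatizer.LemmatizerModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainArabicLemmatizer {

    public static void main(String[] args) throws Exception {
        // Training data: one "token <TAB> POS-tag <TAB> lemma" triple per line,
        // sentences separated by blank lines (placeholder file name).
        MarkableFileInputStreamFactory in =
                new MarkableFileInputStreamFactory(new File("ar-lemmatizer.train"));

        try (ObjectStream<String> lines =
                     new PlainTextByLineStream(in, StandardCharsets.UTF_8);
             ObjectStream<LemmaSample> samples = new LemmaSampleStream(lines)) {

            // This step is the one that ran for a few hours on the Arabic data.
            LemmatizerModel model = LemmatizerME.train(
                    "ar", samples, TrainingParameters.defaultParams(),
                    new LemmatizerFactory());

            // Writing the model to disk is where the "string too long"
            // serialization error was thrown.
            try (OutputStream out = new BufferedOutputStream(
                    new FileOutputStream("ar-lemmatizer.bin"))) {
                model.serialize(out);
            }
        }
    }
}
```

The equivalent command-line invocation (opennlp LemmatizerTrainerME -lang ar -data ... -model ...) should behave the same way, since it calls the same training and serialization code underneath.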
Regards,
Leszek

From: "T. Kuro Kurosaka" <k...@bhlab.com>
To: users@opennlp.apache.org
Sent: 21:35, Sunday 2023-02-12
Subject: Re: Portuguese lemmatization model?

> Thank you for the responses to my earlier question.
> So far I'm using the models published in the babzel project, but it doesn't have one
> for Arabic.
> Are there any pre-trained lemmatization models of reasonable accuracy (95+%?)
> available?
>
> On 1/9/23 5:23 PM, T. Kuro Kurosaka wrote:
> > Is there a pre-trained lemmatization model for Portuguese and other popular
> > languages?
> > Kuro