The Apache OpenNLP team is pleased to announce the release of pre-trained 
models for 32 languages, based on Universal Dependencies (UD) treebanks.

The Apache OpenNLP library is a machine learning based toolkit for the 
processing of natural language text.

Changes in this version:
- New pre-trained sentence detection, tokenization, part-of-speech tagging,
and lemmatization models are now available for 9 languages: Armenian,
Basque, Catalan, Georgian, Greek, Kazakh, Korean, Icelandic, and Turkish.
- The existing sentence detection, tokenization, and part-of-speech tagging
models for the 23 languages published with models release 1.1 have been
re-trained.
- In addition, new lemmatization models have been trained and added for all 
languages.

All models, for a total of 32 languages, were trained with OpenNLP 2.5.0 based
on the latest UD release, 2.15.
The models are compatible with Apache OpenNLP >=1.0.0.

Apache OpenNLP models and reports are available for download from our model
download page:
https://opennlp.apache.org/models.html

More information about this release can be found in the README at:
https://dist.apache.org/repos/dist/release/opennlp/models/ud-models-1.2/README

Details about the effectiveness of these models can be found in the
following report:
https://dist.apache.org/repos/dist/release/opennlp/models/ud-models-1.2/opennlp-training-eval-logs-1.2-2.5.0.zip


The Apache OpenNLP Team
