Re: Distributing our statistical models

Jörn Kottmann Thu, 20 Jan 2011 03:16:25 -0800

On 1/20/11 11:43 AM, Olivier Grisel wrote:

2011/1/20 Jörn Kottmann<[email protected]>:

On 1/20/11 12:40 AM, Olivier Grisel wrote:

It's useful even for production: very few people know about training
custom statistical NLP models, and most will go with the default. Those
who do know about custom models will set up their application to allow
the use of the default models from the classpath if none is configured
in an application-specific configuration file or a runtime setup such
as OSGi.
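
A rough sketch of that fallback, for illustration only. The class name
ModelLocator and the resource path passed to it are hypothetical and
not part of OpenNLP; the only assumption is that a default model is
packaged somewhere on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not an OpenNLP class.
public class ModelLocator {

    /**
     * Opens the explicitly configured model file if there is one,
     * otherwise falls back to a default model on the classpath.
     *
     * @param configured      model file from the application config, or null
     * @param defaultResource classpath location of the bundled default model
     */
    public static InputStream open(Path configured, String defaultResource)
            throws IOException {
        if (configured != null) {
            // A custom model was configured: it always wins.
            return Files.newInputStream(configured);
        }
        // No custom model: look for the default shipped in a jar.
        InputStream in = ModelLocator.class.getResourceAsStream(defaultResource);
        if (in == null) {
            throw new IOException(
                "No model configured and no default on classpath: " + defaultResource);
        }
        return in;
    }
}
```

The caller decides what the default resource path is, so the same code
works whether the models arrive in a separate jar or in an OSGi bundle
that exports them.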

Yes, I am already convinced it is a nice idea, and does not even require
a change to our source code :)


Do you want to open a Jira issue? It would be nice to get
some help with the Maven details.

Here: https://issues.apache.org/jira/browse/OPENNLP-68

Can you open a Jira issue for the open corpora building tools and how
to formalize the process of semi-automatically building new
statistical models to be distributed as part of the regular OpenNLP
release process?


Most of the training support tools are part of OpenNLP already,
e.g. the formats package, the detokenizer, and other things.

The issue is that the copyright-protected training data cannot
be shared with others. Over at SourceForge we built the models
outside and then uploaded them to the website.
We had a bad experience with checking them into CVS.

Maybe you should fetch the models via HTTP from the website.
For now you could just use the models which are on our
SourceForge page; later, when we put the models
on the Apache website, you could simply update your links.
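
Something along these lines might do as a starting point. It is only
a sketch: ModelFetcher is a made-up name, and the model URL is passed
in by the caller so that it can later be switched from the SourceForge
page to the Apache website without touching the code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not an OpenNLP class.
public class ModelFetcher {

    /**
     * Downloads a model to a local file unless it is already there.
     *
     * @param source where the model is published (supplied by the caller)
     * @param target local file to store the model in
     * @return the target path, now containing the model
     */
    public static Path fetch(URL source, Path target) throws IOException {
        if (Files.exists(target)) {
            return target; // already downloaded, reuse it
        }
        // Assumes target names a file, so it has a parent directory.
        Path dir = target.toAbsolutePath().getParent();
        Files.createDirectories(dir);

        // Download to a temporary file first so that an interrupted
        // transfer never leaves a truncated model at the target path.
        Path tmp = Files.createTempFile(dir, "model", ".part");
        try (InputStream in = source.openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
        return target;
    }
}
```

Since the models are not checked in anywhere, the build would call
something like this once and keep the downloaded file around.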

What do you think?

Jörn
