Re: Question about licence of opennlp models

Jason Baldridge Wed, 01 Feb 2012 19:12:45 -0800

Hi Katrin,

Yep, the models are subject to the licensing restrictions of the copyright
holders of the corpus used to train them. We are trying to clarify the
situation with respect to models trained on non-free corpora over here:

https://github.com/utcompling/OpenNLP-Models

For example, there are now models there for Norwegian, trained from the
NoWaC corpus. The corpus holders have given me license to distribute those
models (but not the corpus).

All the same, we are very keen to move forward with training data that is
based on open data and open annotations. As James mentioned, some tools
for iterative labeling, annotation and model retraining are currently
under way. Given your research background, it would obviously be great to
have your input on this!

Jason


On Wed, Feb 1, 2012 at 7:52 PM, James Kosin <james.ko...@gmail.com> wrote:

> Katrin,
>
> Hmm... maybe I'll write a fact page for our web site until we
> get this straightened out.
>
> 1)  The models at SourceForge are intended for research use only; no
> commercial use.
> 2)  Most of the corpora are heavily copyrighted and exclude all
> commercial use, mostly because they are fully copyrighted texts and
> are treated like most books.
>
> 3)  With both of these out of the way, our team is also working on a
> way to generate a free corpus from other free sources, where the
> copyright allows a freer exchange of information.  I think WikiNews
> has been looked at, as well as other sources.  We have a sample server
> applet that will eventually run on a server to let us mark, tag and
> take apart the information and generate training data in the format
> required by the name finder, tokenizer and POS tagger.
>      Help on this is extremely welcome, and I think you and anyone else
> interested can contact Jorn to find out how to get started or how to help.
>
> James
>
> On 2/1/2012 6:42 AM, Katrin Tomanek wrote:
> > Hi everybody,
> >
> > I am wondering what licence applies to the models provided for the
> > Apache OpenNLP tools (those found at:
> > http://opennlp.sourceforge.net/models-1.5/).
> >
> > For example, are the models based on the TIGER corpus also subject
> > to the Apache licence? If not, what licence are they under? Same
> > question for the models based on CoNLL data.
> >
> > So, as a company, can we use these models in a commercial context, or
> > do we additionally have to licence the original corpus?
> >
> > Best
> > Katrin


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
