On 2/1/11 4:57 PM, Grant Ingersoll wrote:
Your timing is great, as I was just about to suggest the same thing.

On Feb 1, 2011, at 6:51 AM, Jörn Kottmann wrote:

Hi all,

I would like to go ahead and get our first release out. The release is
backward compatible with the models we had over at SourceForge,
which means we do not need to release new models right now.

The logic to train most of the models is already included in OpenNLP,
so our users can train the models themselves or even
mix in their own data.

To release the models at Apache we have to work through a series of legal
issues, which I believe should not delay our first release by
weeks or months.

Can you summarize the issues here? The last thread is mountainous. To some
extent, there is no time like the present to address the legal issues. The ASF
has legal counsel. If you can summarize how we make the models and what
the concerns are, we can take it over to legal-discuss@ and start working on
it. It may not be as big a deal as one might think.

The concern is that our models are trained on various closed or free corpora, almost all of which have different licenses. We would have to discuss whether the model trained on each corpus may be distributed under the Apache License 2.0 (AL 2.0).

I believe that in most cases we do not violate any copyright, because statistics about a text are not protected by the text's copyright. For example, we would generate bigram or trigram features over the whole corpus.

In my opinion, we at least need to provide a list of corpora and their licenses to start a discussion on the legal list. Just digging that information out will take some time, at least for the English training data.
We also have training data whose license we are unsure about.

On the other hand, we gain nothing by doing this now as part of the release.

In my opinion, we should try to get the process started, perhaps by putting together a wiki page, and start talking to the legal people as soon as we have all the information they need.

Jörn
