On 7/19/2012 2:07 AM, Lance Norskog wrote:
> What is the legitimacy of data which is tagged using an encumbered
> model? I mean, if I tag documents with OpenNLP's non-free models on
> sourceforge, the tagged output is a "derived work". Is this tagged
> output considered free? Does this depend on the license of the
> original data?
>

Lance,

The problem is two-fold.

(1) We would like to distribute the models on Apache. Unfortunately, to do so, the models and the source used to create them would have to be under the Apache license. We don't see any way around this other than to generate our own training data under an open license compatible with the Apache license. Jorn is getting the groundwork done for this with the tagging server, which will allow us to hand-tag and correct data for our own training set. I know it is re-doing work that has already been done, but the benefits will be large in the long run. Anyone could download the training data and add to, remove from, or otherwise change it to customize the training set for various situations without worrying about copyright issues. The downside is that we have a lot of work to do to get there.

(2) The models themselves, although available on SourceForge, are for research purposes ONLY. The copyright and the contracts with those holding the copyright on the original works state so. I've asked many people on this point. We would not be helping anyone by breaking the law on this, nor do we suggest that anyone do so. The next problem is that we can't distribute the training data for the models, so modifying the models to add training for other situations is next to impossible. The data used for training come mainly from news sources, and that limits the models' usefulness for some users.

I guess I'll have to get the FAQ section on our website done soon.

Thanks,
James
