That sounds good. Also, I'm not sure if this is what you meant, but it would be nice to have a single download per language, in addition to individual models.
What do we do for models trained on corpora that are restricted? Can we distribute some models with, e.g., an academic-use-only license?

Jason

On Tue, Jan 25, 2011 at 2:51 PM, Jörn Kottmann <[email protected]> wrote:

> On 1/19/11 10:34 PM, Jörn Kottmann wrote:
>
>> Hi all,
>>
>> as everyone knows OpenNLP needs statistical models. Over at sourceforge
>> we simply had a model download page and offered the models there for
>> download (we actually still do that).
>>
>> We might come up with a project internal process to test the quality of
>> new models before we release them. Beside that are there any rules
>> we have to follow ? E.g. a vote on the incubator mailing list, like we
>> would do for a release of OpenNLP itself.
>>
>
> If we release these models they are just like any other artifacts which are
> released, beside maybe legal issues. But I do not believe that there are any
> legal issues, because we do not violate the copyright of the training material.
>
> The models will be released under the AL 2.0 and as far as I know must then
> also contain the LICENSE and NOTICE files, like we do with the jars. At least
> if we want to keep the format of our current SourceForge download page.
>
> Otherwise we could create a big model package and place the models together
> with the necessary LICENSE/NOTICE files in there.
>
> If the models are distributed via maven, the LICENSE/NOTICE files can simply
> be stored in the jar files.
>
> Any opinions ?
>
> Jörn
>

-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
