http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201011.mbox/%[email protected]%3E

On Tue, Feb 1, 2011 at 4:45 PM, Grant Ingersoll <[email protected]> wrote:
>
> On Feb 1, 2011, at 11:20 AM, Benson Margulies wrote:
>
>> With somewhat mixed feelings, I've been following this discussion. In
>> the interests of full disclosure, I'll explain the mixed feelings in a
>> moment. I warmed up legal-discuss for you during the incubator
>> discussion and learned some things.
>
> What's the thread for this one?
>
>>
>> Based on my legal understanding, I feel fairly confident that models
>> derived from textual corpora are not 'derived works' subject to the
>> copyrights and licenses of the corpora. However, IANAL, and this needs
>> to be explored. Some remarks on legal-discuss suggest that, in Europe,
>> I may be completely wrong. Still, this is probably the *good* news.
>>
>> The less-good news is that, as a general principle, the ASF would not
>> want a release to contain a binary artifact derived from sources that
>> cannot be released under the Apache license, or even obtained under
>> the Apache license or something remotely like it. An even stronger
>> principle is that the source materials must be available, period
>> (e.g. not available only to LDC members or something).
>
> This is the single most frustrating issue facing open source text tools to 
> date.  It's why I started the Open Relevance Project, but until we have 
> enough of us willing to band together and work on it, we will be stuck.
>
>>
>> The less bad news is that there is a precedent here: SpamAssassin. To
>> train spam models, SpamAssassin has to collect and maintain large
>> collections of materials that have restrictive licenses. The
>> Foundation has decided that this is tolerable if these materials are
>> kept on a Foundation server, and access to it is granted to legitimate
>> members of the development community, one by one. This avoids the
>> spectre of 'publication' but permits open participation.
>
> This is OK, but it discourages newbies from participating.
>
>>
>> The bottom line of the legal-discuss discussion was that this path
>> was, broadly, available to OpenNLP. However, legal-discuss hates to
>> discuss hypotheticals, so you won't get a definitive ruling until you
>> ask a specific question. I recommend opening a JIRA on legal-discuss
>> as a way to clarify that you need a clear and definitive ruling and
>> not just an email food-fight.
>
> Yes, we should start assembling a list of corpora, if only so we at least have
> it for others that come later and want to reproduce them.  In the meantime, I 
> would agree that we can just keep the models elsewhere.  We don't have to 
> provide models.  They are a convenience for all involved, but not a 
> requirement in order to run.  I wonder how many people actually train their
> own.  (BTW, we should update our website to point to older models, too.  They 
> are really hard to find unless you do some URL rewriting.)
>
>
> -Grant
