Lauren:

> Erik, I see your point now and agree with you. But doesn't it seem
> like obtaining a perfect license is at present the enemy of the urgent
> good of bringing a concerted effort to bear on problems that are
> clearly detrimental to project integrity?

I don't think the licensing question matters for the purpose of
evaluating third-party APIs (including giving Wikimedia volunteers
access so they can participate in such evaluations), but I would
personally draw the line at something like a Wikimedia Cloud
Infrastructure installation. Spending a lot of money on compute
infrastructure to run a proprietary model strikes me as clearly out of
scope for the Wikimedia mission.

Openly licensed models for machine translation, such as Facebook's
M2M-100 (https://huggingface.co/facebook/m2m100_418M), or for text
generation, such as Cerebras-GPT-13B
(https://huggingface.co/cerebras/Cerebras-GPT-13B) and GPT-NeoX-20B
(https://huggingface.co/EleutherAI/gpt-neox-20b), seem like better
targets for running on Wikimedia infrastructure, if there's any merit
to be found in running them at this stage.
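
To give a sense of scale, evaluating a model like M2M-100 locally takes
only a few lines of code. A rough sketch, assuming the Hugging Face
transformers library and an English-to-French example (both purely
illustrative choices, not a statement about how Wikimedia
infrastructure would host such a model):

    # Sketch: load the openly licensed M2M-100 (418M) checkpoint and
    # translate one sentence, following the model card's documented usage.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
    tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

    tokenizer.src_lang = "en"  # source language: English
    encoded = tokenizer("Wikipedia is a free online encyclopedia.",
                        return_tensors="pt")
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))  # target: French
    print(tokenizer.batch_decode(generated, skip_special_tokens=True))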

Note that Facebook's proprietary but widely circulated LLaMA model has
triggered a lot of work on dramatically improving the performance of
LLMs through more efficient implementations, to the point that you can
run a decent-quality LLM (and combine it with OpenAI's freely licensed
speech recognition model, Whisper) on a consumer-grade laptop:

https://github.com/ggerganov/llama.cpp
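
Purely as an illustration of how little is involved, here is a sketch
using the third-party llama-cpp-python bindings (my choice; the
repository above is the underlying C++ project) and a placeholder path
to a model file you would first have to convert and quantize with
llama.cpp's own tooling:

    # Sketch only: assumes the llama-cpp-python bindings are installed and
    # that a quantized model file already exists at the (hypothetical) path.
    from llama_cpp import Llama

    llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")  # placeholder path
    output = llm(
        "Q: Summarize the plot of Hamlet in one sentence. A:",
        max_tokens=64,
        stop=["Q:"],  # stop before the model invents a follow-up question
    )
    print(output["choices"][0]["text"])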

While I'm not sure if the "hallucination" problem is tractable when
all you have is an LLM, I am confident (based on, e.g., the recent
results with Alpaca: https://crfm.stanford.edu/2023/03/13/alpaca.html)
that the performance of smaller models will continue to increase as we
find better ways to train, steer, align, modularize and extend them.

Chris:

> there is probably an implicit licence granted by whoever publishes
> the work for whoever views it to use it.

Here's a link to the Stable Diffusion (image generation) model weights
from Stability AI's official repository. Note the lack of any licensing
statement or clickthrough agreement when directly downloading the
weights.

https://huggingface.co/stabilityai/stable-diffusion-2-base/resolve/main/512-base-ema.ckpt

Are you infringing Stability AI's copyright by clicking this link? If
not, are you infringing Stability AI's copyright by then writing a
Python script that uses this file to generate images, if you only run
it locally on your GPU?
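
To make the second question concrete, the hypothetical script can be as
short as the sketch below, which assumes the Hugging Face diffusers
library and a CUDA GPU. It fetches the weights from the same
stabilityai/stable-diffusion-2-base repository as the link above
(rather than the single .ckpt file), and at no point does it present a
license text or ask for agreement:

    # Sketch: generate one image locally from the Stable Diffusion 2 base
    # weights, with no license prompt anywhere in the process.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-base", torch_dtype=torch.float16)
    pipe = pipe.to("cuda")

    image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
    image.save("lighthouse.png")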

Even if a court answers either question with "yes", it still does not
follow that you are bound by the other licensing terms Stability AI
attaches to those files, terms which you never agreed to when clicking
the link.

But this discussion highlights the fundamental difference between free
licenses like CC BY-SA and the GPL and nonfree "ethical use" licenses
like OpenRAIL-M. If you want to enforce your ethical-use restrictions
without a clickthrough agreement, you have no choice but to adopt an
expansive definition of copyright infringement. This is somewhat
ironic, given that the models themselves are trained on vast amounts
of copyrighted data without permission.

Warmly,
Erik