Or, maybe just require an open disclosure of where the bot pulled from and how much, instead of having it be a black box? "Text in this response derived from: 17% Wikipedia article 'Example', 12% Wikipedia article 'SomeOtherThing', 10%...".
On Sat, Mar 18, 2023 at 10:17 PM Steven Walling <steven.wall...@gmail.com> wrote:
>
>
> On Sat, Mar 18, 2023 at 3:49 PM Erik Moeller <eloque...@gmail.com> wrote:
>
>> On Fri, Mar 17, 2023 at 7:05 PM Steven Walling <steven.wall...@gmail.com>
>> wrote:
>>
>> > IANAL of course, but to me this implies that responsibility for the *egregious* lack
>> > of attribution in models that rely substantially on Wikipedia is violating the Attribution
>> > requirements of CC licenses.
>>
>> Morally, I agree that companies like OpenAI would do well to recognize
>> and nurture the sources they rely upon in training their models.
>> Especially as the web becomes polluted with low quality AI-generated
>> content, it would seem in everybody's best interest to sustain the
>> communities and services that make and keep high quality information
>> available. Not just Wikimedia, but also the Internet Archive, open
>> access journals and preprint servers, etc.
>>
>> Legally, it seems a lot murkier. OpenAI in particular does not
>> distribute any of its GPT models. You can feed them prompts by various
>> means, and get responses back. Do those responses plagiarize
>> Wikipedia?
>>
>> With image-generating models like Stable Diffusion, it's been found
>> that the models sometimes generate output nearly indistinguishable
>> from source material [1]. I don't know if similar studies have been
>> undertaken for text-generating models yet. You can certainly ask GPT-4
>> to generate something that looks like a Wikipedia article -- here are
>> example results for generating a random Wikipedia article:
>>
>> Article: https://en.wikipedia.org/wiki/The_Talented_Mr._Ripley_(film)
>> GPT-4 run 1: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/1
>> (cut off at the ChatGPT generation limit)
>> GPT-4 run 2: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/2
>> GPT-4 run 3: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/3
>>
>> It imitates the form of a Wikipedia article & mixes up / makes up
>> assertions, but I don't know that any of its generations would meet
>> the standard of infringing on the Wikipedia article's copyright. IANAL
>> either, and as you say, the legal landscape is evolving rapidly.
>>
>> Warmly,
>> Erik
>
>
> The whole thing is definitely a hot mess. If the remixing/transformation
> by the model is a derivative work, it means OpenAI is potentially violating
> the ShareAlike requirement by not distributing the text output as CC. But
> on other hand the nature of the model means they’re combining CC and non
> free works freely / at random, unless a court would interpret whatever % of
> training data comes from us as the direct degree to which the model output
> is derived from Wikipedia. Either way it’s going to be up to some legal
> representation of copyright holders to test the boundaries here.
>
>> [1] https://arstechnica.com/information-technology/2023/02/researchers-extract-training-images-from-stable-diffusion-but-its-difficult/
>> _______________________________________________
>> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
>> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>> https://meta.wikimedia.org/wiki/Wikimedia-l
>> Public archives at
>> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/CO3IJWXGHTBP3YE7AKUHHKPAL5HA56IC/
>> To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org
_______________________________________________
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/4YHAFKDLAPFCNRQGAY77KWRIOIBRWVUH/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org