[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-04-18 Thread Kimmo Virtanen
Some interesting updates on current developments

Open Assistant is an open source ChatGPT clone with crowdsourced fine-tuning
- https://open-assistant.io

RedPajama is a project to reproduce LLaMA and release the model under
an open source licence. So far they have released the pre-training
data.
- https://www.together.xyz/blog/redpajama

Free Dolly comes with a CC-BY-SA licensed fine-tuning dataset
(databricks-dolly-15k).
- https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm

An LLM chat which runs in the web browser (using the non-open-source Vicuna-7B)
- https://simonwillison.net/2023/Apr/16/web-llm/

AFAIK all of these are clear steps towards a fully open source LLM stack.
Open Assistant is especially interesting, as it focuses on
crowdsourcing.

Br,
-- Kimmo Virtanen, Zache

___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/VEKGOASMRNUN4ZJSVRMJNBTRTH2VHWCA/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-04-03 Thread Samuel Klein
>
> At this point I guess I would recommend adding five or so
> g2.cores8.ram36.disk20 flavor VPSs to WMCS, with between one and three
> RTX A6000 GPUs each, plus a 1TB SSD each, which should cost under
> $60k. That should allow for very widely multilingual models somewhere
> between GPT-3.5 and 4 performance with current training rates.
>

Having part of the cluster for this makes sense, even as what it is used
for changes over time.


> These models can be quantized into int4 weights which run on cell
> phones:
> https://github.com/rupeshs/alpaca.cpp/tree/linux-android-build-support
> It seems inevitable that we will someday include such LLMs with
> Internet-in-a-Box, and, why not also the primary mobile apps
>

Eventually, yes. A good reason to renew attention to mobile as a canonical
wiki experience.
___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/R2KSXD3VE4MPGSIIYCPUMFXFPIY7D5CH/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-04-02 Thread Lauren Worden
On Sun, Apr 2, 2023 at 5:40 PM Erik Moeller  wrote:
>
> I can't comment on the hardware requirements, but I would note that in
> addition to the llama.cpp repository
> (https://github.com/ggerganov/llama.cpp), which currently focuses on
> LLaMA/Alpaca, there are other efforts to reduce the computational
> requirements for running LLMs. https://github.com/NolanoOrg/cformers
> looks promising and supports many of the open models. Fabrice Bellard
> of FFmpeg fame was one of the first implementers of a highly optimized
> LLM at https://textsynth.com/ ; sadly much of the work is proprietary

At this point I guess I would recommend adding five or so
g2.cores8.ram36.disk20 flavor VPSs to WMCS, with between one and three
RTX A6000 GPUs each, plus a 1TB SSD each, which should cost under
$60k. That should allow for very widely multilingual models somewhere
between GPT-3.5 and 4 performance with current training rates.

> https://textsynth.com/playground.html remains one of the most
> accessible ways to explore the performance of the open models with
> only a rate limitation, and no requirement to purchase credits.

There are free Alpaca-30b demos for comparison at
https://github.com/deep-diver/Alpaca-LoRA-Serve
And free Alpaca-7b online at https://chatllama.baseten.co/

These models can be quantized into int4 weights which run on cell
phones: https://github.com/rupeshs/alpaca.cpp/tree/linux-android-build-support
It seems inevitable that we will someday include such LLMs with
Internet-in-a-Box, and, why not also the primary mobile apps so we
don't have to give away CPU utilization?
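
To make the int4 idea above a bit more concrete, here is a minimal
Python sketch of symmetric block-wise 4-bit quantization. It is an
illustration only: the function names, the block size of 32 and the
packing layout are assumptions made for this example and are not taken
from alpaca.cpp or llama.cpp, whose actual formats differ in detail.

```python
def pack_nibbles(values):
    """Pack signed 4-bit ints (range -8..7) two per byte, low nibble first."""
    if len(values) % 2:
        values = values + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_nibbles(data, count):
    """Inverse of pack_nibbles; returns the first `count` signed values."""
    values = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


def quantize_int4(weights, block_size=32):
    """Quantize a list of floats into (scale, length, packed_bytes) blocks."""
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # One float scale per block maps the largest weight to +/-7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        q = [max(-8, min(7, round(w / scale))) for w in block]
        blocks.append((scale, len(block), pack_nibbles(q)))
    return blocks


def dequantize_int4(blocks):
    """Reconstruct approximate float weights from quantized blocks."""
    weights = []
    for scale, length, packed in blocks:
        weights.extend(scale * q for q in unpack_nibbles(packed, length))
    return weights
```

Each block stores one scale plus half a byte per weight, so the
reconstruction error per weight is bounded by about half a
quantization step (the block's largest absolute weight divided by 14).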

There is a proposal to allow apps over 4GB in WASM:
https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md
At the rate things are improving maybe that won't even be needed to
make a reasonable static web app, someday.
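
As a back-of-the-envelope check on why the 4GB limit matters, the
following Python sketch estimates the size of the weights alone at
different precisions and compares it with the 4 GiB address space of
32-bit WebAssembly memory. The parameter counts are nominal round
numbers assumed for illustration, and the estimate deliberately
ignores quantization scales, activations, the attention cache and the
runtime itself, so real requirements are higher.

```python
GIB = 2**30
WASM32_LIMIT = 4 * GIB  # 32-bit linear memory can address at most 4 GiB


def weight_bytes(n_params, bits_per_weight):
    """Bytes needed to store the weights only (no overhead of any kind)."""
    return n_params * bits_per_weight // 8


# Nominal sizes assumed for illustration, not exact parameter counts.
MODELS = [("7B", 7_000_000_000), ("30B", 30_000_000_000)]

for name, n_params in MODELS:
    for bits in (16, 4):
        size = weight_bytes(n_params, bits)
        verdict = "fits under" if size <= WASM32_LIMIT else "exceeds"
        print(f"{name} at {bits}-bit: {size / GIB:.2f} GiB, {verdict} 4 GiB")
```

Under these assumptions only the nominal 7B model at 4 bits stays
below the limit, and with little headroom left for everything else.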

-LW
___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/V2NR6LH22JWDXDQL2KSAOEBJKID7LJSZ/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-04-02 Thread Chris Keating
>
>
>
> Are you infringing Stability AI's copyright by clicking this link? If
> not, are you infringing Stability AI's copyright by then writing a
> Python script that uses this file to generate images, if you only run
> it locally on your GPU?
>
> Even if a court answers either question with "yes", it still does not
> follow that you are bound by any other licensing terms Stability AI is
> attaching to those files, a license which you never agreed to when
> clicking the link.
>

I don't entirely follow, I'm afraid.

Say I take a document from the Internet and then use it to do something
(anything).

Copyright inherently exists in the document.

I am very likely to have done something which is 'copying' in legal terms,
either in the act of downloading it, or the act of using it, or both.

What is my legal basis for doing this? Given that there is copyright in the
work and I have copied it, I must have a legal basis for doing this, I
can't just wave my hands and say it's fine.
In common law legal systems the answer is likely to be one (or more) of:
- there is a statutory basis for my doing so (perhaps the one I mentioned
earlier in the thread or a 'fair dealing' exemption)
- there is an explicit licence attached to the document defined somewhere
by the creator (either a general one or something created by a contract
between myself and the creator)
- there is an implicit licence attached to the document defined by the
creator's actions in the light of their reasonable expectations of others'
actions, e.g. if you publish a document on the internet you very probably
imply a licence to perform such copying as is actually needed to read the
document

If I have clicked a button to indicate acceptance of some terms and
conditions, then very likely those terms contain an explicit licence I can
rely on.

However, not clicking a button to indicate acceptance of terms and
conditions does not mean I can do whatever I want. It means that I either
have to find other evidence of an explicit licence (maybe text on the
document?), or consider whether there is an implicit licence or exemption.
An implicit licence might well exist but is quite likely to be minimal in
scope.

If I then re-copy the material and then republish it, that would be a
further act of copying, and I would have to answer the same questions. The
fact of republication does not fundamentally change anything, though it may
be handled differently in the different exemptions or licences I am
probably relying on.  An implicit licence is less likely to exist the
further I manipulate something, as that is probably further removed from
the copyright holder's original expectations.

Chris
___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/QYHJ2NJYT6Q4L3PPBPQ66ELWFORMZ6XI/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-04-02 Thread Lauren Worden
On Sat, Apr 1, 2023 at 10:18 PM rupert THURNER  wrote:
> On Sat, Apr 1, 2023 at 11:36 PM Erik Moeller  wrote:
> >
> > ... I am confident (based on, e.g., the recent
> > results with Alpaca: https://crfm.stanford.edu/2023/03/13/alpaca.html)
> > that the performance of smaller models will continue to increase as we
> > find better ways to train, steer, align, modularize and extend them.
>
> to host open models like above would be really
> cool for multiple reasons, the most important one to bring
> back the openness into the training

Wow! While Alpaca is English only and released under CC BY-NC, it does
seem like it's very easily replicated with a wide context window and
could probably be made widely multilingual beyond the performance of
GPT-3.5 for less than it would cost to merely host BLOOM for a few
months. This shocked me and of course I take back what I said about
requiring several million dollars.

https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html
https://huggingface.co/databricks/dolly-v1-6b
https://github.com/tatsu-lab/stanford_alpaca

What kind of hardware should WMCS buy to support such a project?

On Sat, Apr 1, 2023 at 2:36 PM Erik Moeller  wrote:
>
> ... I'm not sure if the "hallucination" problem is tractable when
> all you have is an LLM

I disagree, which is why I have been pushing RARR and ROME. RARR seeks
to use the same principles of WP:V to eliminate hallucination,
requiring confirmation from verifiable sources, which can be limited
to e.g. those approved by WP:RSP, and cited in a way that readers can
independently verify. I've been posting links to the RARR paper which
doesn't go very deep on some of those points, but here's an hour-long
presentation by one of the authors which is a lot meatier on such
topics:
https://www.youtube.com/watch?v=d45Ms8LmF5k
And here's a Twitter thread which is more accessible to those less
familiar with similar literature:
https://twitter.com/kelvin_guu/status/1582714222080688133
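
As a toy illustration of the gating idea described above (accept a
claim only when it is confirmed by a source from an approved list, and
keep the citation with it), here is a minimal Python sketch. It is not
the RARR algorithm, which also retrieves evidence and revises the text.
The allowlist, the function names and the find_evidence callback are
all hypothetical stand-ins assumed for this example.

```python
from urllib.parse import urlparse

# Hypothetical allowlist standing in for a WP:RSP-style list of sources.
APPROVED_DOMAINS = {"example.org", "example.edu"}


def is_approved(url):
    """True if the URL's host is an approved domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in APPROVED_DOMAINS)


def attribute_claims(claims, find_evidence):
    """Split claims into (verified, unverified).

    find_evidence(claim) is an assumed callback returning the URLs judged
    to support the claim; a real system would need a retriever and an
    agreement check here.
    """
    verified, unverified = [], []
    for claim in claims:
        sources = [u for u in find_evidence(claim) if is_approved(u)]
        if sources:
            verified.append((claim, sources))  # keep citations with the claim
        else:
            unverified.append(claim)  # candidate for revision or removal
    return verified, unverified
```

Claims that end up in the unverified list are the ones a research and
revision step would then try to correct or drop.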

Once an attribution and verification system like RARR has identified
inaccuracies and hallucinations, the ROME/MEMIT method of editing the
models directly can eliminate them completely, and in a way that also
eliminates similar generalized mistakes; please see: "Rank-One Editing
of Encoder-Decoder Models" https://arxiv.org/abs/2211.13317

I can't believe that the large AI labs aren't working harder on these
efforts than they've been letting on. Either they aren't, or they are but
in an uncharacteristically secretive fashion, which would suggest they
want to exploit such advances as proprietary trade secrets. In either
case, it's vital that fully open organizations like the Foundation get
involved quickly. There is reason to believe the latter case, because
Google Bard uses a much less rigorous form of attribution and
verification (probably based on SPARROW,
https://arxiv.org/abs/2209.14375) but it actually causes its
hallucinations to get worse e.g. in
https://i.redd.it/f30u9n0gn9pa1.png
If you watch the RARR video towards the end, Dr. Lao indicates they
encountered similar issues but were able to eliminate almost all of
them.

-LW
___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/2RZWYFSCPY4KGOSX22KPCKJFL6V36U56/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-04-01 Thread rupert THURNER
On Sat, Apr 1, 2023 at 11:36 PM Erik Moeller  wrote:
> Openly licensed models for machine translation like Facebook's M2M
> (https://huggingface.co/facebook/m2m100_418M) or text generation like
> Cerebras-GPT-13B (https://huggingface.co/cerebras/Cerebras-GPT-13B)
> and GPT-NeoX-20B (https://huggingface.co/EleutherAI/gpt-neox-20b) seem
> like better targets for running on Wikimedia infrastructure, if
> there's any merit to be found in running them at this stage.
>
> Note that Facebook's proprietary but widely circulated LLaMA model has
> triggered a lot of work on dramatically improving performance of LLMs
> through more efficient implementations, to the point that you can run
> a decent quality LLM (and combine it with OpenAI's freely licensed
> speech recognition model) on a consumer grade laptop:
>
> https://github.com/ggerganov/llama.cpp
>
> While I'm not sure if the "hallucination" problem is tractable when
> all you have is an LLM, I am confident (based on, e.g., the recent
> results with Alpaca: https://crfm.stanford.edu/2023/03/13/alpaca.html)
> that the performance of smaller models will continue to increase as we
> find better ways to train, steer, align, modularize and extend them.

Hosting open models like the ones above would be really
cool for multiple reasons, the most important being to bring
openness back into the training, besides the many
voices from the movement considering various social
aspects one would never otherwise think of.

rupert
___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/55MB54MTGLIIUPRGKJI2UUPHYFXV6AHT/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-04-01 Thread Erik Moeller
Lauren:

> Erik, I see your point now and agree with you. But doesn't it seem
> like obtaining a perfect license is at present the enemy of the urgent
> good of bringing a concerted effort to bear on problems that are
> clearly detrimental to project integrity?

I don't think the licensing question matters for purposes of
evaluation of third party APIs (including providing access to
Wikimedia volunteers to participate in such evaluations), but I would
personally draw the line when it comes to something like a Wikimedia
Cloud Infrastructure installation. Spending a lot of money on compute
infrastructure to run a proprietary model strikes me as clearly out of
scope for the Wikimedia mission.

Openly licensed models for machine translation like Facebook's M2M
(https://huggingface.co/facebook/m2m100_418M) or text generation like
Cerebras-GPT-13B (https://huggingface.co/cerebras/Cerebras-GPT-13B)
and GPT-NeoX-20B (https://huggingface.co/EleutherAI/gpt-neox-20b) seem
like better targets for running on Wikimedia infrastructure, if
there's any merit to be found in running them at this stage.

Note that Facebook's proprietary but widely circulated LLaMA model has
triggered a lot of work on dramatically improving performance of LLMs
through more efficient implementations, to the point that you can run
a decent quality LLM (and combine it with OpenAI's freely licensed
speech recognition model) on a consumer grade laptop:

https://github.com/ggerganov/llama.cpp

While I'm not sure if the "hallucination" problem is tractable when
all you have is an LLM, I am confident (based on, e.g., the recent
results with Alpaca: https://crfm.stanford.edu/2023/03/13/alpaca.html)
that the performance of smaller models will continue to increase as we
find better ways to train, steer, align, modularize and extend them.

Chris:

> there is probably an implicit licence granted by whoever publishes
> the work for whoever views it to use it.

Here's a link to the Stable Diffusion (image generation) model weights
from their official repository. Note the lack of any licensing
statement or clickthrough agreement when directly downloading the
weights.

https://huggingface.co/stabilityai/stable-diffusion-2-base/resolve/main/512-base-ema.ckpt

Are you infringing Stability AI's copyright by clicking this link? If
not, are you infringing Stability AI's copyright by then writing a
Python script that uses this file to generate images, if you only run
it locally on your GPU?

Even if a court answers either question with "yes", it still does not
follow that you are bound by any other licensing terms Stability AI is
attaching to those files, a license which you never agreed to when
clicking the link.

But this discussion highlights the fundamental difference between free
licenses like CC-BY-SA/GPL and nonfree "ethical use" licenses like
OpenRail-M. If you want to enforce your ethical use restrictions
without a clickthrough agreement, you have no choice but to adopt an
expansive definition of copyright infringement. This is somewhat
ironic, given that the models themselves are trained on vast amounts
of copyrighted data without permission.

Warmly,
Erik
___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/XKC7S7D63YDXZCUJKGRODVRAEGG5BQ7D/


[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-04-01 Thread Chris Keating
On Fri, Mar 31, 2023 at 3:05 PM Erik Moeller  wrote:

> On Thu, Mar 30, 2023 at 12:25 PM Lauren Worden 
> wrote:
> > > If you don't obtain this agreement, you cannot meaningfully enforce
> > > the "license" because the downloader never agreed to it in the first
> > > place. Moreover, you'll have to make sure that _everyone else making
> > > copies of the file_ also obtains agreement from people getting those
> > > copies, or your whole house of cards falls down.
>
> > Isn't that exactly how we impose attribution and share-alike
> > requirements of CC-BY-SA content?
>
> Not exactly. CC-BY-SA gives Wikimedia readers permissions they would
> not otherwise have (e.g., to distribute copies), and it ties those
> permissions to certain obligations (e.g., attribution). Readers who do
> not wish to exercise those additional permissions are not required to
> adhere to the obligations. They'd just be limited to what copyright
> law lets you do with content you download from a public website.
> Nobody can stop you from making your own offline version of Wikipedia,
> calling it "Bobbypedia", and removing all other attribution -- as long
> as you keep it to yourself.
>
> To be sure, you can put restrictions in an AI model license that kick
> in for folks distributing the model, which is something they wouldn't
> legally be able to do without consulting and agreeing to the licensing
> terms. But, crucially, you don't have to distribute an AI model to run
> it. Most of the unethical uses folks tend to worry about (e.g., bulk
> generation of misinformation) do not involve distributing copies of
> the model, only of its output.
>

This is perhaps a bit academic, but this is not really the case, at least
in UK copyright law.

The 'copying' inherent in viewing a web page is permissible under two
grounds:
1) there is a statutory exemption in copyright law for this specific
activity (in section 28A of the Copyright, Designs and Patents Act 1988, if
anyone cares to look it up ;) ). This would likely not apply to details of
AI models as the exemption excludes 'a computer program or a database'.
Whether it would apply to Bobbypedia depends on whether it counts as a
database (strikes me as arguable).
2) there is probably an implicit licence granted by whoever publishes the
work for whoever views it to use it. The scope of this implicit licence is
highly debatable and probably extremely limited. Do you have an implied
licence to download the HTML of a webpage into your browser cache and use
your web browser to render the page and display the resulting content? Very
likely. Do you have an implied licence to save a PDF copy onto your hard
drive? Maybe. Do you have an implied licence to use the page to create a
personal AI model and distribute the output? That is very unclear, probably
not. It is perhaps less likely still if there is also an explicit licence
attached to the page.

Chris
___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/7J3COWU55XXHSZIOTWWKJTZDRIRQR3TI/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-04-01 Thread Jan Ainali
On Sat, 1 Apr 2023 at 10:21, Lauren Worden wrote:

> How would you characterize the harm of hosting BLOOM until a
> comparable FOSS model is available?


There are a few risks that could be harmful, although I don't think they
are either certain or very direct.
But if we do give up our principle of only using and allowing open source
(which we have held for over 20 years), here are a few of these risks.
The first that comes to mind is that we alienate the volunteers who hold
these ideals high (remember the mp4 vote on Commons in 2014).
Second, we dilute the concept of "open" by implying that these usage
restricting licenses are just as good as FOSS licenses.
Third, we are not being a good role model for the rest of the open movement.

With these risks in mind, I would much rather we wait for a FOSS model, not
rushing on a hype train, even if it is not as powerful.

/Jan
___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/LNLYZSARUSTNQJOPTTRRWXUAKPJ5EOGV/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-04-01 Thread Lauren Worden
Erik, I see your point now and agree with you. But doesn't it seem
like obtaining a perfect license is at present the enemy of the urgent
good of bringing a concerted effort to bear on problems that are
clearly detrimental to project integrity?

I haven't been able to tell whether any of the people training truly
FOSS LLMs are even working on models the size (in parameters or of the
context window) of GPT-3. The cost of training such models is falling
rapidly with various advances, but it might never fall below the
several million dollar range.

How would you characterize the harm of hosting BLOOM until a
comparable FOSS model is available? Alternatively, is there a
partnership solution to this problem within the Foundation's budget
constraints?

-LW

___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/OL2FX2CLS5U4YLC2NMCYPX4E2FI3FYIF/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-31 Thread Erik Moeller
On Fri, Mar 31, 2023 at 8:38 AM  wrote:

> Downloading computer programs and electronic databases (and downloading
> for purposes outside the listed exception) requires an express consent
> of the copyright holder, i.e. a license. In other words, you _cannot_
> download a GPL program without agreeing to the GPL

The act of downloading a copyrighted work is, of course, covered by
copyright. But it does not follow that by downloading a work, you
agree to whatever terms the person offering it imagines you agreed to.

If you want them to agree to those terms, you have to obtain that
agreement. Otherwise, if you publish your work freely (i.e. with
obvious intent to publish, not in some hidden directory on your
webserver), the permission to download the work is implied by you
publishing it. Or to put it another way, you can't publish and
advertise a website and then make a credible demand for 500 dollars
from anyone who clicks the link. Want 500 dollars? Ask for it on a
clickthrough form that makes it obvious what the buyer pays for. Want
people to agree to your ethical AI use restrictions? Ask for it before
you give them your model weights.

Website terms of use are a gray area, but their enforceability is
limited (beyond defending your right to refuse service by blocking a
person from visiting your site) if you've not made their acceptance
sufficiently explicit.

IANAL, so ask a lawyer if you don't believe me. :)

Warmly,
Erik
___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/T4DDD65ZJIK2JDBQCV4HE3KGHTJSUMGI/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-31 Thread petr . kadlec
Hi,

On Thu, Mar 30, 2023 at 1:28 PM Erik Moeller  wrote:

> One core principle in open source licenses is that you are not
> required to agree to the license in order to download or run copies.
> The GPL makes this explicit: "You are not required to accept this
> License in order to receive or run a copy of the Program." This is
> really important. I can download and run every bit of open source
> software in existence without ever agreeing to a single license.
>
> Downloading a thing you make available doesn't give me the right to
> distribute it -- copyright law itself is sufficient to limit that. If
> you want to impose _additional restrictions_ on a person for stuff
> they download from you, that actually requires proactive agreement
> from the user to those restrictions at the time they download the
> thing.
>

I’m not saying this is wrong in all jurisdictions, but it is definitely not
correct in at least some of them…

Specifically, per the Czech copyright law, an act of downloading some
copyrighted work is restricted by copyright, as it is (obviously?) copying
(“reproduction”) of the work, which is (obviously?) covered by copyright.

There is an exception by which you are specifically allowed to copy some
copyrighted works “for personal needs by a natural person without seeking
to achieve direct or indirect economic benefit” but this exception does not
apply to computer programs and electronic databases. Downloading computer
programs and electronic databases (and downloading for purposes outside the
listed exception) requires an express consent of the copyright holder, i.e.
a license. In other words, you _cannot_ download a GPL program without
agreeing to the GPL (which, as you wrote, allows that to anyone without
further conditions, so that’s not a problem as far as downloading and
running the program goes).

-- [[cs:User:Mormegil | Petr Kadlec]]
___
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/RRALSGVTM6IGKJPAMDFFT7MSJ7ECSHBS/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-31 Thread Erik Moeller
On Thu, Mar 30, 2023 at 12:25 PM Lauren Worden  wrote:
> > If you don't obtain this agreement, you cannot meaningfully enforce
> > the "license" because the downloader never agreed to it in the first
> > place. Moreover, you'll have to make sure that _everyone else making
> > copies of the file_ also obtains agreement from people getting those
> > copies, or your whole house of cards falls down.

> Isn't that exactly how we impose attribution and share-alike
> requirements of CC-BY-SA content?

Not exactly. CC-BY-SA gives Wikimedia readers permissions they would
not otherwise have (e.g., to distribute copies), and it ties those
permissions to certain obligations (e.g., attribution). Readers who do
not wish to exercise those additional permissions are not required to
adhere to the obligations. They'd just be limited to what copyright
law lets you do with content you download from a public website.
Nobody can stop you from making your own offline version of Wikipedia,
calling it "Bobbypedia", and removing all other attribution -- as long
as you keep it to yourself.

To be sure, you can put restrictions in an AI model license that kick
in for folks distributing the model, which is something they wouldn't
legally be able to do without consulting and agreeing to the licensing
terms. But, crucially, you don't have to distribute an AI model to run
it. Most of the unethical uses folks tend to worry about (e.g., bulk
generation of misinformation) do not involve distributing copies of
the model, only of its output.

If you want to impose ethical use restrictions on people running your
AI models, you have two choices: You can require everyone getting a
copy of the model by any means to explicitly agree to those
restrictions (presumably Facebook does this when distributing LLaMA to
researchers), or you can make your model freely available and protest
ineffectually when a downloader ignores the restrictions you've
spelled out in a textfile in your repository. Neither approach is
compatible with open source.

> I have no particular affinity to BLOOM, but I have been able to
> personally test that it is capable of at least a dozen different use
> cases that people have shown GPT-3 and ChatGPT can be used for on
> enwiki.

I think it's fine to explore all sorts of models, free and nonfree,
for the purpose of assessing capabilities and mitigating risks. When
it comes to deployment of models in a production context, IMO
Wikimedia should exclude from consideration any models under
ill-conceived "ethical use" licenses.

Warmly,
Erik
___
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/4XZMYBMH7XESK23KWPFTBXKM7R2H4DJR/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-30 Thread Lauren Worden
On Thu, Mar 30, 2023 at 4:28 AM Erik Moeller  wrote:
> If you want to impose _additional restrictions_ on a person for stuff
> they download from you, that actually requires proactive agreement
> from the user to those restrictions at the time they download the
> thing.
>
> If you don't obtain this agreement, you cannot meaningfully enforce
> the "license" because the downloader never agreed to it in the first
> place. Moreover, you'll have to make sure that _everyone else making
> copies of the file_ also obtains agreement from people getting those
> copies, or your whole house of cards falls down.

Isn't that exactly how we impose attribution and share-alike
requirements of CC-BY-SA content?

On Thu, Mar 30, 2023 at 4:25 AM Kimmo Virtanen
 wrote:
>> To generate or disseminate information or content, in any context (e.g. 
>> posts, articles, tweets, chatbots or other kinds of automated bots) without 
>> expressly
>> and intelligibly disclaiming that the text is machine generated
>
> This makes it useless in most content-related use cases as it requires too 
> much extra text to use the results.

I guess that the General Disclaimer could serve to fulfill that requirement.

> About FOSS compatible LLMs, EleutherAI's GPT-J, NeoX, and Pythia
> and Cerebras-GPT are under Apache 2.0. The question is whether these
> models are good enough to be useful. However, the same question is
> relevant to Bloom too.

I have no particular affinity to BLOOM, but I have been able to
personally test that it is capable of at least a dozen different use
cases that people have shown GPT-3 and ChatGPT can be used for on
enwiki. My promotion of leveraging it is for the strictly utilitarian
purpose of providing an infrastructure to work on the problems which
seem to have the greatest risk to project content if not addressed.

I would prefer a more widely multilingual model trained on all of the
Foundation content suitable for that purpose, but training such models
is a much more expensive proposition than merely using them.

-LW
___
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/OKS6NX6BJGIYYERRDGOEQ7YGKTUN5Y3B/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-30 Thread Jan Ainali
On Thu, 30 Mar 2023 at 02:33, Lauren Worden wrote:

>
> Is the BLOOM RAIL license [
> https://huggingface.co/spaces/bigscience/license ] proprietary?
>

Yes. The common definition is that if it is not open source, it is
proprietary. But you don't need to take my word for it.


> So I expect the BLOOM license would therefore qualify for an exception
> as described in
> https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use


Point 3 in the "What uses of Cloud Services do we not like?":

*"Proprietary software*: Do not use or install any software unless the
software is licensed under an Open Source license."

The Wikimedia Cloud terms of use even narrow it down to only Open Source
Initiative approved licenses. So if not even CC0 licenses are allowed on
Wikimedia Cloud (that license is only approved by the FSF, not by the OSI),
for sure, the RAIL license is not allowed.

/Jan
___
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/GOUMJH7UUGSAOC4OMC3VVBGUL5AWNZBR/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-30 Thread Erik Moeller
On Wed, Mar 29, 2023 at 1:49 PM Jan Ainali  wrote:

> On the contrary, I think it is important to, as early as possible, deter all 
> these attempts
> to weaken the concept of "open" and that we as a movement need to take a hard 
> stance
> against them.

I agree with Jan on this. Licenses are the wrong tool for the job
they're being used for here (regulating use of AI models).

One core principle in open source licenses is that you are not
required to agree to the license in order to download or run copies.
The GPL makes this explicit: "You are not required to accept this
License in order to receive or run a copy of the Program." This is
really important. I can download and run every bit of open source
software in existence without ever agreeing to a single license.

Downloading a thing you make available doesn't give me the right to
distribute it -- copyright law itself is sufficient to limit that. If
you want to impose _additional restrictions_ on a person for stuff
they download from you, that actually requires proactive agreement
from the user to those restrictions at the time they download the
thing.

If you don't obtain this agreement, you cannot meaningfully enforce
the "license" because the downloader never agreed to it in the first
place. Moreover, you'll have to make sure that _everyone else making
copies of the file_ also obtains agreement from people getting those
copies, or your whole house of cards falls down. Needless to say, this
is totally incompatible with the way we distribute open source
software.

To pick a concrete example, you can download the Stable Diffusion Weights here:
https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt

Did you agree to the Open RAIL-M license? Nope, but you visited a
public URL to download some model weights you can do stuff with. I
cannot see any reasonable argument that you would be subject to the
provision of the license when _running_ the model locally or on your
own infrastructure.

To illustrate the point further, let's say I make "CoolCalculator.exe"
available to you, you download and run it, and then I demand 500
dollars from you. Why 500 dollars? Well, my license requires that if
you add sums greater than 1000 with my calculator, you owe me money.
You didn't agree to the license? Tough! Shouldn't have downloaded my
calculator!

In short, in my view, these attempts to embed ethical rulesets into
licensing agreements are a "We did a thing" approach to ethics. They
are of highly dubious enforceability and do nothing to deter bad
actors, while making the technology legally incompatible with open
source software.

Warmly,
Erik
___
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/EHUUXATAVUDWMN6V75DFHSX2QR4WJC4W/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-30 Thread Kimmo Virtanen
Hi,

>> My understanding is that it is not proprietary, and the only reason it
>> doesn't qualify for Open Source Initiative approval is because of these use
>> restrictions:
>
> To generate or disseminate information or content, in any context (e.g.
> posts, articles, tweets, chatbots or other kinds of automated bots) without
> expressly and intelligibly disclaiming that the text is machine generated
>
This makes it useless in most content-related use cases as it requires too
much extra text to use the results.

About FOSS compatible LLMs, EleutherAI's GPT-J, NeoX, and Pythia and
Cerebras-GPT are under Apache 2.0. The question is whether these models are
good enough to be useful. However, the same question is relevant to Bloom
too.

Br,
-- Kimmo Virtanen, Zache

On Thu, Mar 30, 2023 at 3:34 AM Lauren Worden 
wrote:

> On Wed, Mar 29, 2023 at 1:50 PM Jan Ainali  wrote:
> >
> > I think it is important to, as early as possible, deter all these
> attempts to weaken the concept of "open" and that we as a movement need to
> take a hard stance against them.
> > These proprietary licenses do not fit the spirit of sharing all
> knowledge and letting anyone do whatever they want with it.
>
> Is the BLOOM RAIL license [
> https://huggingface.co/spaces/bigscience/license ] proprietary?  My
> understanding is that it is not proprietary, and the only reason it
> doesn't qualify for Open Source Initiative approval is because of
> these use restrictions:
>
> "You agree not to use the Model or Derivatives of the Model:
> (a) In any way that violates any applicable national, federal, state,
> local or international law or regulation;
> (b) For the purpose of exploiting, harming or attempting to exploit or
> harm minors in any way;
> (c) To generate or disseminate verifiably false information with the
> purpose of harming others;
> (d) To generate or disseminate personal identifiable information that
> can be used to harm an individual;
> (e) To generate or disseminate information or content, in any context
> (e.g. posts, articles, tweets, chatbots or other kinds of automated
> bots) without expressly and intelligibly disclaiming that the text is
> machine generated;
> (f) To defame, disparage or otherwise harass others;
> (g) To impersonate or attempt to impersonate others;
> (h) For fully automated decision making that adversely impacts an
> individual’s legal rights or otherwise creates or modifies a binding,
> enforceable obligation;
> (i) For any use intended to or which has the effect of discriminating
> against or harming individuals or groups based on online or offline
> social behavior or known or predicted personal or personality
> characteristics
> (j) To exploit any of the vulnerabilities of a specific group of
> persons based on their age, social, physical or mental
> characteristics, in order to materially distort the behavior of a
> person pertaining to that group in a manner that causes or is likely
> to cause that person or another person physical or psychological harm;
> (k) For any use intended to or which has the effect of discriminating
> against individuals or groups based on legally protected
> characteristics or categories;
> (l) To provide medical advice and medical results interpretation;
> (m) To generate or disseminate information for the purpose to be used
> for administration of justice, law enforcement, immigration or asylum
> processes, such as predicting an individual will commit fraud/crime
> commitment (e.g. by text profiling, drawing causal relationships
> between assertions made in documents, indiscriminate and
> arbitrarily-targeted use)."
>
> Those restrictions seem very reasonable to me, and I would consider
> them an advantage given the problems the field is experiencing,
> including the threats to project content integrity. I don't see any
> drawbacks, and I see several advantages to encouraging such
> restrictions.
>
> So I expect the BLOOM license would therefore qualify for an exception
> as described in
> https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use
>
> There is further discussion of these issues at
> https://arxiv.org/pdf/2011.03116.pdf
>
> -LW
___
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/D3OT2HV2ZFGRH2ONOD7JVJ4R25MICEL2/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-29 Thread Lauren Worden
On Wed, Mar 29, 2023 at 1:50 PM Jan Ainali  wrote:
>
> I think it is important to, as early as possible, deter all these attempts to 
> weaken the concept of "open" and that we as a movement need to take a hard 
> stance against them.
> These proprietary licenses do not fit the spirit of sharing all knowledge and 
> letting anyone do whatever they want with it.

Is the BLOOM RAIL license [
https://huggingface.co/spaces/bigscience/license ] proprietary?  My
understanding is that it is not proprietary, and the only reason it
doesn't qualify for Open Source Initiative approval is because of
these use restrictions:

"You agree not to use the Model or Derivatives of the Model:
(a) In any way that violates any applicable national, federal, state,
local or international law or regulation;
(b) For the purpose of exploiting, harming or attempting to exploit or
harm minors in any way;
(c) To generate or disseminate verifiably false information with the
purpose of harming others;
(d) To generate or disseminate personal identifiable information that
can be used to harm an individual;
(e) To generate or disseminate information or content, in any context
(e.g. posts, articles, tweets, chatbots or other kinds of automated
bots) without expressly and intelligibly disclaiming that the text is
machine generated;
(f) To defame, disparage or otherwise harass others;
(g) To impersonate or attempt to impersonate others;
(h) For fully automated decision making that adversely impacts an
individual’s legal rights or otherwise creates or modifies a binding,
enforceable obligation;
(i) For any use intended to or which has the effect of discriminating
against or harming individuals or groups based on online or offline
social behavior or known or predicted personal or personality
characteristics
(j) To exploit any of the vulnerabilities of a specific group of
persons based on their age, social, physical or mental
characteristics, in order to materially distort the behavior of a
person pertaining to that group in a manner that causes or is likely
to cause that person or another person physical or psychological harm;
(k) For any use intended to or which has the effect of discriminating
against individuals or groups based on legally protected
characteristics or categories;
(l) To provide medical advice and medical results interpretation;
(m) To generate or disseminate information for the purpose to be used
for administration of justice, law enforcement, immigration or asylum
processes, such as predicting an individual will commit fraud/crime
commitment (e.g. by text profiling, drawing causal relationships
between assertions made in documents, indiscriminate and
arbitrarily-targeted use)."

Those restrictions seem very reasonable to me, and I would consider
them an advantage given the problems the field is experiencing,
including the threats to project content integrity. I don't see any
drawbacks, and I see several advantages to encouraging such
restrictions.

So I expect the BLOOM license would therefore qualify for an exception
as described in
https://wikitech.wikimedia.org/wiki/Wikitech:Cloud_Services_Terms_of_use

There is further discussion of these issues at
https://arxiv.org/pdf/2011.03116.pdf

-LW
___
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/L6DTD5QQWJPZVXDMT4L5NVFWCZKPLXJD/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-29 Thread Jan Ainali
Hi Alek, nice to see you here!

On the contrary, I think it is important to, as early as possible, deter
all these attempts to weaken the concept of "open" and that we as a
movement need to take a hard stance against them.
These proprietary licenses do not fit the spirit of sharing all knowledge
and letting anyone do whatever they want with it.
It's not that they are using licenses that currently are not approved by
the leaders of the open source movement, it's that these licenses are
fundamentally, and deliberately, constructed so that they will never be
approved by these bodies.

But yes, we will probably soon see FOSS licensed LLMs, and from these, we
can choose which ones we might want to help develop.
Let's just wait for that day, rather than make a hasty and morally dubious
bet on models available as of today.

Jan Ainali



On Wed, 29 Mar 2023 at 21:50, Alek Tarkowski wrote:

> Hi,
>
> (I’m Alek from Open Future Foundation, I largely lurk here, so I want to
> say “Hi everyone!” first).
>
> Jan, you’re right that the RAIL license does not meet any FOSS
> definitions. But its authors, in their white paper, position this license
> not just as “responsible” but also “open”. And projects like RAIL or BLOOM,
> connected with the HuggingFace company, aim to define a standard that fits
> the idea of responsible sharing. Looking in more detail, the behavioural
> use limitations in RAIL are ones that could probably be endorsed by
> Wikimedia, based on its Code of Conduct and other community norms.
>
> My point is that it would be good to explore to what extent “openish” AI
> stacks can be a good fit for Wikimedia.
> I follow the conversation around open/responsible AI licensing and
> understand the need to not “dilute” FOSS licensing. But I also appreciate
> that AI researchers are actively setting a standard that they think is
> right for AI. I think that their work should not be dismissed just because
> it’s not using one of the canonical open licenses.
>
> By the way, there will probably be, sometime soon, an LLM that is available
> under a “traditional” FOSS license. But for me that’s even more so a reason
> to consider different options, and be able to make an informed decision.
>
> Best,
> Alek
> --
> Director of Strategy, Open Future | openfuture.eu | +48 889 660 444
> At Open Future, we tackle the Paradox of Open: paradox.openfuture.eu/
>
> On 28 Mar 2023, at 20:30, Jan Ainali  wrote:
>
> On Tue, 28 Mar 2023 at 12:08, Lauren Worden <
> laurenworde...@gmail.com> wrote:
>
>> First, the Foundation should host a fork of BLOOM [
>> https://huggingface.co/bigscience/bloom ], which if I remember correctly
>> was described by the Foundation's Machine Learning Director Chris Albon as
>> the only LLM at the scale of GPT-3 adhering to the movement's FOSS
>> criteria.
>>
>
> No, BLOOM is not FOSS by any means.
> It fails freedom 0 of the four freedoms from the Free Software
> Foundation[1], and it is not recognized as an open source license by the
> Open Source Initiative (and will not be as it fails requirement 6 of the
> open source definition[2]).
> So that model, and any other using the RAIL license, is a dead end.
>
> /Jan
>
> [1] https://www.gnu.org/philosophy/free-sw.html
> [2] https://opensource.org/osd/
___
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/BRRPGQXDFYPFBDABSZLBMAX2JDUWKKMW/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-29 Thread Alek Tarkowski
Hi,

(I’m Alek from Open Future Foundation, I largely lurk here, so I want to say 
“Hi everyone!” first).

Jan, you’re right that the RAIL license does not meet any FOSS definitions. But 
its authors, in their white paper, position this license not just as 
“responsible” but also “open”. And projects like RAIL or BLOOM, connected with 
the HuggingFace company, aim to define a standard that fits the idea of 
responsible sharing. Looking in more detail, the behavioural use limitations in 
RAIL are ones that could probably be endorsed by Wikimedia, based on its Code 
of Conduct and other community norms.

My point is that it would be good to explore to what extent “openish” AI stacks 
can be a good fit for Wikimedia. 
I follow the conversation around open/responsible AI licensing and understand 
the need to not “dilute” FOSS licensing. But I also appreciate that AI 
researchers are actively setting a standard that they think is right for AI. I 
think that their work should not be dismissed just because it’s not using one 
of the canonical open licenses. 

By the way, there will probably be, sometime soon, an LLM that is available 
under a “traditional” FOSS license. But for me that’s even more so a reason to 
consider different options, and be able to make an informed decision.

Best,
Alek
--
Director of Strategy, Open Future | openfuture.eu | +48 889 660 444
At Open Future, we tackle the Paradox of Open: paradox.openfuture.eu/

> On 28 Mar 2023, at 20:30, Jan Ainali  wrote:
> 
> On Tue, 28 Mar 2023 at 12:08, Lauren Worden wrote:
>> First, the Foundation should host a fork of BLOOM [ 
>> https://huggingface.co/bigscience/bloom ], which if I remember correctly was 
>> described by the Foundation's Machine Learning Director Chris Albon as the 
>> only LLM at the scale of GPT-3 adhering to the movement's FOSS criteria. 
> 
> No, BLOOM is not FOSS by any means.
> It fails freedom 0 of the four freedoms from the Free Software Foundation[1], 
> and it is not recognized as an open source license by the Open Source 
> Initiative (and will not be as it fails requirement 6 of the open source 
> definition[2]).
> So that model, and any other using the RAIL license, is a dead end.
> 
> /Jan
> 
> [1] https://www.gnu.org/philosophy/free-sw.html
> [2] https://opensource.org/osd/

___
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/IWQ4XGPKMBUSZVZ5KOW3ZAT4OWPIUBZR/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-28 Thread Jan Ainali
On Tue, 28 Mar 2023 at 12:08, Lauren Worden wrote:

> First, the Foundation should host a fork of BLOOM [
> https://huggingface.co/bigscience/bloom ], which if I remember correctly
> was described by the Foundation's Machine Learning Director Chris Albon as
> the only LLM at the scale of GPT-3 adhering to the movement's FOSS
> criteria.
>

No, BLOOM is not FOSS by any means.
It fails freedom 0 of the four freedoms from the Free Software
Foundation[1], and it is not recognized as an open source license by the
Open Source Initiative (and will not be as it fails requirement 6 of the
open source definition[2]).
So that model, and any other using the RAIL license, is a dead end.

/Jan

[1] https://www.gnu.org/philosophy/free-sw.html
[2] https://opensource.org/osd/
___
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/DW4TRBMJLO4I7MSIUJOHZLH6M2B7CJL5/

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-28 Thread Lauren Worden
Since proposals which don't fit into existing discussions elsewhere are on
topic here, I want to boldly recommend the following while the annual
planning process is still ongoing, because it's far beyond the scope of
what could be accomplished at a hackathon or on WMCS in a responsible
fashion:

First, the Foundation should host a fork of BLOOM [
https://huggingface.co/bigscience/bloom ], which if I remember correctly
was described by the Foundation's Machine Learning Director Chris Albon as
the only LLM at the scale of GPT-3 adhering to the movement's FOSS
criteria. This should be done under or alongside Toolforge on Wikimedia
Cloud Services so that staff and volunteers alike may use its API and
submit modification proposals for new instances. Presumably this would cost
on the order of $100,000 per year per instance, according to
https://huggingface.co/bigscience/bloom/discussions/161#63a33373b5fc9ab9f63d97f7
but someone should double-check that math. I've tested BLOOM against a
dozen of the uses shown around enwiki for GPT-3 and ChatGPT, and it seems
to perform about as well. (You can use the Hosted Inference API version on
Azure for free at the Huggingface URL.)

Secondly, the Foundation should sponsor staff-, grant-, affiliate-, and
volunteer-run projects to replicate and extend the work on:

A. RARR [ https://arxiv.org/abs/2210.08726 ] and other methods of
attribution and verification with goals aspiring to Wikipedia's standards
of summarizing and citing sources in ways that can be independently
verified.

B. ROME [ https://rome.baulab.info/ / MEMIT: https://memit.baulab.info/ ]
and other approaches to knowledge editing in language models with the goal
of producing simple interfaces to provide "language models that anyone can
edit" and ideally coupled to Wikidata updates.

C. EditEval [ https://eval.ai/web/challenges/challenge-page/1866/overview
], an ongoing challenge competition to produce systems capable of
automatically improving text, including its fluency, simplification,
paraphrasing, neutralization, and updating information.

I apologize to those on Thursday's Zoom call who had proposals for ORES
expansion to combat paid advocacy, images, audio, speech and video, as I
don't remember enough of the details and there's not enough information at
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/Draft/External_Trends/Community_call_notes
to include them here. I hope the advocates will elucidate those proposals
on list while the annual planning process is still in progress.

-LW

On Mon, Mar 27, 2023 at 1:04 PM Yael Weissburg 
wrote:

> Hello again everyone,
>
> Thanks again to those who made it to the call last week - it felt like
> such a luxury to be able to drop deeply into this subject for an hour
> (plus) with all of you.
>
> For those who were unable to join, we captured extensive notes on Meta.
> I hope we continue the vibrant discussion we started together on the Talk
> Page. Maybe someone can use that space to volunteer to host the next call?
> I know many folks are eager to continue the live discussion too.
>
> I also wanted to share a few links / resources that might be useful (I'll
> add these to the Talk page as well):
>
>- WMF's Legal team recently did a copyright analysis of ChatGPT. You
>can find that on Meta.
>- There is a proposed session on ChatGPT / generative AI for the
>Wikimedia Hackathon in May. You can find that on Phabricator.
>
> Finally, a huge thank you to @Maryana Pinchuk  who
> took the extensive and detailed notes on the call and also did a lot of
> "wrangling" behind the scenes to help draft the External Trends in the
> first place and get us to a point where we could have this discussion.
> Thank you, Maryana!
>
> Feel free to reach out anytime to connect about this or other topics. I'll
> be in Belgrade for the EduWiki conference in May and Singapore for
> Wikimania - if you're coming to either of those events or in the area, let
> me know - I'd love to meet in person!
>
> Best,
>
> Yael
>
> *Yael Weissburg* (she/her)
> VP, Partnerships, Programs & Grantmaking
> Wikimedia Foundation 
> M: (+1) 415.513.6643
> I work from San Francisco. My time zone is UTC -7/-8.
>
>
>
> On Fri, Mar 24, 2023 at 2:02 AM Paulo Santos Perneta <
> paulospern...@gmail.com> wrote:
>
>> Yes, please, make this a regular event, at least for the time being.
>> These discussions are incredibly useful, given the speed at which
>> developments are happening in this area, and the complexity of the
>> challenges we are facing due to them.
>> And thanks a lot for organizing the meeting yesterday!
>>
>> Paulo
>>
>> Samuel Klein  escreveu no dia quinta, 23/03/2023 à(s)
>> 

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-27 Thread Yael Weissburg
Hello again everyone,

Thanks again to those who made it to the call last week - it felt like such
a luxury to be able to drop deeply into this subject for an hour (plus)
with all of you.

For those who were unable to join, we captured extensive notes on Meta.
I hope we continue the vibrant discussion we started together on the Talk
Page. Maybe someone can use that space to volunteer to host the next call?
I know many folks are eager to continue the live discussion too.

I also wanted to share a few links / resources that might be useful (I'll
add these to the Talk page as well):

   - WMF's Legal team recently did a copyright analysis of ChatGPT. You can
   find that on Meta.
   - There is a proposed session on ChatGPT / generative AI for the
   Wikimedia Hackathon in May. You can find that on Phabricator.

Finally, a huge thank you to @Maryana Pinchuk  who
took the extensive and detailed notes on the call and also did a lot of
"wrangling" behind the scenes to help draft the External Trends in the
first place and get us to a point where we could have this discussion.
Thank you, Maryana!

Feel free to reach out anytime to connect about this or other topics. I'll
be in Belgrade for the EduWiki conference in May and Singapore for
Wikimania - if you're coming to either of those events or in the area, let
me know - I'd love to meet in person!

Best,

Yael

*Yael Weissburg* (she/her)
VP, Partnerships, Programs & Grantmaking
Wikimedia Foundation 
M: (+1) 415.513.6643
I work from San Francisco. My time zone is UTC -7/-8.



On Fri, Mar 24, 2023 at 2:02 AM Paulo Santos Perneta <
paulospern...@gmail.com> wrote:

> Yes, please, make this a regular event, at least for the time being.
> These discussions are incredibly useful, given the speed at which
> developments are happening in this area, and the complexity of the
> challenges we are facing due to them.
> And thanks a lot for organizing the meeting yesterday!
>
> Paulo
>
> Samuel Klein  escreveu no dia quinta, 23/03/2023 à(s)
> 21:11:
>
>> The Bau lab (that produced ROME) is great; see their update MEMIT
>> https://memit.baulab.info scaling that approach.
>>
>> On Thu, Mar 23, 2023 at 3:43 PM Lauren Worden 
>> wrote:
>>
>>> On Thu, Mar 23, 2023 at 12:20 PM Samuel Klein  wrote:
>>>
>>>> Thanks Yael and all for hosting this!  A great conversation which we
>>>> should revisit regularly.
>>>>
>>>
>>> Yes, I hope that this can be a (monthly?) regularly occurring event
>>> given the current state of very substantial advancements and improvements
>>> in the field.
>>>
>>> I want to reiterate some links which I feel may be of considerable help
>>> to those trying to understand our situation:
>>>
>>> RARR: https://arxiv.org/abs/2210.08726
>>>
>>> ROME: https://rome.baulab.info/
>>>
>>> ROME:
>>>
>>> -LW
>>> ___
>>> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
>>> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>>> https://meta.wikimedia.org/wiki/Wikimedia-l
>>> Public archives at
>>> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/ROUPXZQXNZSGXX5HKPLSUKIKZSR7LJT7/
>>> To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org
>>
>>
>>
>> --
>> Samuel Klein  @metasj   w:user:sj  +1 617 529 4266
>
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/EZOJF2VG4LX36SPLJ7PIQG3V4HAHRVRX/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-24 Thread Paulo Santos Perneta
Yes, please, make this a regular event, at least for the time being.
These discussions are incredibly useful, given the speed at which
developments are happening in this area, and the complexity of the
challenges we are facing due to them.
And thanks a lot for organizing the meeting yesterday!

Paulo

Samuel Klein  escreveu no dia quinta, 23/03/2023 à(s)
21:11:

> The Bau lab (that produced ROME) is great; see their update MEMIT
> https://memit.baulab.info scaling that approach.
>
> On Thu, Mar 23, 2023 at 3:43 PM Lauren Worden 
> wrote:
>
>> On Thu, Mar 23, 2023 at 12:20 PM Samuel Klein  wrote:
>>
>>> Thanks Yael and all for hosting this!  A great conversation which we
>>> should revisit regularly.
>>>
>>
>> Yes, I hope that this can be a (monthly?) regularly occurring event given
>> the current state of very substantial advancements and improvements in the
>> field.
>>
>> I want to reiterate some links which I feel may be of considerable help
>> to those trying to understand our situation:
>>
>> RARR: https://arxiv.org/abs/2210.08726
>>
>> ROME: https://rome.baulab.info/
>>
>> ROME:
>>
>> -LW
>
>
>
> --
> Samuel Klein  @metasj   w:user:sj  +1 617 529 4266
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/LNKUGJT3XQEAJCDWNJU5QA6EIZHTHJGZ/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-23 Thread Samuel Klein
The Bau lab (that produced ROME) is great; see their update MEMIT
https://memit.baulab.info scaling that approach.

On Thu, Mar 23, 2023 at 3:43 PM Lauren Worden 
wrote:

> On Thu, Mar 23, 2023 at 12:20 PM Samuel Klein  wrote:
>
>> Thanks Yael and all for hosting this!  A great conversation which we
>> should revisit regularly.
>>
>
> Yes, I hope that this can be a (monthly?) regularly occurring event given
> the current state of very substantial advancements and improvements in the
> field.
>
> I want to reiterate some links which I feel may be of considerable help to
> those trying to understand our situation:
>
> RARR: https://arxiv.org/abs/2210.08726
>
> ROME: https://rome.baulab.info/
>
> ROME:
>
> -LW



-- 
Samuel Klein  @metasj   w:user:sj  +1 617 529 4266
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/IRPDSTNKLEWXE5RRVJHDKHL2OXZZXXN6/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-23 Thread Yael Weissburg
Thank you, all, for such a great conversation! I'd love to make this
something we do regularly, and wonder if there would be appetite for
rotating hosts? I softly nominate the Basque community to host the
next one!

One way or another, we'll find a way to make this more regular and will
come back to this thread with updates.

Thank you, all!

Yael
*Yael Weissburg* (she/her)
VP, Partnerships, Programs & Grantmaking
Wikimedia Foundation 
M: (+1) 415.513.6643
I work from San Francisco. My time zone is UTC -7/-8.



On Thu, Mar 23, 2023 at 12:43 PM Lauren Worden 
wrote:

> On Thu, Mar 23, 2023 at 12:20 PM Samuel Klein  wrote:
>
>> Thanks Yael and all for hosting this!  A great conversation which we
>> should revisit regularly.
>>
>
> Yes, I hope that this can be a (monthly?) regularly occurring event given
> the current state of very substantial advancements and improvements in the
> field.
>
> I want to reiterate some links which I feel may be of considerable help to
> those trying to understand our situation:
>
> RARR: https://arxiv.org/abs/2210.08726
>
> ROME: https://rome.baulab.info/
>
> ROME:
>
> -LW
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/6OLMO4L5AXTQCVTFSCGQLQDN4J5YYWXR/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-23 Thread Lauren Worden
On Thu, Mar 23, 2023 at 12:20 PM Samuel Klein  wrote:

> Thanks Yael and all for hosting this!  A great conversation which we
> should revisit regularly.
>

Yes, I hope that this can be a (monthly?) regularly occurring event given
the current state of very substantial advancements and improvements in the
field.

I want to reiterate some links which I feel may be of considerable help to
those trying to understand our situation:

RARR: https://arxiv.org/abs/2210.08726

ROME: https://rome.baulab.info/

ROME:

-LW
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/ROUPXZQXNZSGXX5HKPLSUKIKZSR7LJT7/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-23 Thread Samuel Klein
Thanks Yael and all for hosting this!  A great conversation which we should
revisit regularly.

On Thu, Mar 9, 2023 at 4:40 PM Yael Weissburg 
wrote:

> Hi Everyone,
>
> Last year, as part of our annual planning process, the Wikimedia
> Foundation shared a list of external trends
> that we believed were likely to significantly impact the context in which
> the Wikimedia movement operates. Our focus at the time was on the changing
> nature of search, the astronomical rise in the global demand for content,
> and rich media content in particular, and the concerning rise of
> misinformation and disinformation. We heard from many in our movement about
> additional trends that our movement faces that we didn’t include in that
> list, but that are critical to how we as a movement operate, including the
> de-prioritization of investigative journalism, and the damage to GLAM
> institutions wrought by the global pandemic.
>
> As part of this year’s annual planning process, we set out to update that
> list. In particular, we’ve been tracking recent advancements in artificial
> intelligence (AI). In our recent Diff post on the topic, [1] we noted some
> risks as well as some potential opportunities for our movement as this
> technology continues to evolve. Since there has been a great deal of
> interest in and discussion about AI products like ChatGPT and what it means
> for Wikimedia over the past few months (including several threads on the
> topic on this mailing list), we’d love to explore this topic in more depth
> with you and continue the conversation about its implications for us as a
> free knowledge movement.
>
> I’d like to invite you all to an open call on 23 March at 18:00 UTC (find
> your local time here) [2] where we can share reflections on the
> opportunities, risks, and questions we see raised by new AI tools and
> products.
>
> The call will be held on Zoom. If you’re interested in joining, email
> answ...@wikimedia.org and we will share the Zoom link with you via email.
> We will work to coordinate interpretation for languages where there are 3
> or more interested community members; please email answ...@wikimedia.org
> with interpretation requests as well.
>
> For those who are unable to join the call, but interested in following and
> contributing to the conversation, we plan to share notes on our External
> Trends Meta page [3] afterward so that you can add your thoughts.
>
> Whether in person or on-wiki, I hope you’ll share your ideas so that we
> can all get a broader understanding of the potential benefits and
> challenges of this emergent technology. Looking forward to the discussion!
>
> Best,
>
>
> Yael Weissburg
>
> 1. https://diff.wikimedia.org/2023/02/17/looking-outward-external-trends-in-2023/
> 2. https://zonestamp.toolforge.org/1679594401
> 3. https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/Draft/External_Trends
>
>
> *Yael Weissburg* (she/her)
> VP, Partnerships, Programs & Grantmaking
> Wikimedia Foundation 
> M: (+1) 415.513.6643
> I work from San Francisco. My time zone is UTC -7/-8.
>
> ___
> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/BTIM7FBO3XATNOLL7OMAPCQQWC2DM45X/
> To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org



-- 
Samuel Klein  @metasj   w:user:sj  +1 617 529 4266
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/LCLQPWHLUTBPNKJSEZTBNP34NK2GTX4C/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: 23 March: Invitation to Open Community Call on ChatGPT, generative AI, and Wikimedia

2023-03-21 Thread Elena Lappen
Hi everyone!

Reminder that this conversation is coming up on Thursday! We will have a
member of the Foundation's Legal team with us to discuss possible legal
implications, many of which have been raised on this list over the past few
days. You can still register for the Zoom room by emailing
answ...@wikimedia.org. Notes will be shared afterward on the External Trends
page on Meta for those who want to participate asynchronously. Hope to see
you there!

Best,
Elena

Elena Lappen (she/her/hers)
Lead Movement Communications Specialist
Wikimedia Foundation

On Thu, Mar 9, 2023 at 1:40 PM Yael Weissburg 
wrote:

> Hi Everyone,
>
> Last year, as part of our annual planning process, the Wikimedia
> Foundation shared a list of external trends
> that we believed were likely to significantly impact the context in which
> the Wikimedia movement operates. Our focus at the time was on the changing
> nature of search, the astronomical rise in the global demand for content,
> and rich media content in particular, and the concerning rise of
> misinformation and disinformation. We heard from many in our movement about
> additional trends that our movement faces that we didn’t include in that
> list, but that are critical to how we as a movement operate, including the
> de-prioritization of investigative journalism, and the damage to GLAM
> institutions wrought by the global pandemic.
>
> As part of this year’s annual planning process, we set out to update that
> list. In particular, we’ve been tracking recent advancements in artificial
> intelligence (AI). In our recent Diff post on the topic, [1] we noted some
> risks as well as some potential opportunities for our movement as this
> technology continues to evolve. Since there has been a great deal of
> interest in and discussion about AI products like ChatGPT and what it means
> for Wikimedia over the past few months (including several threads on the
> topic on this mailing list), we’d love to explore this topic in more depth
> with you and continue the conversation about its implications for us as a
> free knowledge movement.
>
> I’d like to invite you all to an open call on 23 March at 18:00 UTC (find
> your local time here) [2] where we can share reflections on the
> opportunities, risks, and questions we see raised by new AI tools and
> products.
>
> The call will be held on Zoom. If you’re interested in joining, email
> answ...@wikimedia.org and we will share the Zoom link with you via email.
> We will work to coordinate interpretation for languages where there are 3
> or more interested community members; please email answ...@wikimedia.org
> with interpretation requests as well.
>
> For those who are unable to join the call, but interested in following and
> contributing to the conversation, we plan to share notes on our External
> Trends Meta page [3] afterward so that you can add your thoughts.
>
> Whether in person or on-wiki, I hope you’ll share your ideas so that we
> can all get a broader understanding of the potential benefits and
> challenges of this emergent technology. Looking forward to the discussion!
>
> Best,
>
>
> Yael Weissburg
>
> 1. https://diff.wikimedia.org/2023/02/17/looking-outward-external-trends-in-2023/
> 2. https://zonestamp.toolforge.org/1679594401
> 3. https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/Draft/External_Trends
>
>
> *Yael Weissburg* (she/her)
> VP, Partnerships, Programs & Grantmaking
> Wikimedia Foundation 
> M: (+1) 415.513.6643
> I work from San Francisco. My time zone is UTC -7/-8.
>
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/O2SE42Q3UMXUGWWEO2FXZGAOCARNGMGC/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org