[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Peter Southwood
“Not citing sources is probably a conscious design choice, as citing sources 
would mean sharing the sources used to train the language models.” This may be 
a choice that comes back to bite them. Without citing their sources, they are 
unreliable as a source for anything one does not already know. Someone will 
suffer a bad consequence from relying on the information and will sue the 
publisher. It will be interesting to see how they plan to weasel their way out 
of legal responsibility while retaining any credibility. My guess is there will 
be a requirement to state that the information is AI-generated and of entirely 
unknown and untested reliability. How soon until the first class action, I 
wonder. Lots of money for the lawyers. Cheers, Peter.

 

From: Subhashish [mailto:psubhash...@gmail.com] 
Sent: 05 February 2023 06:37
To: Wikimedia Mailing List
Subject: [Wikimedia-l] Re: Chat GPT

 

Just to clarify, my point was not about Getty to begin with. Whether Getty 
would win and whether a big corporation should own such a large amount of 
visual content are questions outside this particular thread. It would certainly 
be interesting to see how things roll.

 

But AI/ML is way more than just looking. Training with large models is a very 
sophisticated and technical process. Data annotation, among many other forms of 
labour, is done by real people. The article I had linked earlier tells a lot 
about the real-world consequences of AI. I'm certain AI/ML, especially when 
we're talking about language models like ChatGPT, is far from innocent 
looking/reading. For starters, derivatives of works, except Public Domain ones, 
must attribute the authors. Any provision for attribution is deliberately 
removed from systems like ChatGPT, and that only gives corporations like OpenAI 
a free ride sans accountability.

 

Subhashish 

 

 

On Sat, Feb 4, 2023, 4:41 PM Todd Allen  wrote:

I'm not so sure Getty's got a case, though. If the images are on the Web, is 
using them to train an AI something copyright would cover? That to me seems 
more equivalent to just looking at the images, and there's no copyright problem 
in going to Getty's site and just looking at a bunch of their pictures.

 

But it will be interesting to see how that one shakes out.

 

Todd

 

On Sat, Feb 4, 2023 at 11:47 AM Subhashish  wrote:

Not citing sources is probably a conscious design choice, as citing sources 
would mean sharing the sources used to train the language models. Getty has 
just sued Stability AI, alleging the use of 12 million photographs without 
permission or compensation. Imagine if Stability had to purchase from Getty 
through a legal process. For starters, Getty might not have agreed in the first 
place. Bulk-scraping publicly visible text for text-based AIs like ChatGPT 
would mean scraping copyrighted text. But even reusing CC BY-SA content would 
require attribution. None of the AI platforms attributes their sources because 
they did not acquire content in legal and ethical ways [1]. Large language 
models won't be large and releases won't happen fast if they actually start 
acquiring content gradually from trustworthy sources. It took so many years for 
hundreds and thousands of Wikimedians to take Wikipedias in different languages 
to where they are for a reason.

 

1. https://time.com/6247678/openai-chatgpt-kenya-workers/




Subhashish

 

 

On Sat, Feb 4, 2023 at 1:06 PM Peter Southwood  
wrote:

From what I have seen the AIs are not great on citing sources. If they start 
citing reliable sources, their contributions can be verified, or not. If they 
produce verifiable, adequately sourced, well written information, are they a 
problem or a solution?

Cheers,

Peter

 

From: Gnangarra [mailto:gnanga...@gmail.com] 
Sent: 04 February 2023 17:04
To: Wikimedia Mailing List
Subject: [Wikimedia-l] Re: Chat GPT

 

I see our biggest challenge is going to be detecting these AI tools adding 
content, whether it's media or articles, along with identifying when they are 
in use by sources. The failing of all new AI is not in its ability but in the 
lack of transparency that would let readers identify when it has been used. We 
have seen people impersonating musicians and writing songs in their style. We 
have also seen pictures that have been created by copying someone else's work 
yet not acknowledging it as being derivative of any kind.

 

Our big problems will be in ensuring that copyright is respected legally, and 
in not hosting anything that is even remotely dubious.

 

On Sat, 4 Feb 2023 at 22:24, Adam Sobieski  wrote:

Brainstorming on how to drive traffic to Wikimedia content from conversational 
media, UI/UX designers could provide menu items or buttons on chatbots' 
applications or webpage components (e.g., to read more about the content, to 
navigate to cited resources, to edit the content, to discuss the content, to 
upvote/downvote the content, to share the content or the recent dialogue 
history on 

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Subhashish
Just to clarify, my point was not about Getty to begin with. Whether Getty
would win and whether a big corporation should own such a large amount of
visual content are questions outside this particular thread. It would
certainly be interesting to see how things roll.

But AI/ML is way more than just looking. Training with large models is a
very sophisticated and technical process. Data annotation, among many other
forms of labour, is done by real people. The article I had linked earlier
tells a lot about the real-world consequences of AI. I'm certain AI/ML,
especially when we're talking about language models like ChatGPT, is far
from innocent looking/reading. For starters, derivatives of works, except
Public Domain ones, must attribute the authors. Any provision for
attribution is deliberately removed from systems like ChatGPT, and that only
gives corporations like OpenAI a free ride sans accountability.

Subhashish


On Sat, Feb 4, 2023, 4:41 PM Todd Allen  wrote:

> I'm not so sure Getty's got a case, though. If the images are on the Web,
> is using them to train an AI something copyright would cover? That to me
> seems more equivalent to just looking at the images, and there's no
> copyright problem in going to Getty's site and just looking at a bunch of
> their pictures.
>
> But it will be interesting to see how that one shakes out.
>
> Todd
>
> On Sat, Feb 4, 2023 at 11:47 AM Subhashish  wrote:
>
>> Not citing sources is probably a conscious design choice, as citing
>> sources would mean sharing the sources used to train the language models.
>> Getty has just sued Stability AI, alleging the use of 12 million
>> photographs without permission or compensation. Imagine if Stability had to
>> purchase from Getty through a legal process. For starters, Getty might not
>> have agreed in the first place. Bulk-scaping publicly visible text in
>> text-based AIs like ChatGPT would mean scraping text with copyright. But
>> even reusing CC BY-SA content would require attribution. None of the AI
>> platforms attributes their sources because they did not acquire content in
>> legal and ethical ways [1]. Large language models won't be large and
>> releases won't happen fast if they actually start acquiring content
>> gradually from trustworthy sources. It took so many years for hundreds and
>> thousands of Wikimedians to take Wikipedias in different languages to where
>> they are for a reason.
>>
>> 1. https://time.com/6247678/openai-chatgpt-kenya-workers/
>>
>> Subhashish
>>
>>
>> On Sat, Feb 4, 2023 at 1:06 PM Peter Southwood <
>> peter.southw...@telkomsa.net> wrote:
>>
>>> From what I have seen the AIs are not great on citing sources. If they
>>> start citing reliable sources, their contributions can be verified, or not.
>>> If they produce verifiable, adequately sourced, well written information,
>>> are they a problem or a solution?
>>>
>>> Cheers,
>>>
>>> Peter
>>>
>>>
>>>
>>> *From:* Gnangarra [mailto:gnanga...@gmail.com]
>>> *Sent:* 04 February 2023 17:04
>>> *To:* Wikimedia Mailing List
>>> *Subject:* [Wikimedia-l] Re: Chat GPT
>>>
>>>
>>>
>>> I see our biggest challenge is going to be detecting these AI tools
>>> adding content whether it's media or articles, along with identifying when
>>> they are in use by sources.  The failing of all new AI is not in its
>>> ability but in the lack of transparency with that being able to be
>>> identified by the readers. We have seen people impersonating musicians and
>>> writing songs in their style. We have also seen pictures that have been
>>> created by copying someone else's work yet not acknowledging it as being
>>> derivative of any kind.
>>>
>>>
>>>
>>> Our big problems will be in ensuring that copyright is respected in
>>> legally, and not hosting anything that is even remotely dubious
>>>
>>>
>>>
>>> On Sat, 4 Feb 2023 at 22:24, Adam Sobieski 
>>> wrote:
>>>
>>> Brainstorming on how to drive traffic to Wikimedia content from
>>> conversational media, UI/UX designers could provide menu items or buttons
>>> on chatbots' applications or webpage components (e.g., to read more about
>>> the content, to navigate to cited resources, to edit the content, to
>>> discuss the content, to upvote/downvote the content, to share the content
>>> or the recent dialogue history on social media, to request
>>> review/moderation/curation for the content, etc.). Many of these envisioned
>>> menu items or buttons would operate contextually during dialogues, upon the
>>> most recent (or otherwise selected) responses provided by the chatbot or
>>> upon the recent transcripts. Some of these features could also be made
>>> available to end-users via spoken-language commands.
>>>
>>> At any point during hypertext-based dialogues, end-users would be able
>>> to navigate to Wikimedia content. These navigations could utilize either
>>> URL query string arguments or HTTP POST. In either case, bulk usage data,
>>> e.g., those dialogue contexts navigated from, could be useful.
>>>
>>> The 

[Wikimedia-l] New Signpost issue 4 February 2023

2023-02-04 Thread Andreas Kolbe
The Signpost – Volume 19, Issue 3 – 4 February 2023
--

From the editor: New for the Signpost: Author pages, tag pages, and a
decent article search function
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/From_the_editor


News and notes: Foundation update on fundraising, new page patrol, Tides,
and Wikipedia blocked in Pakistan
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/News_and_notes


Section 230: Twenty-six words that created the internet, and the future of
an encyclopedia
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/Section_230


Disinformation report: Wikipedia on Santos
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/Disinformation_report


Special report: Legal status of Wikimedia projects "unclear" under
potential European legislation
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/Special_report


In the media: Furor over new Wikipedia skin, followup on Saudi bans, and
legislative debate
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/In_the_media


Op-Ed: Estonian businessman and political donor brings lawsuit against head
of national Wikimedia chapter
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/Op-Ed


Opinion: Study examines cultural leanings of Wikimedia projects' visual art
coverage
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/Opinion


Recent research: Wikipedia's "moderate yet systematic" liberal citation bias
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/Recent_research


WikiProject report: WikiProject Organized Labour
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/WikiProject_report


Tips and tricks: XTools: Data analytics for your list of created articles
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/Tips_and_tricks


Featured content: 20,000 Featureds under the Sea
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/Featured_content


Traffic report: Films, deaths and ChatGPT
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-02-04/Traffic_report



Single-page view

https://en.wikipedia.org/wiki/Wikipedia:Signpost/Single



https://facebook.com/wikisignpost

https://twitter.com/wikisignpost

https://wikis.world/@WikiSignpost
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/PUJZ6JHR5R3IEWHW3BU3KF5Q2O5GBJY5/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Todd Allen
I'm not so sure Getty's got a case, though. If the images are on the Web,
is using them to train an AI something copyright would cover? That to me
seems more equivalent to just looking at the images, and there's no
copyright problem in going to Getty's site and just looking at a bunch of
their pictures.

But it will be interesting to see how that one shakes out.

Todd

On Sat, Feb 4, 2023 at 11:47 AM Subhashish  wrote:

> Not citing sources is probably a conscious design choice, as citing
> sources would mean sharing the sources used to train the language models.
> Getty has just sued Stability AI, alleging the use of 12 million
> photographs without permission or compensation. Imagine if Stability had to
> purchase from Getty through a legal process. For starters, Getty might not
> have agreed in the first place. Bulk-scaping publicly visible text in
> text-based AIs like ChatGPT would mean scraping text with copyright. But
> even reusing CC BY-SA content would require attribution. None of the AI
> platforms attributes their sources because they did not acquire content in
> legal and ethical ways [1]. Large language models won't be large and
> releases won't happen fast if they actually start acquiring content
> gradually from trustworthy sources. It took so many years for hundreds and
> thousands of Wikimedians to take Wikipedias in different languages to where
> they are for a reason.
>
> 1. https://time.com/6247678/openai-chatgpt-kenya-workers/
>
> Subhashish
>
>
> On Sat, Feb 4, 2023 at 1:06 PM Peter Southwood <
> peter.southw...@telkomsa.net> wrote:
>
>> From what I have seen the AIs are not great on citing sources. If they
>> start citing reliable sources, their contributions can be verified, or not.
>> If they produce verifiable, adequately sourced, well written information,
>> are they a problem or a solution?
>>
>> Cheers,
>>
>> Peter
>>
>>
>>
>> *From:* Gnangarra [mailto:gnanga...@gmail.com]
>> *Sent:* 04 February 2023 17:04
>> *To:* Wikimedia Mailing List
>> *Subject:* [Wikimedia-l] Re: Chat GPT
>>
>>
>>
>> I see our biggest challenge is going to be detecting these AI tools
>> adding content whether it's media or articles, along with identifying when
>> they are in use by sources.  The failing of all new AI is not in its
>> ability but in the lack of transparency with that being able to be
>> identified by the readers. We have seen people impersonating musicians and
>> writing songs in their style. We have also seen pictures that have been
>> created by copying someone else's work yet not acknowledging it as being
>> derivative of any kind.
>>
>>
>>
>> Our big problems will be in ensuring that copyright is respected in
>> legally, and not hosting anything that is even remotely dubious
>>
>>
>>
>> On Sat, 4 Feb 2023 at 22:24, Adam Sobieski 
>> wrote:
>>
>> Brainstorming on how to drive traffic to Wikimedia content from
>> conversational media, UI/UX designers could provide menu items or buttons
>> on chatbots' applications or webpage components (e.g., to read more about
>> the content, to navigate to cited resources, to edit the content, to
>> discuss the content, to upvote/downvote the content, to share the content
>> or the recent dialogue history on social media, to request
>> review/moderation/curation for the content, etc.). Many of these envisioned
>> menu items or buttons would operate contextually during dialogues, upon the
>> most recent (or otherwise selected) responses provided by the chatbot or
>> upon the recent transcripts. Some of these features could also be made
>> available to end-users via spoken-language commands.
>>
>> At any point during hypertext-based dialogues, end-users would be able to
>> navigate to Wikimedia content. These navigations could utilize either URL
>> query string arguments or HTTP POST. In either case, bulk usage data, e.g.,
>> those dialogue contexts navigated from, could be useful.
>>
>> The capability to perform A/B testing across chatbots’ dialogues, over
>> large populations of end-users, could also be useful. In this way,
>> Wikimedia would be better able to: (1) measure end-user engagement and
>> satisfaction, (2) measure the quality of provided content, (3) perform
>> personalization, (4) retain readers and editors. A/B testing could be
>> performed by providing end-users with various feedback buttons (as
>> described above). A/B testing data could also be obtained through data
>> mining, analyzing end-users’ behaviors, response times, responses, and
>> dialogue moves. These data could be provided for the community at special
>> pages and could be made available per article, possibly by enhancing the
>> “Page information” system. One can also envision these kinds of analytics
>> data existing at the granularity of portions of, or selections of,
>> articles.
>>
>>
>>
>>
>>
>> Best regards,
>>
>> Adam
>>
>>
>> --
>>
>> *From:* Victoria Coleman 
>> *Sent:* Saturday, February 4, 2023 8:10 AM
>> *To:* Wikimedia Mailing List 

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Subhashish
Not citing sources is probably a conscious design choice, as citing sources
would mean sharing the sources used to train the language models. Getty has
just sued Stability AI, alleging the use of 12 million photographs without
permission or compensation. Imagine if Stability had to purchase from Getty
through a legal process. For starters, Getty might not have agreed in the
first place. Bulk-scraping publicly visible text for text-based AIs like
ChatGPT would mean scraping copyrighted text. But even reusing CC BY-SA
content would require attribution. None of the AI platforms attributes
their sources because they did not acquire content in legal and ethical
ways [1]. Large language models won't be large and releases won't happen
fast if they actually start acquiring content gradually from
trustworthy sources. It took so many years for hundreds and thousands of
Wikimedians to take Wikipedias in different languages to where they are for
a reason.

1. https://time.com/6247678/openai-chatgpt-kenya-workers/

Subhashish


On Sat, Feb 4, 2023 at 1:06 PM Peter Southwood 
wrote:

> From what I have seen the AIs are not great on citing sources. If they
> start citing reliable sources, their contributions can be verified, or not.
> If they produce verifiable, adequately sourced, well written information,
> are they a problem or a solution?
>
> Cheers,
>
> Peter
>
>
>
> *From:* Gnangarra [mailto:gnanga...@gmail.com]
> *Sent:* 04 February 2023 17:04
> *To:* Wikimedia Mailing List
> *Subject:* [Wikimedia-l] Re: Chat GPT
>
>
>
> I see our biggest challenge is going to be detecting these AI tools adding
> content whether it's media or articles, along with identifying when they
> are in use by sources.  The failing of all new AI is not in its ability but
> in the lack of transparency with that being able to be identified by the
> readers. We have seen people impersonating musicians and writing songs in
> their style. We have also seen pictures that have been created by copying
> someone else's work yet not acknowledging it as being derivative of any
> kind.
>
>
>
> Our big problems will be in ensuring that copyright is respected in
> legally, and not hosting anything that is even remotely dubious
>
>
>
> On Sat, 4 Feb 2023 at 22:24, Adam Sobieski 
> wrote:
>
> Brainstorming on how to drive traffic to Wikimedia content from
> conversational media, UI/UX designers could provide menu items or buttons
> on chatbots' applications or webpage components (e.g., to read more about
> the content, to navigate to cited resources, to edit the content, to
> discuss the content, to upvote/downvote the content, to share the content
> or the recent dialogue history on social media, to request
> review/moderation/curation for the content, etc.). Many of these envisioned
> menu items or buttons would operate contextually during dialogues, upon the
> most recent (or otherwise selected) responses provided by the chatbot or
> upon the recent transcripts. Some of these features could also be made
> available to end-users via spoken-language commands.
>
> At any point during hypertext-based dialogues, end-users would be able to
> navigate to Wikimedia content. These navigations could utilize either URL
> query string arguments or HTTP POST. In either case, bulk usage data, e.g.,
> those dialogue contexts navigated from, could be useful.
>
> The capability to perform A/B testing across chatbots’ dialogues, over
> large populations of end-users, could also be useful. In this way,
> Wikimedia would be better able to: (1) measure end-user engagement and
> satisfaction, (2) measure the quality of provided content, (3) perform
> personalization, (4) retain readers and editors. A/B testing could be
> performed by providing end-users with various feedback buttons (as
> described above). A/B testing data could also be obtained through data
> mining, analyzing end-users’ behaviors, response times, responses, and
> dialogue moves. These data could be provided for the community at special
> pages and could be made available per article, possibly by enhancing the
> “Page information” system. One can also envision these kinds of analytics
> data existing at the granularity of portions of, or selections of,
> articles.
>
>
>
>
>
> Best regards,
>
> Adam
>
>
> --
>
> *From:* Victoria Coleman 
> *Sent:* Saturday, February 4, 2023 8:10 AM
> *To:* Wikimedia Mailing List 
> *Subject:* [Wikimedia-l] Re: Chat GPT
>
>
>
> Hi Christophe,
>
>
>
> I had not thought about the threat to Wikipedia traffic from Chat GPT but
> you have a good point. The success of the projects is always one step away
> from the next big disruption. So the WMF as the tech provider for the
> mission (because first and foremost in my view that's what the WMF is - as
> well as the financial engine of the movement of course) needs to pay
> attention and experiment to maintain the long term viability of the
> mission. In fact I think the cluster of our projects offers 

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Peter Southwood
From what I have seen the AIs are not great on citing sources. If they start 
citing reliable sources, their contributions can be verified, or not. If they 
produce verifiable, adequately sourced, well written information, are they a 
problem or a solution?

Cheers,

Peter

 

From: Gnangarra [mailto:gnanga...@gmail.com] 
Sent: 04 February 2023 17:04
To: Wikimedia Mailing List
Subject: [Wikimedia-l] Re: Chat GPT

 

I see our biggest challenge is going to be detecting these AI tools adding 
content, whether it's media or articles, along with identifying when they are 
in use by sources. The failing of all new AI is not in its ability but in the 
lack of transparency that would let readers identify when it has been used. We 
have seen people impersonating musicians and writing songs in their style. We 
have also seen pictures that have been created by copying someone else's work 
yet not acknowledging it as being derivative of any kind.

 

Our big problems will be in ensuring that copyright is respected legally, and 
in not hosting anything that is even remotely dubious.

 

On Sat, 4 Feb 2023 at 22:24, Adam Sobieski  wrote:

Brainstorming on how to drive traffic to Wikimedia content from conversational 
media, UI/UX designers could provide menu items or buttons on chatbots' 
applications or webpage components (e.g., to read more about the content, to 
navigate to cited resources, to edit the content, to discuss the content, to 
upvote/downvote the content, to share the content or the recent dialogue 
history on social media, to request review/moderation/curation for the content, 
etc.). Many of these envisioned menu items or buttons would operate 
contextually during dialogues, upon the most recent (or otherwise selected) 
responses provided by the chatbot or upon the recent transcripts. Some of these 
features could also be made available to end-users via spoken-language commands.

At any point during hypertext-based dialogues, end-users would be able to 
navigate to Wikimedia content. These navigations could utilize either URL query 
string arguments or HTTP POST. In either case, bulk usage data, e.g., those 
dialogue contexts navigated from, could be useful. 

The capability to perform A/B testing across chatbots’ dialogues, over large 
populations of end-users, could also be useful. In this way, Wikimedia would be 
better able to: (1) measure end-user engagement and satisfaction, (2) measure 
the quality of provided content, (3) perform personalization, (4) retain 
readers and editors. A/B testing could be performed by providing end-users with 
various feedback buttons (as described above). A/B testing data could also be 
obtained through data mining, analyzing end-users’ behaviors, response times, 
responses, and dialogue moves. These data could be provided for the community 
at special pages and could be made available per article, possibly by enhancing 
the “Page information” system. One can also envision these kinds of analytics 
data existing at the granularity of portions of, or selections of, articles. 

 

 

Best regards,

Adam

 

  _  

From: Victoria Coleman 
Sent: Saturday, February 4, 2023 8:10 AM
To: Wikimedia Mailing List 
Subject: [Wikimedia-l] Re: Chat GPT 

 

Hi Christophe, 

 

I had not thought about the threat to Wikipedia traffic from Chat GPT but you 
have a good point. The success of the projects is always one step away from the 
next big disruption. So the WMF as the tech provider for the mission (because 
first and foremost in my view that's what the WMF is - as well as the financial 
engine of the movement, of course) needs to pay attention and experiment to 
maintain the long-term viability of the mission. In fact I think the cluster of 
our projects offers compelling options. For example, to your point below on data 
sets, we have the amazing Wikidata as well as the excellent work on Abstract 
Wikipedia. We have Wikipedia Enterprise, which has built some avenues of 
collaboration with big tech. A bold vision is needed to bring all of it 
together and build an MVP for the community to experiment with.

Best regards, 

 

Victoria Coleman





On Feb 4, 2023, at 4:14 AM, Christophe Henner  
wrote:

Hi, 

 

On the product side, my biggest concern with NLP-based AI is that it would 
drastically decrease traffic to our websites/apps, which means fewer new 
editors and fewer donations. 

 

So first from a strictly positioning perspective, we have here a major change 
that needs to be managed.

 

And to be honest, it will come faster than we think. We are perfectionists; I 
can assure you, most companies would be happy to launch a search product with 
80% confidence in answer quality.

 

From a financial perspective, large industrial investments like this are 
usually a pool of money you can draw from over x years. You can expect they did 
not draw all of it yet.

 

Second, GPT 3 and ChatGPT are far from being the most expensive products they 
have. On top of people you 

[Wikimedia-l] Re: Wikimedia UK Strategic Report for 21/22

2023-02-04 Thread Wasi
Awesome!!
Very unique way to publish a report.
Wasi

On Thu, Feb 2, 2023 at 5:03 PM Lucy Crompton-Reid <
lucy.crompton-r...@wikimedia.org.uk> wrote:

> Dear all
>
>
> Wikimedia UK has just published our Strategic Report for the financial
> year 2021/22. Excitingly, this is our first fully digital format, which
> we hope brings to life the breadth of Wikimedia UK’s activities and impact.
>
>
> You can find the report directly here
>  or on our website
>  under Strategic Report.
>
>
> Best wishes
>
> Lucy
>
>
> --
> Lucy Crompton-Reid (she/her)
> Chief Executive
> ___
> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/D3ETOIQRVDYKA6RCUXL4ITLGOCNDCCV6/
> To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/ZAELNCTYGLVVTNHEWHK46JMJIC6VNBV4/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: Wikimedia-l Digest, Vol 724, Issue 1

2023-02-04 Thread Lucy Crompton-Reid
Thanks very much Andreas and João!

Mike, there is quite a lot of text and additional case studies and images.
You need to click into the different sections to see that. I've attached a
few screenshots which hopefully illuminate this a bit :)

Best
Lucy



On Fri, 3 Feb 2023 at 16:23, 
wrote:

> Send Wikimedia-l mailing list submissions to
> wikimedia-l@lists.wikimedia.org
>
> To subscribe or unsubscribe, please visit
>
> https://lists.wikimedia.org/postorius/lists/wikimedia-l.lists.wikimedia.org/
>
> You can reach the person managing the list at
> wikimedia-l-ow...@lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wikimedia-l digest..."
>
> Today's Topics:
>
>1. Re: Wikimedia UK Strategic Report for 21/22 (Andreas Kolbe)
>2. Re: Wikimedia UK Strategic Report for 21/22
>   (João Alexandre Peschanski)
>3. Re: Wikimedia UK Strategic Report for 21/22 (Mike Peel)
>
>
> --
>
> Message: 1
> Date: Fri, 3 Feb 2023 16:13:11 +
> From: Andreas Kolbe 
> Subject: [Wikimedia-l] Re: Wikimedia UK Strategic Report for 21/22
> To: Wikimedia Mailing List 
> Message-ID:
>  nti6vgoaws3tirq+vb1d4woahabm...@mail.gmail.com>
> Content-Type: multipart/alternative;
> boundary="ea243b05f3cdf5ed"
>
> Wow, what a gorgeously designed report!
>
> Beautiful to look at. Love it.
>
> Andreas
>
>
> On Thu, Feb 2, 2023 at 11:03 AM Lucy Crompton-Reid <
> lucy.crompton-r...@wikimedia.org.uk> wrote:
>
> > Dear all
> >
> >
> > Wikimedia UK has just published our Strategic Report for the financial
> > year 2021/22. Excitingly, this is our first fully digital format, which
> > we hope brings to life the breadth of Wikimedia UK’s activities and
> impact.
> >
> >
> > You can find the report directly here
> >  or on our website
> >  under Strategic Report.
> >
> >
> > Best wishes
> >
> > Lucy
> >
> >
> > --
> > Lucy Crompton-Reid (she/her)
> > Chief Executive
> > ___
> > Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
> > at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> > https://meta.wikimedia.org/wiki/Wikimedia-l
> > Public archives at
> >
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/D3ETOIQRVDYKA6RCUXL4ITLGOCNDCCV6/
> > To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org
> -- next part --
> A message part incompatible with plain text digests has been removed ...
> Name: not available
> Type: text/html
> Size: 4545 bytes
> Desc: not available
>
> --
>
> Message: 2
> Date: Fri, 3 Feb 2023 13:19:55 -0300
> From: João Alexandre Peschanski 
> Subject: [Wikimedia-l] Re: Wikimedia UK Strategic Report for 21/22
> To: Wikimedia Mailing List 
> Message-ID:
>  wnqyt+fc12ro_mqvsw03p34lcmt5r...@mail.gmail.com>
> Content-Type: multipart/alternative;
> boundary="ed7abd05f3ce0d69"
>
> Indeed, what a beautiful material. Very inspiring! Congratulations WMUK :)
> Best, João
>
> On Fri, 3 Feb 2023 at 13:14, Andreas Kolbe 
> wrote:
>
> > Wow, what a gorgeously designed report!
> >
> > Beautiful to look at. Love it.
> >
> > Andreas
> >
> >
> > On Thu, Feb 2, 2023 at 11:03 AM Lucy Crompton-Reid <
> > lucy.crompton-r...@wikimedia.org.uk> wrote:
> >
> >> Dear all
> >>
> >>
> >> Wikimedia UK has just published our Strategic Report for the financial
> >> year 2021/22. Excitingly, this is our first fully digital format, which
> >> we hope brings to life the breadth of Wikimedia UK’s activities and
> impact.
> >>
> >>
> >> You can find the report directly here
> >>  or on our website
> >>  under Strategic Report.
> >>
> >>
> >> Best wishes
> >>
> >> Lucy
> >>
> >>
> >> --
> >> Lucy Crompton-Reid (she/her)
> >> Chief Executive
> >> ___
> >> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
> >> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> >> https://meta.wikimedia.org/wiki/Wikimedia-l
> >> Public archives at
> >>
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/D3ETOIQRVDYKA6RCUXL4ITLGOCNDCCV6/
> >> To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org
> >
> > ___
> > Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
> > at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> > https://meta.wikimedia.org/wiki/Wikimedia-l
> > Public archives at
> >
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/DKF2OEGGI3HH64JRJMSJ2E5K6DMMZGMF/
> > To unsubscribe send an 

[Wikimedia-l] Re: Block of Wikipedia in Pakistan

2023-02-04 Thread John S.
Thank you for the update. Please continue to keep us informed.

John S.

On Sat, 4 Feb 2023 at 02:45, Stephen LaPorte wrote:

> Hello everyone,
>
> Today, the Pakistan Telecommunication Authority (PTA) ordered that access
> to Wikipedia be suspended in Pakistan. We are urging the Pakistan
> government to restore access to Wikipedia and Wikimedia projects
> immediately
> 
> .
>
> This action denies the fifth most populous nation in the world access to
> the world's largest, free online knowledge repository. If it continues, it
> will also deny the world the perspective of the people of Pakistan and the
> benefit of their knowledge, history, and culture.
>
> The Wikimedia Foundation received a notification from the Pakistan
> Telecommunication Authority on February 1, 2023, stating “the services of
> Wikipedia have been degraded for 48 hours” for failure to remove content
> from the site deemed “unlawful” by the government. The notification further
> mentioned that a block of Wikipedia could follow, if the Foundation did not
> comply with the takedown orders. As of Friday February 3, our internal
> traffic reports indicate that Wikipedia and Wikimedia projects are no
> longer accessible to users in Pakistan.
>
> The Wikimedia Foundation is already examining various avenues and
> investigating how we can help restore access, while staying true to our
> values of verifiability, neutrality, and freedom of information.
>
> We are also prepared to support any members of the Wikimedia communities
> who are impacted. If you or someone you know is contacted by the Pakistani
> government in reference to the block, please contact us at
> le...@wikimedia.org. If you or someone you know is in immediate physical
> danger in reference to the block, please contact emerge...@wikimedia.org
> right away. We are actively working to reach out to community leaders in
> the area.
>
> For over twenty years, our movement has supported knowledge as a
> fundamental human right. In defense of this right, we have opposed a
> growing number of threats that would interfere with the ability of people
> to access and contribute to free knowledge. We know that many of you will
> want to take action or speak out against the block. For now, please
> continue to do what is needed to remain safe. We will keep you updated on
> any new developments, actions we are taking, and ways which you can help
> return access to Wikipedia and Wikimedia projects in Pakistan.
>
> Thank you,
> Stephen
>
> --
> Stephen LaPorte (he/him/his)
> General Counsel
> Wikimedia Foundation
>
>
> *NOTICE: As an attorney for the Wikimedia Foundation, for legal and
> ethical reasons, I cannot give legal advice to, or serve as a lawyer for,
> community members, volunteers, or staff members in their personal capacity.
> For more on what this means, please see our legal disclaimer
> .*
> ___
> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/BBL2KGUMFHNVNHSA4UINFMLIVPO6GQB5/
> To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/LWQZQWNCFWS7K47U4RJLBAEFTRKH5SDN/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Mustafa Kabir
mk0705...@gmail.com

On Sat, Feb 4, 2023, 8:24 PM Adam Sobieski  wrote:

> Brainstorming on how to drive traffic to Wikimedia content from
> conversational media, UI/UX designers could provide menu items or buttons
> on chatbots' applications or webpage components (e.g., to read more about
> the content, to navigate to cited resources, to edit the content, to
> discuss the content, to upvote/downvote the content, to share the content
> or the recent dialogue history on social media, to request
> review/moderation/curation for the content, etc.). Many of these
> envisioned menu items or buttons would operate contextually during
> dialogues, upon the most recent (or otherwise selected) responses provided
> by the chatbot or upon the recent transcripts. Some of these features
> could also be made available to end-users via spoken-language commands.
>
> At any point during hypertext-based dialogues, end-users would be able to
> navigate to Wikimedia content. These navigations could utilize either URL
> query string arguments or HTTP POST. In either case, bulk usage data, e.g.,
> those dialogue contexts navigated from, could be useful.
>
> The capability to perform A/B testing across chatbots’ dialogues, over
> large populations of end-users, could also be useful. In this way,
> Wikimedia would be better able to: (1) measure end-user engagement and
> satisfaction, (2) measure the quality of provided content, (3) perform
> personalization, (4) retain readers and editors. A/B testing could be
> performed by providing end-users with various feedback buttons (as
> described above). A/B testing data could also be obtained through data
> mining, analyzing end-users’ behaviors, response times, responses, and
> dialogue moves. These data could be provided for the community at special
> pages and could be made available per article, possibly by enhancing the
> “Page information” system. One can also envision these kinds of analytics
> data existing at the granularity of portions of, or selections of,
> articles.
>
>
>
> Best regards,
>
> Adam
>
> --
> *From:* Victoria Coleman 
> *Sent:* Saturday, February 4, 2023 8:10 AM
> *To:* Wikimedia Mailing List 
> *Subject:* [Wikimedia-l] Re: Chat GPT
>
> Hi Christophe,
>
> I had not thought about the threat to Wikipedia traffic from Chat GPT but
> you have a good point. The success of the projects is always one step away
> from the next big disruption. So the WMF as the tech provider for the
> mission (because first and foremost in my view that's what the WMF is - as
> well as the financial engine of the movement of course) needs to pay
> attention and experiment to maintain the long term viability of the
> mission. In fact I think the cluster of our projects offers compelling
> options. For example to your point below on data sets, we have the amazing
> Wikidata as well the excellent work on abstract Wikipedia. We have
> Wikipedia Enterprise which has built some avenues of collaboration with big
> tech. A bold vision is needed to bring all of it together and build an MVP
> for the community to experiment with.
>
> Best regards,
>
> Victoria Coleman
>
> On Feb 4, 2023, at 4:14 AM, Christophe Henner 
> wrote:
>
> Hi,
>
> On the product side, NLP based AI biggest concern to me is that it would
> drastically decrease traffic to our websites/apps. Which means less new
> editors ans less donations.
>
> So first from a strictly positioning perspective, we have here a major
> change that needs to be managed.
>
> And to be honest, it will come faster than we think. We are
> perfectionists, I can assure you, most companies would be happy to launch a
> search product with a 80% confidence in answers quality.
>
> From a financial perspective, large industrial investment like this are
> usually a pool of money you can draw from in x years. You can expect they
> did not draw all of it yet.
>
> Second, GPT 3 and ChatGPT are far from being the most expensive products
> they have. On top of people you need:
> * datasets
> * people to tag the dataset
> * people to correct the algo
> * computing power
>
> I simplify here, but we already have the capacity to muster some of that,
> which drastically lowers our costs :)
>
> I would not discard the option of the movement doing it so easily. That
> being said, it would mean a new project with the need of substantial
> ressources.
>
> Sent from my iPhone
>
> On Feb 4, 2023, at 9:30 AM, Adam Sobieski 
> wrote:
>
>
> With respect to cloud computing costs, these being a significant component
> of the costs to train and operate modern AI systems, as a non-profit
> organization, the Wikimedia Foundation might be interested in the National
> Research Cloud (NRC) policy proposal:
> https://hai.stanford.edu/policy/national-research-cloud .
>
> "Artificial intelligence requires vast amounts of computing power, data,
> and expertise to train and deploy the massive machine learning models
> behind the most advanced research. But 

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Kimmo Virtanen
Hi,

I think the Wikimedia community is generally well-positioned to create
high-quality training data for machine learning models, so improving the
crowdsourcing of Wikidata and structured data is essential for producing
easy-to-use curated training data. The focus should be on making that data
more widely used and on using existing open-source NLP/NLU models from
organizations such as universities, EleutherAI, Facebook etc., rather than
developing new models from scratch ourselves.
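
As one concrete illustration of how curated structured data might be pulled
out of Wikidata for this purpose, here is a minimal Python sketch against the
public SPARQL endpoint; the query, the chosen property, and the user agent
string are illustrative assumptions, not an established pipeline:

    import requests

    # Illustrative query: a handful of (label, description) pairs that could
    # serve as curated training/evaluation data for a description generator.
    query = """
    SELECT ?item ?itemLabel ?itemDescription WHERE {
      ?item wdt:P31 wd:Q5 .   # instance of: human
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 5
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": query, "format": "json"},
        headers={"User-Agent": "wikimedia-l-example/0.1 (illustrative)"},
    )
    resp.raise_for_status()

    for row in resp.json()["results"]["bindings"]:
        label = row.get("itemLabel", {}).get("value", "")
        description = row.get("itemDescription", {}).get("value", "")
        print(f"{label}: {description}")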

The bottleneck in utilizing these is the need for more human skills, which
can be addressed through documentation and examples demonstrating how to use
machine learning tools in real-life use cases such as image classification,
description/summary generation or automated error testing. It would also be
essential to develop ML tools that can run on commodity hardware, such as
GPUs with 24 GB of RAM currently, for broader accessibility. These could run
on people's computers, home labs, and hacklabs. It would also steer our
development towards less resource-intensive ML tools.
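
In the same spirit, a minimal sketch of reusing an existing open-source model
rather than training one from scratch, assuming the Hugging Face transformers
library; the checkpoint name is an illustrative choice, and a model of this
size runs (slowly) even on CPU-only commodity hardware:

    from transformers import pipeline

    # Reuse an existing open-source summarization model; nothing is trained here.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    text = (
        "Wikipedia is a free online encyclopedia written and maintained by a "
        "community of volunteers through open collaboration and a wiki-based "
        "editing system."
    )

    # Generate a short description/summary of the passage.
    result = summarizer(text, max_length=30, min_length=10, do_sample=False)
    print(result[0]["summary_text"])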

Br,
-- Kimmo Virtanen, Zache

On Sat, Feb 4, 2023 at 1:16 PM Kimmo Virtanen 
wrote:

> Hi,
>
> I think the Wikimedia community is generally well-positioned to create
> high-quality training data for machine learning models. So,  improving the
> crowdsourcing of wikidata and structured data is essential for making this
> easy-to-use curated training data. So, the focus should be on making data
> to be more widely used and using existing open-source NLP/NLU models from
> organizations such as universitios, EleutherAI, Facebook etc rather than
> developing new models from scratch by ourselves.
>
> The bottleneck of utilizing these is the need for more human skills, which
> can be addressed through documentation and examples demonstrating how to
> use machine learning tools in real-life use cases such as image
> classification, description/summary generation or automated error testing.
> It would be essential also to develop ML tools that can be run on commodity
> hardware, such as GPUs:s with 24GB RAM currently, for broader
> accessibility. These could run on people's computers, home labs, and
> hacklabs. It would also direct our development in the direction of less
> resource-intensive ML tools.
>
> Br,
> -- Kimmo Virtanen, Zache
>
>
>
>
>
>
>
> On Sat, Feb 4, 2023 at 12:23 PM Christophe Henner <
> christophe.hen...@gmail.com> wrote:
>
>> Hi,
>>
>> On the product side, NLP based AI biggest concern to me is that it would
>> drastically decrease traffic to our websites/apps. Which means less new
>> editors ans less donations.
>>
>> So first from a strictly positioning perspective, we have here a major
>> change that needs to be managed.
>>
>> And to be honest, it will come faster than we think. We are
>> perfectionists, I can assure you, most companies would be happy to launch a
>> search product with a 80% confidence in answers quality.
>>
>> From a financial perspective, large industrial investment like this are
>> usually a pool of money you can draw from in x years. You can expect they
>> did not draw all of it yet.
>>
>> Second, GPT 3 and ChatGPT are far from being the most expensive products
>> they have. On top of people you need:
>> * datasets
>> * people to tag the dataset
>> * people to correct the algo
>> * computing power
>>
>> I simplify here, but we already have the capacity to muster some of that,
>> which drastically lowers our costs :)
>>
>> I would not discard the option of the movement doing it so easily. That
>> being said, it would mean a new project with the need of substantial
>> ressources.
>>
>> Sent from my iPhone
>>
>> On Feb 4, 2023, at 9:30 AM, Adam Sobieski 
>> wrote:
>>
>>
>> With respect to cloud computing costs, these being a significant
>> component of the costs to train and operate modern AI systems, as a
>> non-profit organization, the Wikimedia Foundation might be interested in
>> the National Research Cloud (NRC) policy proposal:
>> https://hai.stanford.edu/policy/national-research-cloud .
>>
>> "Artificial intelligence requires vast amounts of computing power, data,
>> and expertise to train and deploy the massive machine learning models
>> behind the most advanced research. But access is increasingly out of reach
>> for most colleges and universities. A National Research Cloud (NRC) would
>> provide academic and *non-profit researchers* with the compute power and
>> government datasets needed for education and research. By democratizing
>> access and equity for all colleges and universities, an NRC has the
>> potential not only to unleash a string of advancements in AI, but to help
>> ensure the U.S. maintains its leadership and competitiveness on the global
>> stage.
>>
>> "Throughout 2020, Stanford HAI led efforts with 22 top computer science
>> universities along with a bipartisan, bicameral group of lawmakers
>> proposing legislation to bring the NRC to fruition. On January 1, 2021, the
>> 

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Mustafa Kabir
mk0705...@gmail.com

On Sat, Feb 4, 2023, 1:01 PM Steven Walling 
wrote:

>
>
> On Fri, Feb 3, 2023 at 9:47 PM Gergő Tisza  wrote:
>
>> Just to give a sense of scale: OpenAI started with a $1 billion donation,
>> got another $1B as investment, and is now getting a larger investment from
>> Microsoft (undisclosed but rumored to be $10B). Assuming they spent most of
>> their previous funding, which seems likely, their operational costs are in
>> the ballpark of $300 million per year. The idea that the WMF could just
>> choose to create conversational software of a similar quality if it wanted
>> seems detached from reality to me.
>>
>
> Without spending billions on LLM development to aim for a
> conversational chatbot trying to pass a Turing test, we could definitely
> try to catch up to the state of the art in search results. Our search
> currently does a pretty bad job (in terms of recall especially). Today's
> featured article in English is the Hot Chip album "Made in the Dark", and
> if I enter anything but the exact article title the typeahead results are
> woefully incomplete or wrong. If I ask an actual question, good luck.
>
> Google is feeling vulnerable to OpenAI here in part because everyone can
> see that their results are often full of low quality junk created for SEO,
> while ChatGPT just gives a concise answer right there.
>
> https://en.wikipedia.org/wiki/The_Menu_(2022_film) is one of the top
> viewed English articles. If I search "The Menu reviews" the Google results
> are noisy and not so great. ChatGPT actually gives you nothing relevant
> because it doesn't know anything from 2022. If we could just manage to
> display the three sentence snippet of our article about the critical
> response section of the article, it would be awesome. It's too bad that the
> whole "knowledge engine" debacle poisoned the well when it comes to a
> Wikipedia search engine, because we could definitely do a lot to learn from
> what people like about ChatGPT and apply to Wikipedia search.
>
> ___
>> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
>> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>> https://meta.wikimedia.org/wiki/Wikimedia-l
>> Public archives at
>> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/6OBPB7WNHKJQXXIBCK73SDXLE3DMGNMY/
>> To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org
>
> ___
> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/2O5USM4UIGYO6Y4LAD26SGM5AFMHYQFP/
> To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/PR2KF3Z5OGNGKVGAYXYBTK2R6PY3WNEN/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Gnangarra
I see our biggest challenge is going to be detecting these AI tools adding
content, whether it's media or articles, along with identifying when they
are in use by sources. The failing of all new AI is not in its ability but
in the lack of transparency that would let readers identify when it has been
used. We have seen people impersonating musicians and writing songs in their
style. We have also seen pictures that have been created by copying someone
else's work yet not acknowledging it as being derivative of any kind.

Our big problems will be in ensuring that copyright is respected legally, and
in not hosting anything that is even remotely dubious.

On Sat, 4 Feb 2023 at 22:24, Adam Sobieski  wrote:

> Brainstorming on how to drive traffic to Wikimedia content from
> conversational media, UI/UX designers could provide menu items or buttons
> on chatbots' applications or webpage components (e.g., to read more about
> the content, to navigate to cited resources, to edit the content, to
> discuss the content, to upvote/downvote the content, to share the content
> or the recent dialogue history on social media, to request
> review/moderation/curation for the content, etc.). Many of these
> envisioned menu items or buttons would operate contextually during
> dialogues, upon the most recent (or otherwise selected) responses provided
> by the chatbot or upon the recent transcripts. Some of these features
> could also be made available to end-users via spoken-language commands.
>
> At any point during hypertext-based dialogues, end-users would be able to
> navigate to Wikimedia content. These navigations could utilize either URL
> query string arguments or HTTP POST. In either case, bulk usage data, e.g.,
> those dialogue contexts navigated from, could be useful.
>
> The capability to perform A/B testing across chatbots’ dialogues, over
> large populations of end-users, could also be useful. In this way,
> Wikimedia would be better able to: (1) measure end-user engagement and
> satisfaction, (2) measure the quality of provided content, (3) perform
> personalization, (4) retain readers and editors. A/B testing could be
> performed by providing end-users with various feedback buttons (as
> described above). A/B testing data could also be obtained through data
> mining, analyzing end-users’ behaviors, response times, responses, and
> dialogue moves. These data could be provided for the community at special
> pages and could be made available per article, possibly by enhancing the
> “Page information” system. One can also envision these kinds of analytics
> data existing at the granularity of portions of, or selections of,
> articles.
>
>
>
> Best regards,
>
> Adam
>
> --
> *From:* Victoria Coleman 
> *Sent:* Saturday, February 4, 2023 8:10 AM
> *To:* Wikimedia Mailing List 
> *Subject:* [Wikimedia-l] Re: Chat GPT
>
> Hi Christophe,
>
> I had not thought about the threat to Wikipedia traffic from Chat GPT but
> you have a good point. The success of the projects is always one step away
> from the next big disruption. So the WMF as the tech provider for the
> mission (because first and foremost in my view that's what the WMF is - as
> well as the financial engine of the movement of course) needs to pay
> attention and experiment to maintain the long term viability of the
> mission. In fact I think the cluster of our projects offers compelling
> options. For example to your point below on data sets, we have the amazing
> Wikidata as well the excellent work on abstract Wikipedia. We have
> Wikipedia Enterprise which has built some avenues of collaboration with big
> tech. A bold vision is needed to bring all of it together and build an MVP
> for the community to experiment with.
>
> Best regards,
>
> Victoria Coleman
>
> On Feb 4, 2023, at 4:14 AM, Christophe Henner 
> wrote:
>
> Hi,
>
> On the product side, NLP based AI biggest concern to me is that it would
> drastically decrease traffic to our websites/apps. Which means less new
> editors ans less donations.
>
> So first from a strictly positioning perspective, we have here a major
> change that needs to be managed.
>
> And to be honest, it will come faster than we think. We are
> perfectionists, I can assure you, most companies would be happy to launch a
> search product with a 80% confidence in answers quality.
>
> From a financial perspective, large industrial investment like this are
> usually a pool of money you can draw from in x years. You can expect they
> did not draw all of it yet.
>
> Second, GPT 3 and ChatGPT are far from being the most expensive products
> they have. On top of people you need:
> * datasets
> * people to tag the dataset
> * people to correct the algo
> * computing power
>
> I simplify here, but we already have the capacity to muster some of that,
> which drastically lowers our costs :)
>
> I would not discard the option of the movement doing it so easily. That
> being said, it would mean a new project with the 

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Adam Sobieski
Brainstorming on how to drive traffic to Wikimedia content from conversational 
media, UI/UX designers could provide menu items or buttons on chatbots' 
applications or webpage components (e.g., to read more about the content, to 
navigate to cited resources, to edit the content, to discuss the content, to 
upvote/downvote the content, to share the content or the recent dialogue 
history on social media, to request review/moderation/curation for the content, 
etc.). Many of these envisioned menu items or buttons would operate 
contextually during dialogues, upon the most recent (or otherwise selected) 
responses provided by the chatbot or upon the recent transcripts. Some of these 
features could also be made available to end-users via spoken-language commands.

At any point during hypertext-based dialogues, end-users would be able to 
navigate to Wikimedia content. These navigations could utilize either URL query 
string arguments or HTTP POST. In either case, bulk usage data, e.g., the 
dialogue contexts from which end-users navigated, could be useful.
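
To make the query-string idea concrete, here is a minimal Python sketch. Only 
the "search" parameter is an existing Special:Search parameter; "chatbot" and 
"dialogue_turn" are purely hypothetical names for carrying dialogue context, 
which MediaWiki would simply ignore today but which would show up in request 
logs as bulk usage data.

from urllib.parse import urlencode

def wikipedia_link(topic: str, client: str, turn_id: str) -> str:
    """Build a link from a chatbot response to Wikimedia content, tagging the
    referring client and dialogue turn for later analysis."""
    base = "https://en.wikipedia.org/wiki/Special:Search"
    params = {
        "search": topic,           # subject of the selected chatbot response
        "chatbot": client,         # hypothetical: which conversational client referred the reader
        "dialogue_turn": turn_id,  # hypothetical: which turn of the transcript was clicked
    }
    return f"{base}?{urlencode(params)}"

# Example: the target of a "read more" button behind the most recent response
print(wikipedia_link("National Research Cloud", "example-assistant", "turn-7"))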

The capability to perform A/B testing across chatbots’ dialogues, over large 
populations of end-users, could also be useful. In this way, Wikimedia would be 
better able to: (1) measure end-user engagement and satisfaction, (2) measure 
the quality of provided content, (3) perform personalization, (4) retain 
readers and editors. A/B testing could be performed by providing end-users with 
various feedback buttons (as described above). A/B testing data could also be 
obtained through data mining, analyzing end-users’ behaviors, response times, 
responses, and dialogue moves. These data could be provided for the community 
at special pages and could be made available per article, possibly by enhancing 
the “Page information” system. One can also envision these kinds of analytics 
data existing at the granularity of portions of, or selections of, articles.
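
As a rough illustration of what such A/B data collection could look like (the 
event fields and the hash-based bucketing below are illustrative only and do 
not correspond to any existing Wikimedia analytics schema):

import hashlib
from collections import Counter
from dataclasses import dataclass

@dataclass
class FeedbackEvent:
    article: str   # page the chatbot's response drew from
    variant: str   # which UI variant ("A" or "B") the end-user saw
    action: str    # e.g. "read_more", "upvote", "downvote", "edit"

def assign_variant(user_token: str) -> str:
    """Deterministically split end-users into two buckets via a hash."""
    digest = hashlib.sha256(user_token.encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

def per_article_summary(events: list[FeedbackEvent]) -> Counter:
    """Aggregate actions per (article, variant) - the kind of breakdown an
    enhanced "Page information" view could expose."""
    return Counter((e.article, e.variant, e.action) for e in events)

# Example with a couple of synthetic events:
events = [
    FeedbackEvent("Made in the Dark", assign_variant("reader-1"), "read_more"),
    FeedbackEvent("Made in the Dark", assign_variant("reader-2"), "upvote"),
]
print(per_article_summary(events))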



Best regards,

Adam


From: Victoria Coleman 
Sent: Saturday, February 4, 2023 8:10 AM
To: Wikimedia Mailing List 
Subject: [Wikimedia-l] Re: Chat GPT

Hi Christophe,

I had not thought about the threat to Wikipedia traffic from Chat GPT but you 
have a good point. The success of the projects is always one step away from the 
next big disruption. So the WMF as the tech provider for the mission (because 
first and foremost in my view that's what the WMF is - as well as the financial 
engine of the movement of course) needs to pay attention and experiment to 
maintain the long term viability of the mission. In fact I think the cluster of 
our projects offers compelling options. For example to your point below on data 
sets, we have the amazing Wikidata as well as the excellent work on Abstract 
Wikipedia. We have Wikipedia Enterprise which has built some avenues of 
collaboration with big tech. A bold vision is needed to bring all of it 
together and build an MVP for the community to experiment with.

Best regards,

Victoria Coleman

On Feb 4, 2023, at 4:14 AM, Christophe Henner  
wrote:

Hi,

On the product side, my biggest concern with NLP-based AI is that it would 
drastically decrease traffic to our websites/apps, which means fewer new editors 
and fewer donations.

So first from a strictly positioning perspective, we have here a major change 
that needs to be managed.

And to be honest, it will come faster than we think. We are perfectionists; I 
can assure you, most companies would be happy to launch a search product with 
an 80% confidence in answer quality.

From a financial perspective, large industrial investments like this are 
usually a pool of money you can draw from over x years. You can expect they 
have not drawn all of it yet.

Second, GPT 3 and ChatGPT are far from being the most expensive products they 
have. On top of people you need:
* datasets
* people to tag the dataset
* people to correct the algo
* computing power

I simplify here, but we already have the capacity to muster some of that, which 
drastically lowers our costs :)

I would not discard the option of the movement doing it so easily. That being 
said, it would mean a new project with the need of substantial resources.

Sent from my iPhone

On Feb 4, 2023, at 9:30 AM, Adam Sobieski  wrote:

With respect to cloud computing costs, these being a significant component of 
the costs to train and operate modern AI systems, as a non-profit organization, 
the Wikimedia Foundation might be interested in the National Research Cloud 
(NRC) policy proposal: https://hai.stanford.edu/policy/national-research-cloud .

"Artificial intelligence requires vast amounts of computing power, data, and 
expertise to train and deploy the massive machine learning models behind the 
most advanced research. But access is increasingly out of reach for most 
colleges and universities. A National Research Cloud (NRC) would provide 
academic and non-profit researchers with the compute power and government 
datasets needed for education and 

[Wikimedia-l] Re: Block of Wikipedia in Pakistan

2023-02-04 Thread Lane Chance
Thank you for making a statement.

Could the WMF please publish the takedown order (on-wiki)?

I understand that takedown orders are not normally considered
confidential or private, though there may be reasons for you to
publish a redacted version. There are obvious benefits to volunteers
taking their own precautions to protect themselves and their projects
depending on the nature of what the PTA considers "sacrilegious
content" to be, and the news coverage indicates that the regulator has
so far not made this public.

Lane

Potential citations
1. Aljazeera: 
https://www.aljazeera.com/news/2023/2/4/pakistan-blocks-wikipedia-citing-blasphemous-content
2. WMF blog post:
https://wikimediafoundation.org/news/2023/02/03/wikimedia-foundation-urges-pakistan-telecommunications-authority-to-restore-access-to-wikipedia-in-pakistan
3. Pakistan Today:
https://www.pakistantoday.com.pk/2023/02/04/pta-ban-hammer-downs-wikipedia-over-blasphemous-content
4. Yahoo News: 
https://ph.news.yahoo.com/pakistan-degrades-wikipedia-warns-complete-162957457.html

On Sat, 4 Feb 2023 at 01:45, Stephen LaPorte  wrote:
>
> Hello everyone,
>
> Today, the Pakistan Telecommunication Authority (PTA) ordered that access to 
> Wikipedia be suspended in Pakistan. We are urging the Pakistan government to 
> restore access to Wikipedia and Wikimedia projects immediately.
>
> This action denies the fifth most populous nation in the world access to the 
> world's largest, free online knowledge repository. If it continues, it will 
> also deny the world the perspective of the people of Pakistan and the benefit 
> of their knowledge, history, and culture.
>
> The Wikimedia Foundation received a notification from the Pakistan 
> Telecommunication Authority on February 1, 2023, stating “the services of 
> Wikipedia have been degraded for 48 hours” for failure to remove content from 
> the site deemed “unlawful” by the government. The notification further 
> mentioned that a block of Wikipedia could follow, if the Foundation did not 
> comply with the takedown orders. As of Friday February 3, our internal 
> traffic reports indicate that Wikipedia and Wikimedia projects are no longer 
> accessible to users in Pakistan.
>
> The Wikimedia Foundation is already examining various avenues and 
> investigating how we can help restore access, while staying true to our 
> values of verifiability, neutrality, and freedom of information.
>
> We are also prepared to support any members of the Wikimedia communities who 
> are impacted. If you or someone you know is contacted by the Pakistani 
> government in reference to the block, please contact us at 
> le...@wikimedia.org. If you or someone you know is in immediate physical 
> danger in reference to the block, please contact emerge...@wikimedia.org 
> right away. We are actively working to reach out to community leaders in the 
> area.
>
> For over twenty years, our movement has supported knowledge as a fundamental 
> human right. In defense of this right, we have opposed a growing number of 
> threats that would interfere with the ability of people to access and 
> contribute to free knowledge. We know that many of you will want to take 
> action or speak out against the block. For now, please continue to do what is 
> needed to remain safe. We will keep you updated on any new developments, 
> actions we are taking, and ways which you can help return access to Wikipedia 
> and Wikimedia projects in Pakistan.
>
> Thank you,
> Stephen
>
> --
> Stephen LaPorte (he/him/his)
> General Counsel
> Wikimedia Foundation
>
> NOTICE: As an attorney for the Wikimedia Foundation, for legal and ethical 
> reasons, I cannot give legal advice to, or serve as a lawyer for, community 
> members, volunteers, or staff members in their personal capacity. For more on 
> what this means, please see our legal disclaimer.
>
> ___
> Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
> https://meta.wikimedia.org/wiki/Wikimedia-l
> Public archives at 
> https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/BBL2KGUMFHNVNHSA4UINFMLIVPO6GQB5/
> To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/TVKSEJO7CBNE3OT3XRBPCOTY7IL2AQ5S/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Victoria Coleman
Hi Christophe,

I had not thought about the threat to Wikipedia traffic from Chat GPT but you 
have a good point. The success of the projects is always one step away from the 
next big disruption. So the WMF as the tech provider for the mission (because 
first and foremost in my view that's what the WMF is - as well as the financial 
engine of the movement of course) needs to pay attention and experiment to 
maintain the long term viability of the mission. In fact I think the cluster of 
our projects offers compelling options. For example to your point below on data 
sets, we have the amazing Wikidata as well as the excellent work on Abstract 
Wikipedia. We have Wikipedia Enterprise which has built some avenues of 
collaboration with big tech. A bold vision is needed to bring all of it together 
and build an MVP for the community to experiment with.

Best regards,

Victoria Coleman

On Feb 4, 2023, at 4:14 AM, Christophe Henner  wrote:

Hi,

On the product side, my biggest concern with NLP-based AI is that it would 
drastically decrease traffic to our websites/apps, which means fewer new 
editors and fewer donations.

So first from a strictly positioning perspective, we have here a major change 
that needs to be managed.

And to be honest, it will come faster than we think. We are perfectionists; I 
can assure you, most companies would be happy to launch a search product with 
an 80% confidence in answer quality.

From a financial perspective, large industrial investments like this are 
usually a pool of money you can draw from over x years. You can expect they 
have not drawn all of it yet.

Second, GPT 3 and ChatGPT are far from being the most expensive products they 
have. On top of people you need:
* datasets
* people to tag the dataset
* people to correct the algo
* computing power

I simplify here, but we already have the capacity to muster some of that, which 
drastically lowers our costs :)

I would not discard the option of the movement doing it so easily. That being 
said, it would mean a new project with the need of substantial resources.

Sent from my iPhone

On Feb 4, 2023, at 9:30 AM, Adam Sobieski  wrote:

With respect to cloud computing costs, these being a significant component of 
the costs to train and operate modern AI systems, as a non-profit organization, 
the Wikimedia Foundation might be interested in the National Research Cloud 
(NRC) policy proposal: https://hai.stanford.edu/policy/national-research-cloud .

"Artificial intelligence requires vast amounts of computing power, data, and 
expertise to train and deploy the massive machine learning models behind the 
most advanced research. But access is increasingly out of reach for most 
colleges and universities. A National Research Cloud (NRC) would provide 
academic and non-profit researchers with the compute power and government 
datasets needed for education and research. By democratizing access and equity 
for all colleges and universities, an NRC has the potential not only to unleash 
a string of advancements in AI, but to help ensure the U.S. maintains its 
leadership and competitiveness on the global stage.

"Throughout 2020, Stanford HAI led efforts with 22 top computer science 
universities along with a bipartisan, bicameral group of lawmakers proposing 
legislation to bring the NRC to fruition. On January 1, 2021, the U.S. Congress 
authorized the National AI Research Resource Task Force Act as part of the 
National Defense Authorization Act for Fiscal Year 2021. This law requires that 
a federal task force be established to study and provide an implementation 
pathway to create world-class computational resources and robust government 
datasets for researchers across the country in the form of a National Research 
Cloud. The task force will issue a final report to the President and Congress 
next year.

"The promise of an NRC is to democratize AI research, education, and 
innovation, making it accessible to all colleges and universities across the 
country. Without a National Research Cloud, all but the most elite universities 
risk losing the ability to conduct meaningful AI research and to adequately 
educate the next generation of AI researchers."

See also: [1][2]

[1] https://www.whitehouse.gov/ostp/news-updates/2023/01/24/national-artificial-intelligence-research-resource-task-force-releases-final-report/
[2] https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf

From: Steven Walling 
Sent: Saturday, February 4, 2023 1:59 AM
To: Wikimedia Mailing List 
Subject: [Wikimedia-l] Re: Chat GPT

On Fri, Feb 3, 2023 at 9:47 PM Gergő Tisza  wrote:

Just to give a sense of scale: OpenAI started with a $1 billion donation, got 
another $1B as investment, and is now getting a larger investment from 
Microsoft (undisclosed but rumored to be $10B). Assuming they spent most of 
their previous funding, which seems likely, their operational costs are in the 
ballpark of $300 million per year. The idea that the WMF could just 

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Christophe Henner
Hi,

On the product side, my biggest concern with NLP-based AI is that it would 
drastically decrease traffic to our websites/apps, which means fewer new 
editors and fewer donations.

So first from a strictly positioning perspective, we have here a major change 
that needs to be managed.

And to be honest, it will come faster than we think. We are perfectionists; I 
can assure you, most companies would be happy to launch a search product with 
an 80% confidence in answer quality.

From a financial perspective, large industrial investments like this are 
usually a pool of money you can draw from over x years. You can expect they 
have not drawn all of it yet.

Second, GPT 3 and ChatGPT are far from being the most expensive products they 
have. On top of people you need:
* datasets
* people to tag the dataset
* people to correct the algo
* computing power

I simplify here, but we already have the capacity to muster some of that, which 
drastically lowers our costs :)

I would not discard the option of the movement doing it so easily. That being 
said, it would mean a new project with the need of substantial resources.

Sent from my iPhone

On Feb 4, 2023, at 9:30 AM, Adam Sobieski  wrote:

With respect to cloud computing costs, these being a significant component of 
the costs to train and operate modern AI systems, as a non-profit organization, 
the Wikimedia Foundation might be interested in the National Research Cloud 
(NRC) policy proposal: https://hai.stanford.edu/policy/national-research-cloud .

"Artificial intelligence requires vast amounts of computing power, data, and 
expertise to train and deploy the massive machine learning models behind the 
most advanced research. But access is increasingly out of reach for most 
colleges and universities. A National Research Cloud (NRC) would provide 
academic and non-profit researchers with the compute power and government 
datasets needed for education and research. By democratizing access and equity 
for all colleges and universities, an NRC has the potential not only to unleash 
a string of advancements in AI, but to help ensure the U.S. maintains its 
leadership and competitiveness on the global stage.

"Throughout 2020, Stanford HAI led efforts with 22 top computer science 
universities along with a bipartisan, bicameral group of lawmakers proposing 
legislation to bring the NRC to fruition. On January 1, 2021, the U.S. Congress 
authorized the National AI Research Resource Task Force Act as part of the 
National Defense Authorization Act for Fiscal Year 2021. This law requires that 
a federal task force be established to study and provide an implementation 
pathway to create world-class computational resources and robust government 
datasets for researchers across the country in the form of a National Research 
Cloud. The task force will issue a final report to the President and Congress 
next year.

"The promise of an NRC is to democratize AI research, education, and 
innovation, making it accessible to all colleges and universities across the 
country. Without a National Research Cloud, all but the most elite universities 
risk losing the ability to conduct meaningful AI research and to adequately 
educate the next generation of AI researchers."

See also: [1][2]

[1] https://www.whitehouse.gov/ostp/news-updates/2023/01/24/national-artificial-intelligence-research-resource-task-force-releases-final-report/
[2] https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf

From: Steven Walling 
Sent: Saturday, February 4, 2023 1:59 AM
To: Wikimedia Mailing List 
Subject: [Wikimedia-l] Re: Chat GPT

On Fri, Feb 3, 2023 at 9:47 PM Gergő Tisza  wrote:

Just to give a sense of scale: OpenAI started with a $1 billion donation, got 
another $1B as investment, and is now getting a larger investment from 
Microsoft (undisclosed but rumored to be $10B). Assuming they spent most of 
their previous funding, which seems likely, their operational costs are in the 
ballpark of $300 million per year. The idea that the WMF could just choose to 
create conversational software of a similar quality if it wanted seems detached 
from reality to me.

Without spending billions on LLM development to aim for a conversational 
chatbot trying to pass a Turing test, we could definitely try to catch up to 
the state of the art in search results. Our search currently does a pretty bad 
job (in terms of recall especially). Today's featured article in English is the 
Hot Chip album "Made in the Dark", and if I enter anything but the exact 
article title the typeahead results are woefully incomplete or wrong. If I ask 
an actual question, good luck.

Google is feeling vulnerable to OpenAI here in part because everyone can see 
that their results are often full of low quality junk created for SEO, while 
ChatGPT just gives a concise answer right there.

https://en.wikipedia.org/wiki/The_Menu_(2022_film) is one of the top viewed 
English articles. If I search "The Menu 

[Wikimedia-l] Re: Chat GPT

2023-02-04 Thread Adam Sobieski
With respect to cloud computing costs, these being a significant component of 
the costs to train and operate modern AI systems, as a non-profit organization, 
the Wikimedia Foundation might be interested in the National Research Cloud 
(NRC) policy proposal: https://hai.stanford.edu/policy/national-research-cloud .

"Artificial intelligence requires vast amounts of computing power, data, and 
expertise to train and deploy the massive machine learning models behind the 
most advanced research. But access is increasingly out of reach for most 
colleges and universities. A National Research Cloud (NRC) would provide 
academic and non-profit researchers with the compute power and government 
datasets needed for education and research. By democratizing access and equity 
for all colleges and universities, an NRC has the potential not only to unleash 
a string of advancements in AI, but to help ensure the U.S. maintains its 
leadership and competitiveness on the global stage.

"Throughout 2020, Stanford HAI led efforts with 22 top computer science 
universities along with a bipartisan, bicameral group of lawmakers proposing 
legislation to bring the NRC to fruition. On January 1, 2021, the U.S. Congress 
authorized the National AI Research Resource Task Force Act as part of the 
National Defense Authorization Act for Fiscal Year 2021. This law requires that 
a federal task force be established to study and provide an implementation 
pathway to create world-class computational resources and robust government 
datasets for researchers across the country in the form of a National Research 
Cloud. The task force will issue a final report to the President and Congress 
next year.

"The promise of an NRC is to democratize AI research, education, and 
innovation, making it accessible to all colleges and universities across the 
country. Without a National Research Cloud, all but the most elite universities 
risk losing the ability to conduct meaningful AI research and to adequately 
educate the next generation of AI researchers."

See also: [1][2]

[1] 
https://www.whitehouse.gov/ostp/news-updates/2023/01/24/national-artificial-intelligence-research-resource-task-force-releases-final-report/
[2] https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf


From: Steven Walling 
Sent: Saturday, February 4, 2023 1:59 AM
To: Wikimedia Mailing List 
Subject: [Wikimedia-l] Re: Chat GPT



On Fri, Feb 3, 2023 at 9:47 PM Gergő Tisza  wrote:
Just to give a sense of scale: OpenAI started with a $1 billion donation, got 
another $1B as investment, and is now getting a larger investment from 
Microsoft (undisclosed but rumored to be $10B). Assuming they spent most of 
their previous funding, which seems likely, their operational costs are in the 
ballpark of $300 million per year. The idea that the WMF could just choose to 
create conversational software of a similar quality if it wanted seems detached 
from reality to me.

Without spending billions on LLM development to aim for a conversational 
chatbot trying to pass a Turing test, we could definitely try to catch up to 
the state of the art in search results. Our search currently does a pretty bad 
job (in terms of recall especially). Today's featured article in English is the 
Hot Chip album "Made in the Dark", and if I enter anything but the exact 
article title the typeahead results are woefully incomplete or wrong. If I ask 
an actual question, good luck.

Google is feeling vulnerable to OpenAI here in part because everyone can see 
that their results are often full of low quality junk created for SEO, while 
ChatGPT just gives a concise answer right there.

https://en.wikipedia.org/wiki/The_Menu_(2022_film) is one of the top viewed 
English articles. If I search "The Menu reviews" the Google results are noisy 
and not so great. ChatGPT actually gives you nothing relevant because it 
doesn't know anything from 2022. If we could just manage to display a 
three-sentence snippet from the article's critical response section, it would 
be awesome. It's too bad that the whole "knowledge engine" debacle poisoned the 
well when it comes to a Wikipedia search engine, because we could definitely do 
a lot to learn from what people like about ChatGPT and apply it to Wikipedia 
search.
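
For anyone who wants to reproduce the typeahead observation above, here is a 
small Python sketch against the public MediaWiki opensearch endpoint. It only 
probes the existing API to show what the prefix search currently suggests; it 
is not a proposed fix.

import requests

def typeahead(query: str, limit: int = 5) -> list[str]:
    """Return the prefix-search suggestions English Wikipedia offers for a query."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "opensearch", "search": query,
                "limit": limit, "format": "json"},
        headers={"User-Agent": "search-recall-check/0.1 (example)"},
        timeout=10,
    )
    resp.raise_for_status()
    # The opensearch response is [query, titles, descriptions, urls].
    return resp.json()[1]

# Compare the exact title against a looser phrasing of the same topic:
print(typeahead("Made in the Dark"))
print(typeahead("Hot Chip 2008 album"))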

___
Wikimedia-l mailing list -- 
wikimedia-l@lists.wikimedia.org, 
guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/6OBPB7WNHKJQXXIBCK73SDXLE3DMGNMY/
To unsubscribe send an email to 
wikimedia-l-le...@lists.wikimedia.org
___
Wikimedia-l mailing list --