[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-26 Thread Kimmo Virtanen
>
> No speech interface as far as I can tell, but FYI, there now is at


Emerson by Quickchat has a speech interface integration to GPT models.
- https://www.quickchat.ai/emerson



Some sample videos by Dr Alan D. Thompson.

Leta, GPT-3 AI - Episode 1 (Five things, Art, Seeing, Round) -
Conversations and talking with GPT3
- https://www.youtube.com/watch?v=5DBXZRZEBGM

Leta, GPT-3 AI - Episode 10 (GPT-J, GPT-3, GPT-2 questions, facts, general
knowledge)
- https://www.youtube.com/watch?v=V0pceNYgELE

Br,
-- Kimmo Virtanen, Zache

___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/PKPCIWAOMC6NPFKVAPSMX64F5S5BR7WH/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-25 Thread Erik Moeller
On Sun, Dec 25, 2022 at 1:00 AM Anders Wennersten wrote:
> For me the only question is whether Google comes first (it has better
> knowledge of how to interface with backend knowledge repositories, which
> is what Wikipedia will become) or whether ChatGPT will learn this

No speech interface as far as I can tell, but FYI, there now is at
least one search engine that already integrates a language-model-based
chatbot into search: https://you.com/, which has backing from
Salesforce founder & billionaire Marc Benioff (a bit more:
https://www.protocol.com/you-dot-com-benioff). Unlike ChatGPT, it
tries to directly cite web sources. When that source is Wikipedia,
you'll note it's basically rewriting/summarizing the Wikipedia
article. I don't know if it uses GPT underneath or its own language
model; Salesforce has certainly funded the creation of models of its
own.

When I asked You.com if it uses GPT-3, it said yes. When I asked it to
provide a source, it generated a URL that does not exist.
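
A generated citation of this kind can at least be screened automatically. The following is a minimal Python sketch, not anything You.com or ChatGPT provides: the function names `url_resolves` and `dead_citations` are hypothetical, and the check only tells you whether a URL answers at all, not whether the page supports the claim.

```python
import urllib.request
from typing import Callable, Iterable


def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` gets a non-error response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError and timeouts;
        # ValueError covers malformed URLs.
        return False


def dead_citations(urls: Iterable[str],
                   check: Callable[[str], bool] = url_resolves) -> list[str]:
    """Return the cited URLs that fail the given check, in input order."""
    return [url for url in urls if not check(url)]
```

The `check` parameter is injectable so the filtering logic can be exercised without network access; some servers reject HEAD requests, so a real tool would likely need a GET fallback.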

I also observed other failure modes, such as merging multiple
people with the same name into one, or giving directly contradictory
answers when the same question is asked repeatedly. All of these
failure modes are characteristic of language models, which are a bit
like pinball machines in that they generate results
nondeterministically from the training data.
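
To illustrate why the same question can get contradictory answers, here is a toy Python sketch of temperature sampling, which is one common way language models pick the next token. The scores in `toy_logits` are invented for illustration and do not come from any real model; `sample_next_token` is a hypothetical helper name.

```python
import math
import random


def sample_next_token(logits: dict[str, float], temperature: float,
                      rng: random.Random) -> str:
    """Sample one token from a softmax over `logits` at the given temperature."""
    if temperature <= 0:
        # Greedy decoding: always pick the highest-scoring token.
        return max(logits, key=logits.get)
    scaled = {token: score / temperature for token, score in logits.items()}
    top = max(scaled.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(score - top) for score in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Invented scores for the token that answers "Do you use GPT-3?"
toy_logits = {"yes": 1.2, "no": 1.0, "unknown": 0.3}
rng = random.Random()
answers = [sample_next_token(toy_logits, 1.0, rng) for _ in range(10)]
print(answers)  # the mix of answers varies from run to run
```

With a temperature of 0 the sketch always returns the top-scoring answer; with any positive temperature, close scores such as "yes" and "no" above will both show up across repeated runs.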

Of course, this is the technology as it exists today, and even with
those limitations in mind it can prove useful (though it seems
irresponsible to market it as part of a search engine in its current
form).

Warmly,
Erik


[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-25 Thread Anders Wennersten
I have also played around with simpler queries (see below) and find the 
same: excellent, easy to read answers, with minor mistakes and sometimes 
false, invented info.


But I still see this as the future. If/when this is implemented with 
speech recognition (which already exists) and good synthetic speech, you 
have a new interface for users that will take on the world. For me the 
only question is whether Google comes first (it has better knowledge of 
how to interface with backend knowledge repositories, which is what 
Wikipedia will become) or whether ChatGPT will learn this, which seems to 
me a minor thing for them compared with getting the answering logic in 
place, which they have mastered.


And I still believe it is vital for us to understand how this future 
will affect our systems, as well as how we will need to work in order to 
still provide knowledge, even if it is to a repository and not to the 
world directly.


My test "Where lies Farsta?"

"Farsta is a suburb located in the southeastern part of Stockholm, the 
capital city of Sweden. It is situated in the Söderort borough of 
Stockholm and is known for its shopping center and recreational areas. 
Farsta is easily accessible by public transport, with several bus and 
metro lines passing through the area. The suburb is also home to Farsta 
Strand, a beach and recreational area that attracts visitors during the 
summer months."


* That Söderort is not actually a borough is not of vital importance to 
the reader.


* The last sentence, and calling it a recreational area, is just wrong. 
Farsta strand is called a satellite town in enwp and is primarily a 
built-up area located close to a smaller lake. There is a small beach, 
"Farsta strandbad", for local residents, but there are 5-6 better beaches 
at cleaner lakes within a 5 km radius. It is not a recreation area, and 
it certainly does not attract visitors from outside during the summer 
months (there are many excellent recreational areas very close by).


Anders


[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-23 Thread Erik Moeller
On Sun, Dec 18, 2022 at 2:01 PM Erik Moeller  wrote:

> Querying Wikidata via SPARQL is currently still a bit of wizardry (and
> the query builder is extremely limited). To pick a completely random
> example not at all inspired by current events, if I wanted to see a
> list of journalists with Mastodon accounts & a picture, I currently
> have to do this:

In case folks find it useful, that same query is now available (along
with other profession-based Wikidata searches) here:
https://eloquence.github.io/fediscope/

To bring it back on-topic, much of the coding and data processing was
aided by ChatGPT ;)
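
For readers who would rather run the journalist query quoted elsewhere in this thread from a script, the following is a minimal Python sketch against the public Wikidata Query Service endpoint. It is not the code behind fediscope; the helper names (`build_request`, `flatten_bindings`, `run_query`) and the User-Agent string are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The query as posted earlier in the thread (journalists with a Mastodon address).
JOURNALISTS_ON_MASTODON = """
SELECT DISTINCT ?personLabel ?mastodonName ?pic
WHERE {
  ?person wdt:P4033 ?mastodonName ;
          wdt:P106 ?occupation .
  OPTIONAL { ?person wdt:P18 ?pic . }
  ?occupation wdt:P279* wd:Q1930187 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
}
"""


def build_request(query: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for SPARQL JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json",
               "User-Agent": user_agent}
    return urllib.request.Request(url, headers=headers)


def flatten_bindings(data: dict) -> list[dict[str, str]]:
    """Reduce SPARQL JSON results to plain {variable: value} rows."""
    return [{name: cell["value"] for name, cell in row.items()}
            for row in data["results"]["bindings"]]


def run_query(query: str, user_agent: str) -> list[dict[str, str]]:
    """Send the query over the network and return flattened rows."""
    with urllib.request.urlopen(build_request(query, user_agent),
                                timeout=60) as response:
        return flatten_bindings(json.load(response))
```

Calling `run_query(JOURNALISTS_ON_MASTODON, "example-script/0.1 (you@example.org)")` would return one dictionary per result row; rows without a picture simply lack the `pic` key because that part of the query is optional.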

Warmly,
Erik


[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-23 Thread Chris Keating
On Fri, Dec 23, 2022 at 1:04 PM Paulo Santos Perneta <
paulospern...@gmail.com> wrote:

> I've been playing with it over the last week, and my experience is that
> the AI is incredibly prone to inventing information that looks credible
> while being entirely made up. It generates nonexistent persons,
> dates, events and places, and builds an entire story out of them that
> nevertheless looks credible.
>

Indeed; here is another example, after prompting to write an article about
the song "Tomorrow Belongs to Me", which was written in 1966 for the
Broadway musical "Cabaret". ChatGPT attributes it, entirely fictionally, to
a real 1920s German lyricist.

"Tomorrow Belongs to Me" is a song that has been used in a variety of
media, including film, television, and theater. The song is known for its
nationalist and anti-Semitic lyrics, which have made it controversial and a
subject of debate.

The song was originally written by the German composer and lyricist Hans
Leip in the 1920s. It was later adopted by the National Socialist German
Workers' Party (Nazi Party) as a propaganda song, and was performed at
party rallies and events.

In the United States, "Tomorrow Belongs to Me" was popularized by its use
in the film "Cabaret," which was released in 1972. In the film, the song is
sung by a young man at a Nazi rally, and is later performed by the
character of Sally Bowles, played by Liza Minnelli. The song has since been
used in a number of other films, television shows, and stage productions,
often as a means of referencing the Nazi party or the Holocaust.

Despite its widespread use, "Tomorrow Belongs to Me" has been criticized
for its nationalist and anti-Semitic themes, and some have called for it to
be banned or censored. The song has also been the subject of legal
disputes, with some arguing that its use in certain contexts may infringe
on copyright laws or be considered hate speech.

[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-23 Thread Kimmo Virtanen
Hi,

I am not sure that generating article text is how it will be used, as long
as it invents stuff from thin air. However, it is rather good at
summarizing text and converting source text to a structured format, for
example converting Wikimedia Commons categories to Wikidata properties.

Source file for the categories:
* https://commons.wikimedia.org/wiki/File:1150D_-_01,_Acervo_do_Museu_Paulista_da_USP.jpg

*Kimmo*

> Describe "19th-century portrait photographs at bust length", "19th-century
> portrait photographs of men", "Albumen prints", "Photographia Americana",
> "19th-century photographs of Brazil", "Works by Militão A. de Azevedo at
> Museu Paulista". Use only a list of Wikidata properties and identifiers.
> Context is photography.


*ChatGPT*
> Here is a list of Wikidata properties and identifiers that could be used
> to describe the following topics:
>
> "19th-century portrait photographs at bust length":
> - "point in time" (P585) - 19th century
> - "genre" (P136) - portrait photograph
> - "depicts" (P180) - bust
> - "medium" (P186) - photograph
>
> "19th-century portrait photographs of men":
> - "point in time" (P585) - 19th century
> - "genre" (P136) - portrait photograph
> - "depicts" (P180) - men
> - "medium" (P186) - photograph
>
> "Albumen prints":
> - "medium" (P186) - albumen print
>
> "Photographia Americana":
> - "genre" (P136) - photography
> - "country" (P17) - United States of America
>
> "19th-century photographs of Brazil":
> - "point in time" (P585) - 19th century
> - "genre" (P136) - photograph
> - "country" (P17) - Brazil
>
> "Works by Militão A. de Azevedo at Museu Paulista":
> - "artist" (P170) - Militão A. de Azevedo
> - "location" (P276) - Museu Paulista

It fails with Photographia Americana, which is a photographic studio in São
Paulo.

However, in other ways the result was quite a good proxy for what the photo
contains. The information could be used for filling in the Wikidata items of
the Commons categories, suggesting values for Structured Data on Commons,
multilingual search, etc.
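
As a rough illustration of how such output could feed a suggestion tool, here is a minimal Python sketch that parses the list format shown above into structured rows. The `Statement` type and `parse_statements` function are hypothetical names, and the regular expressions only assume the line shapes seen in this one ChatGPT answer.

```python
import re
from typing import NamedTuple


class Statement(NamedTuple):
    category: str  # Commons category the suggestion belongs to
    label: str     # property label as written by ChatGPT
    prop: str      # Wikidata property ID, e.g. "P136"
    value: str     # free-text value, still to be matched to an item


# A heading line looks like:  > "Albumen prints":
HEADING = re.compile(r'^[>\s]*"(?P<category>[^"]+)":\s*$')
# An item line looks like:    >- "medium" (P186) - albumen print
ITEM = re.compile(
    r'^[>\s]*-\s*"(?P<label>[^"]+)"\s*\((?P<prop>P\d+)\)\s*-\s*(?P<value>.+?)\s*$'
)


def parse_statements(text: str) -> list[Statement]:
    """Extract (category, label, property, value) rows from the quoted list."""
    statements = []
    category = ""
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            category = heading["category"]
            continue
        item = ITEM.match(line)
        if item:
            statements.append(
                Statement(category, item["label"], item["prop"], item["value"])
            )
    return statements
```

The parsed rows are only suggestions: as the Photographia Americana case shows, each value still needs human review before it is written to Wikidata or to Structured Data on Commons.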

Br,
-- Kimmo Virtanen, Zache


[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-23 Thread Paulo Santos Perneta
I've been playing with it over the last week, and my experience is that
the AI is incredibly prone to inventing information that looks credible
while being entirely made up. It generates nonexistent persons,
dates, events and places, and builds an entire story out of them that
nevertheless looks credible. I would advise anyone against using it for
anything more than helping to paraphrase already existing texts.

Just now, when asked about Bishop Pedro Sardinha (the first bishop of
Brazil), it told me:
It's possible that you are referring to Pedro da Silva Sardinha, also known
as Father Francisco, who was a Portuguese Jesuit missionary who served as
the bishop of Timbuktu in the early 18th century. Father Francisco was born
in Lisbon, Portugal, and entered the Jesuit order as a young man. He was
sent to West Africa as a missionary and served as the bishop of Timbuktu
from 1705 to 1706. Father Francisco is known for his efforts to convert the
local population to Christianity and for his work to improve the education
and literacy of the people in Timbuktu. However, he died in 1706,
apparently while being held captive by local authorities.

This person never existed, nor did those events; it was all made up by
the AI, but it looks credible. Often it says the information is based on its
research in reliable sources, even presenting some of them (which are also
fake).

Yesterday, when asked about the Quintinense Scheme, a well-known sockpuppet
affair on the Portuguese Wikipedia, it said it was an arrangement by King
Manuel I of Portugal in the early 16th century, concerning the way the
lands should be cultivated in the colonies, named after Quinta do Lorde,
which it said was the first settlement in Madeira (it's actually a tourist
resort built in the late 1990s). I'm constantly surprised by the AI's
capacity for inventing such elaborate fakes.

In the case of Pharos's experiment, this AI-generated information, which
still persists (unsourced) in the article, is probably fake as well: "The
practice of titling artworks dates back to ancient civilizations, where
works of art were often inscribed with the name of the artist and the
subject of the piece."

Best,

Paulo


[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-22 Thread Ziko van Dijk
Hello,
About my experiences with ChatGPT - is it good enough to write our
articles? - I made this video:
https://youtu.be/zKPEyxYt5kg
Kind regards
Ziko


[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-20 Thread Pharos
I'd like to share my experiment with using ChatGPT to help write an article
on English Wikipedia:

https://en.wikipedia.org/wiki/Artwork_title

You can see an explanation of the process here; your comments are welcome:

https://en.wikipedia.org/wiki/Talk:Artwork_title#Use_of_ChatGPT

Thanks,
Richard
(User:Pharos)

On Tue, Dec 20, 2022 at 1:45 PM ZhaoFJx  wrote:

> I'd be curious about copyright issues though, as it's licensed on Github.
> It may indeed be infringing copyright, since he may require attribution.
> However, I agree with The Cunctator that the community won't do anything
> about it.
>
> Incidentally, I'm a little worried that he might grab broken content that
> wasn't rolled back. After all - the current artificial intelligence cannot
> reach the level of human thinking.
>
> Sincerely,
> ZhaoFJx
>
> On Mon, Dec 19, 2022 at 08:52, Anders Wennersten wrote:
>
>> I think we should not underestimate what this could evolve into. We
>> thrived because our readers find us "good enough" when it comes to finding
>> facts, not the ultimate source.
>>
>> And the software learns by reading, and can read (and has read) Wikipedia,
>> Wikidata etc. and represent our data in its own syntax and present it in a
>> way so that it is not a direct copy. Perhaps the data will lag a bit behind
>> the actual content in Wikipedia, but so what - good enough?
>>
>> Anders
>>
>> On 2022-12-19 at 14:26, Gnangarra wrote:
>>
>> AI simply can't discriminate between good research and faked research; for
>> any outcome it must provide all of its sources, whether they are from
>> Wikipedia, Wikidata, WikiCommons, WikiSource or some other place.
>> Otherwise it will answer yes to someone asking if the world is flat, because
>> it'll seek out that answer and find all the nonsense that has been produced.
>>
>> On Mon, 19 Dec 2022 at 06:02, Erik Moeller  wrote:
>>
>>> On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten wrote:
>>> > ChatGPT is now making headlines more or less every day and I perceive
>>> > them to be trying to position themselves as the "next" Google.
>>>
>>> I suspect OpenAI will continue to focus on generative applications
>>> (images, code, text for purposes such as copywriting, eventually
>>> music/video) and won't attempt to compete with Google directly, but
>>> we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone
>>> to generating nonsensical answers, citations to works that don't
>>> exist, etc. But it is pretty cool if you keep its limitations in
>>> mind--for example, it's quite good at bootstrapping small scripts in
>>> various programming languages (with mistakes and idiosyncrasies).
>>>
>>> Google has one of the largest AI research programs on the planet, they
>>> just are extremely conservative about letting anyone try their models
>>> (due to reputational concerns, e.g., that generative AI will spit out
>>> racist output within about 30 seconds of people poking its
>>> guardrails). This blog post from September is instructive about the
>>> direction they're taking with what's called retrieval-augmented
>>> generation; see the paper linked from the post for details:
>>>
>>> https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind
>>> is part of Google)
>>>
>>> That is likely to yield significantly more accurate answers than what
>>> ChatGPT is doing, and is difficult to replicate for folks like OpenAI
>>> without being dependent on the search APIs of big search companies.
>>> It's worth noting that Google has also started to incorporate language
>>> model tooling into how it's presenting search results (e.g.,
>>> summarizing or highlighting different parts of a website to make the
>>> result snippet more useful).
>>>
>>> A retrieval-augmented approach that leverages Wikidata could IMO be
>>> quite powerful and could be a useful research program for Wikimedia to
>>> pursue, be it independently or in partnership with others. The
>>> resulting technology should of course be fully open source.
>>>
>>> Querying Wikidata via SPARQL is currently still a bit of wizardry (and
>>> the query builder is extremely limited). To pick a completely random
>>> example not at all inspired by current events, if I wanted to see a
>>> list of journalists with Mastodon accounts & a picture, I currently
>>> have to do this:
>>>
>>> SELECT DISTINCT ?personLabel ?mastodonName ?pic
>>> WHERE {
>>>   ?person wdt:P4033 ?mastodonName ;
>>>           wdt:P106 ?occupation .
>>>   OPTIONAL { ?person wdt:P18 ?pic . }
>>>   ?occupation wdt:P279* wd:Q1930187 .
>>>   SERVICE wikibase:label {
>>>     bd:serviceParam wikibase:language "en"
>>>   }
>>> }
>>>
>>> Make a small mistake (a curly brace missing) and you'll get a red
>>> error message. Forgot 

[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-20 Thread ZhaoFJx
I'd be curious about the copyright issues, though, as it's licensed on GitHub.
It may indeed be infringing copyright, since it may require attribution.
However, I agree with The Cunctator that the community won't do anything
about it.

Incidentally, I'm a little worried that it might grab broken content that
wasn't rolled back. After all, current artificial intelligence cannot
reach the level of human thinking.

Sincerely,
ZhaoFJx
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l

On Mon, Dec 19, 2022 at 08:52, Anders Wennersten wrote:

> I think we should not underestimate what this could evolve into. We
> thrived because our readers find us "good enough" when it comes to finding
> facts, not the ultimate source.
>
> And the software learns by reading, and can read (and has read) Wikipedia,
> Wikidata, etc., and represent our data in its own syntax and present it in a
> way that is not a direct copy. Perhaps its data will lag a bit behind the
> actual content in Wikipedia, but so what - good enough?
>
> Anders
>
> On 2022-12-19 at 14:26, Gnangarra wrote:
>
> AI simply can't discriminate between good research and faked research; for
> any outcome it must provide all of its sources, whether they are from
> Wikipedia, Wikidata, Wikimedia Commons, Wikisource or some other place.
> Otherwise it will answer yes to someone asking if the world is flat because
> it'll seek out that answer and find all the nonsense that has been produced.
>
> On Mon, 19 Dec 2022 at 06:02, Erik Moeller  wrote:
>
>> On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten
>>  wrote:
>> > ChatGPT is now making headlines more or less every day, and I perceive
>> > them as trying to position themselves as the "next" Google.
>>
>> I suspect OpenAI will continue to focus on generative applications
>> (images, code, text for purposes such as copywriting, eventually
>> music/video) and won't attempt to compete with Google directly, but
>> we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone
>> to generating nonsensical answers, citations to works that don't
>> exist, etc. But it is pretty cool if you keep its limitations in
>> mind--for example, it's quite good at bootstrapping small scripts in
>> various programming languages (with mistakes and idiosyncrasies).
>>
>> Google has one of the largest AI research programs on the planet, they
>> just are extremely conservative about letting anyone try their models
>> (due to reputational concerns, e.g., that generative AI will spit out
>> racist output within about 30 seconds of people poking its
>> guardrails). This blog post from September is instructive about the
>> direction they're taking with what's called retrieval-augmented
>> generation; see the paper linked from the post for details:
>>
>> https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind
>> is part of Google)
>>
>> That is likely to yield significantly more accurate answers than what
>> ChatGPT is doing, and is difficult to replicate for folks like OpenAI
>> without being dependent on the search APIs of big search companies.
>> It's worth noting that Google has also started to incorporate language
>> model tooling into how it's presenting search results (e.g.,
>> summarizing or highlighting different parts of a website to make the
>> result snippet more useful).
>>
>> A retrieval-augmented approach that leverages Wikidata could IMO be
>> quite powerful and could be a useful research program for Wikimedia to
>> pursue, be it independently or in partnership with others. The
>> resulting technology should of course be fully open source.
>>
>> Querying Wikidata via SPARQL is currently still a bit of wizardry (and
>> the query builder is extremely limited). To pick a completely random
>> example not at all inspired by current events, if I wanted to see a
>> list of journalists with Mastodon accounts & a picture, I currently
>> have to do this:
>>
>> SELECT DISTINCT ?personLabel ?mastodonName ?pic
>> WHERE {
>>   ?person wdt:P4033 ?mastodonName ;
>>           wdt:P106 ?occupation .
>>   OPTIONAL { ?person wdt:P18 ?pic . }
>>   ?occupation wdt:P279* wd:Q1930187 .
>>   SERVICE wikibase:label {
>>     bd:serviceParam wikibase:language "en"
>>   }
>> }
>>
>> Make a small mistake (a curly brace missing) and you'll get a red
>> error message. Forgot the * after wdt:P279? A different response set
>> in ways that are difficult to spot or reason about.
>>
>> Why can't I type "list of journalists with their picture and Mastodon
>> account" as a natural language query? (You can try it in ChatGPT and
>> it'll get you started, but it'll generate nonsense P/Q numbers.) If
>> such queries could be produced reliably, it could be a very useful
>> tool for readers as well.
>>
>> Warmly,
>> Erik
>> 

[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-19 Thread Anders Wennersten
I think we should not underestimate what this could evolve into. We 
thrived because our readers find us "good enough" when it comes to 
finding facts, not the ultimate source.


And the software learns by reading, and can read (and has read)
Wikipedia, Wikidata, etc., and represent our data in its own syntax and
present it in a way that is not a direct copy. Perhaps its data will lag a
bit behind the actual content in Wikipedia, but so what - good enough?


Anders

On 2022-12-19 at 14:26, Gnangarra wrote:

AI simply can't discriminate between good research and faked research;
for any outcome it must provide all of its sources, whether they are
from Wikipedia, Wikidata, Wikimedia Commons, Wikisource or some other
place. Otherwise it will answer yes to someone asking if the world is
flat because it'll seek out that answer and find all the nonsense that
has been produced.


On Mon, 19 Dec 2022 at 06:02, Erik Moeller  wrote:

On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten
 wrote:
> ChatGPT is now making headlines more or less every day, and I perceive
> them as trying to position themselves as the "next" Google.

I suspect OpenAI will continue to focus on generative applications
(images, code, text for purposes such as copywriting, eventually
music/video) and won't attempt to compete with Google directly, but
we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone
to generating nonsensical answers, citations to works that don't
exist, etc. But it is pretty cool if you keep its limitations in
mind--for example, it's quite good at bootstrapping small scripts in
various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they
just are extremely conservative about letting anyone try their models
(due to reputational concerns, e.g., that generative AI will spit out
racist output within about 30 seconds of people poking its
guardrails). This blog post from September is instructive about the
direction they're taking with what's called retrieval-augmented
generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind
is part of Google)

That is likely to yield significantly more accurate answers than what
ChatGPT is doing, and is difficult to replicate for folks like OpenAI
without being dependent on the search APIs of big search companies.
It's worth noting that Google has also started to incorporate language
model tooling into how it's presenting search results (e.g.,
summarizing or highlighting different parts of a website to make the
result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be
quite powerful and could be a useful research program for Wikimedia to
pursue, be it independently or in partnership with others. The
resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and
the query builder is extremely limited). To pick a completely random
example not at all inspired by current events, if I wanted to see a
list of journalists with Mastodon accounts & a picture, I currently
have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic
WHERE {
  ?person wdt:P4033 ?mastodonName ;
    wdt:P106 ?occupation .
  OPTIONAL { ?person wdt:P18 ?pic . }
  ?occupation wdt:P279* wd:Q1930187 .
   SERVICE wikibase:label {
     bd:serviceParam wikibase:language "en"
   }
}

Make a small mistake (a curly brace missing) and you'll get a red
error message. Forgot the * after wdt:P279? A different response set
in ways that are difficult to spot or reason about.

Why can't I type "list of journalists with their picture and Mastodon
account" as a natural language query? (You can try it in ChatGPT and
it'll get you started, but it'll generate nonsense P/Q numbers.) If
such queries could be produced reliably, it could be a very useful
tool for readers as well.

Warmly,
Erik
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org,
guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at

https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/YYTLJVCDSYITUKNA2DJSK5SSR3AZ3B5F/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org



--
Boodarwun
Gnangarra
'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar 
koortaboodjar'



[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-19 Thread Ad Huikeshoven
Here is what ChatGPT has to offer on the subject of Earth's flatness:

Please answer in English the question "Elaborate on the question wether the
earth is flat, and give some sources."

There is no scientific evidence that the earth is flat. On the contrary,
there is a wealth of evidence that demonstrates that the earth is round.
This evidence has been obtained through a variety of scientific methods,
including observation of the sky, the use of geodetic systems, and the
findings of astronauts who have traveled to space.

Here are a few sources that you can consult if you want to learn more about
the fact that the earth is round:

   - NASA: "Is the Earth Round?"
     (https://www.nasa.gov/audience/forstudents/k-4/stories/nasa-knows/what-is-earth-html)
   - Live Science: "Why is the Earth Round?"
     (https://www.livescience.com/32489-why-is-the-earth-round-.html)
   - Encyclopedia Britannica: "Earth" (https://www.britannica.com/topic/Earth)

It is important to remember that the earth is round and that this has been
proven through scientific research. The idea that the earth is flat is a
myth that has long been debunked.

On Mon, Dec 19, 2022 at 2:27 PM Gnangarra  wrote:

> AI simply can't discriminate between good research and faked research; for
> any outcome it must provide all of its sources, whether they are from
> Wikipedia, Wikidata, Wikimedia Commons, Wikisource or some other place.
> Otherwise it will answer yes to someone asking if the world is flat because
> it'll seek out that answer and find all the nonsense that has been produced.
>
> On Mon, 19 Dec 2022 at 06:02, Erik Moeller  wrote:
>
>> On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten
>>  wrote:
>> > ChatGPT is now making headlines more or less every day, and I perceive
>> > them as trying to position themselves as the "next" Google.
>>
>> I suspect OpenAI will continue to focus on generative applications
>> (images, code, text for purposes such as copywriting, eventually
>> music/video) and won't attempt to compete with Google directly, but
>> we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone
>> to generating nonsensical answers, citations to works that don't
>> exist, etc. But it is pretty cool if you keep its limitations in
>> mind--for example, it's quite good at bootstrapping small scripts in
>> various programming languages (with mistakes and idiosyncrasies).
>>
>> Google has one of the largest AI research programs on the planet, they
>> just are extremely conservative about letting anyone try their models
>> (due to reputational concerns, e.g., that generative AI will spit out
>> racist output within about 30 seconds of people poking its
>> guardrails). This blog post from September is instructive about the
>> direction they're taking with what's called retrieval-augmented
>> generation; see the paper linked from the post for details:
>>
>> https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind
>> is part of Google)
>>
>> That is likely to yield significantly more accurate answers than what
>> ChatGPT is doing, and is difficult to replicate for folks like OpenAI
>> without being dependent on the search APIs of big search companies.
>> It's worth noting that Google has also started to incorporate language
>> model tooling into how it's presenting search results (e.g.,
>> summarizing or highlighting different parts of a website to make the
>> result snippet more useful).
>>
>> A retrieval-augmented approach that leverages Wikidata could IMO be
>> quite powerful and could be a useful research program for Wikimedia to
>> pursue, be it independently or in partnership with others. The
>> resulting technology should of course be fully open source.
>>
>> Querying Wikidata via SPARQL is currently still a bit of wizardry (and
>> the query builder is extremely limited). To pick a completely random
>> example not at all inspired by current events, if I wanted to see a
>> list of journalists with Mastodon accounts & a picture, I currently
>> have to do this:
>>
>> SELECT DISTINCT ?personLabel ?mastodonName ?pic
>> WHERE {
>>   ?person wdt:P4033 ?mastodonName ;
>>           wdt:P106 ?occupation .
>>   OPTIONAL { ?person wdt:P18 ?pic . }
>>   ?occupation wdt:P279* wd:Q1930187 .
>>   SERVICE wikibase:label {
>>     bd:serviceParam wikibase:language "en"
>>   }
>> }
>>
>> Make a small mistake (a curly brace missing) and you'll get a red
>> error message. Forgot the * after wdt:P279? A different response set
>> in ways that are difficult to spot or reason about.
>>
>> Why can't I type "list of journalists with their picture and Mastodon
>> account" as a natural language query? (You can try it in ChatGPT and
>> it'll get you started, but it'll generate nonsense P/Q numbers.) If
>> such queries could be produced reliably, it could be a very useful
>> tool for readers as well.
>>
>> Warmly,
>> Erik

[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-19 Thread Gnangarra
AI simply can't discriminate between good research and faked research; for
any outcome it must provide all of its sources, whether they are from
Wikipedia, Wikidata, Wikimedia Commons, Wikisource or some other place.
Otherwise it will answer yes to someone asking if the world is flat because
it'll seek out that answer and find all the nonsense that has been produced.

On Mon, 19 Dec 2022 at 06:02, Erik Moeller  wrote:

> On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten
>  wrote:
> > ChatGPT is now making headlines more or less every day, and I perceive
> > them as trying to position themselves as the "next" Google.
>
> I suspect OpenAI will continue to focus on generative applications
> (images, code, text for purposes such as copywriting, eventually
> music/video) and won't attempt to compete with Google directly, but
> we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone
> to generating nonsensical answers, citations to works that don't
> exist, etc. But it is pretty cool if you keep its limitations in
> mind--for example, it's quite good at bootstrapping small scripts in
> various programming languages (with mistakes and idiosyncrasies).
>
> Google has one of the largest AI research programs on the planet, they
> just are extremely conservative about letting anyone try their models
> (due to reputational concerns, e.g., that generative AI will spit out
> racist output within about 30 seconds of people poking its
> guardrails). This blog post from September is instructive about the
> direction they're taking with what's called retrieval-augmented
> generation; see the paper linked from the post for details:
>
> https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind
> is part of Google)
>
> That is likely to yield significantly more accurate answers than what
> ChatGPT is doing, and is difficult to replicate for folks like OpenAI
> without being dependent on the search APIs of big search companies.
> It's worth noting that Google has also started to incorporate language
> model tooling into how it's presenting search results (e.g.,
> summarizing or highlighting different parts of a website to make the
> result snippet more useful).
>
> A retrieval-augmented approach that leverages Wikidata could IMO be
> quite powerful and could be a useful research program for Wikimedia to
> pursue, be it independently or in partnership with others. The
> resulting technology should of course be fully open source.
>
> Querying Wikidata via SPARQL is currently still a bit of wizardry (and
> the query builder is extremely limited). To pick a completely random
> example not at all inspired by current events, if I wanted to see a
> list of journalists with Mastodon accounts & a picture, I currently
> have to do this:
>
> SELECT DISTINCT ?personLabel ?mastodonName ?pic
> WHERE {
>   ?person wdt:P4033 ?mastodonName ;
>           wdt:P106 ?occupation .
>   OPTIONAL { ?person wdt:P18 ?pic . }
>   ?occupation wdt:P279* wd:Q1930187 .
>   SERVICE wikibase:label {
>     bd:serviceParam wikibase:language "en"
>   }
> }
>
> Make a small mistake (a curly brace missing) and you'll get a red
> error message. Forgot the * after wdt:P279? A different response set
> in ways that are difficult to spot or reason about.
>
> Why can't I type "list of journalists with their picture and Mastodon
> account" as a natural language query? (You can try it in ChatGPT and
> it'll get you started, but it'll generate nonsense P/Q numbers.) If
> such queries could be produced reliably, it could be a very useful
> tool for readers as well.
>
> Warmly,
> Erik
>


-- 
Boodarwun
Gnangarra
'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar koortaboodjar'
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/CVXPECMNLGBGIQYP2DI7IRJVLUNNOF6B/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-18 Thread Erik Moeller
On Sun, Dec 11, 2022 at 5:55 AM Anders Wennersten
 wrote:
> ChatGPT is now making headlines more or less every day, and I perceive
> them as trying to position themselves as the "next" Google.

I suspect OpenAI will continue to focus on generative applications
(images, code, text for purposes such as copywriting, eventually
music/video) and won't attempt to compete with Google directly, but
we'll see. Currently GPT-3.5 (which ChatGPT is based on) is very prone
to generating nonsensical answers, citations to works that don't
exist, etc. But it is pretty cool if you keep its limitations in
mind--for example, it's quite good at bootstrapping small scripts in
various programming languages (with mistakes and idiosyncrasies).

Google has one of the largest AI research programs on the planet, they
just are extremely conservative about letting anyone try their models
(due to reputational concerns, e.g., that generative AI will spit out
racist output within about 30 seconds of people poking its
guardrails). This blog post from September is instructive about the
direction they're taking with what's called retrieval-augmented
generation; see the paper linked from the post for details:

https://www.deepmind.com/blog/building-safer-dialogue-agents (DeepMind
is part of Google)

That is likely to yield significantly more accurate answers than what
ChatGPT is doing, and is difficult to replicate for folks like OpenAI
without being dependent on the search APIs of big search companies.
It's worth noting that Google has also started to incorporate language
model tooling into how it's presenting search results (e.g.,
summarizing or highlighting different parts of a website to make the
result snippet more useful).

A retrieval-augmented approach that leverages Wikidata could IMO be
quite powerful and could be a useful research program for Wikimedia to
pursue, be it independently or in partnership with others. The
resulting technology should of course be fully open source.

Querying Wikidata via SPARQL is currently still a bit of wizardry (and
the query builder is extremely limited). To pick a completely random
example not at all inspired by current events, if I wanted to see a
list of journalists with Mastodon accounts & a picture, I currently
have to do this:

SELECT DISTINCT ?personLabel ?mastodonName ?pic
WHERE {
  ?person wdt:P4033 ?mastodonName ;
          wdt:P106 ?occupation .
  OPTIONAL { ?person wdt:P18 ?pic . }
  ?occupation wdt:P279* wd:Q1930187 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
}

Make a small mistake (a curly brace missing) and you'll get a red
error message. Forgot the * after wdt:P279? A different response set
in ways that are difficult to spot or reason about.
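
To illustrate why the missing * changes the results: in a SPARQL property
path, wdt:P279* follows zero or more "subclass of" links, while plain
wdt:P279 follows exactly one. The following is only a minimal Python sketch
on a made-up class hierarchy; the class names, the three people, and the
helper functions are all hypothetical and are not Wikidata data.

```python
# Toy "subclass of" (P279) edges: class -> direct parent classes. Made up.
SUBCLASS_OF = {
    "war correspondent": ["reporter"],
    "reporter": ["journalist"],
    "journalist": [],
}

# Toy "occupation" (P106) statements. Also made up.
OCCUPATION = {
    "Alice": "journalist",
    "Bob": "war correspondent",
    "Carol": "reporter",
}


def one_step(cls):
    """Classes reachable by exactly one P279 link (plain wdt:P279)."""
    return set(SUBCLASS_OF.get(cls, []))


def zero_or_more_steps(cls):
    """Classes reachable by zero or more P279 links (wdt:P279*)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, []))
    return seen


def people_matching(target, reach):
    """People whose occupation reaches `target` under the given path rule."""
    return sorted(p for p, occ in OCCUPATION.items() if target in reach(occ))


print(people_matching("journalist", zero_or_more_steps))  # with the *
print(people_matching("journalist", one_step))            # without the *
```

In this toy data, dropping the * silently loses both the people whose
occupation is "journalist" itself and those more than one level down the
hierarchy, while still returning a plausible-looking non-empty list.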

Why can't I type "list of journalists with their picture and Mastodon
account" as a natural language query? (You can try it in ChatGPT and
it'll get you started, but it'll generate nonsense P/Q numbers.) If
such queries could be produced reliably, it could be a very useful
tool for readers as well.
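
One possible way to sidestep the nonsense P/Q numbers would be to have the
language model choose only the concepts, and let ordinary code fill in the
identifiers from a vetted table. The following is a rough Python sketch under
that assumption, not an existing tool: the lookup tables and the build_query
helper are hypothetical, and the IDs are the ones used in the query above.

```python
# Hypothetical vetted lookup tables; the IDs are copied from the query above.
PROPERTIES = {
    "mastodon account": "P4033",
    "picture": "P18",
    "occupation": "P106",
    "subclass of": "P279",
}
ITEMS = {"journalist": "Q1930187"}


def build_query(occupation, required, optional, language="en"):
    """Build a SPARQL query listing people with the given occupation.

    Unknown names raise KeyError instead of producing a made-up ID.
    """
    item = ITEMS[occupation]
    variables = ["?personLabel"]
    patterns = [
        f"?person wdt:{PROPERTIES['occupation']} ?occupation .",
        f"?occupation wdt:{PROPERTIES['subclass of']}* wd:{item} .",
    ]
    for name in required:
        var = "?" + name.replace(" ", "_")
        variables.append(var)
        patterns.append(f"?person wdt:{PROPERTIES[name]} {var} .")
    for name in optional:
        var = "?" + name.replace(" ", "_")
        variables.append(var)
        patterns.append(f"OPTIONAL {{ ?person wdt:{PROPERTIES[name]} {var} . }}")
    patterns.append(
        f'SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" }}'
    )
    body = "\n".join("  " + p for p in patterns)
    return f"SELECT DISTINCT {' '.join(variables)}\nWHERE {{\n{body}\n}}"


print(build_query("journalist", ["mastodon account"], ["picture"]))
```

With this split, the model would only need to produce the three arguments
(an occupation plus required and optional fields), which are easy to validate
against the tables, rather than the query text itself.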

Warmly,
Erik
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/YYTLJVCDSYITUKNA2DJSK5SSR3AZ3B5F/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org


[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-11 Thread Anders Wennersten
I perceive that we have a good dialogue with Google, in which Google shows
respect and also acknowledges Wikipedia's role in its fact boxes, etc.


ChatGPT is now making headlines more or less every day, and I perceive
them as trying to position themselves as the "next" Google. But what about
our dialogue with them, and their respect and acknowledgement, if they are
making use of Wikipedia's content in their back end?


And if they are making only very loose use of Wikipedia content and are
building content on their own, have we investigated how it will affect us
in the short and long term?


Anders

https://en.wikipedia.org/wiki/ChatGPT


On 2022-12-11 at 13:18, Kim Bruning via Wikimedia-l wrote:

Hello Cunctator,

I think there are several lines of thought on this.

On what theory would you argue that ChatGPT is violating Wiki[p|m]edia
copyright?

(If you've already posted reasoning elsewhere, or if someone else has
posted an opinion you happen to agree with; I'd be happy to read a link
as well)

sincerely,
 Kim

On Sat, Dec 10, 2022 at 11:23:48AM -0500, The Cunctator wrote:

It's trained on Wikipedia. Here's a 2020 paper from the authors. I would
argue it's violating the copyright but I'm aware the foundation isn't very
interested in defending it.

  https://arxiv.org/abs/2005.14165

On Sat, Dec 10, 2022, 10:42 AM Anders Wennersten 
wrote:


Is this AI software using info from Wikipedia directly or indirectly, and
if not, is it an alternative to the wiki way of storing knowledge?

Anders

___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines
at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/ALV3QLKTB7RWCCR5W4HNDA4ZDG5ARRDC/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org



___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/A7XUQ5CCAOCHSEKK2VFFK3FIES4WJTM7/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-11 Thread Kim Bruning via Wikimedia-l
Hello Cunctator,

I think there are several lines of thought on this.

On what theory would you argue that ChatGPT is violating Wiki[p|m]edia
copyright?

(If you've already posted reasoning elsewhere, or if someone else has
posted an opinion you happen to agree with; I'd be happy to read a link
as well)

sincerely,
Kim

On Sat, Dec 10, 2022 at 11:23:48AM -0500, The Cunctator wrote:
> It's trained on Wikipedia. Here's a 2020 paper from the authors. I would
> argue it's violating the copyright but I'm aware the foundation isn't very
> interested in defending it.
> 
>  https://arxiv.org/abs/2005.14165
> 
> On Sat, Dec 10, 2022, 10:42 AM Anders Wennersten 
> wrote:
> 
> > Is this AI software using info from Wikipedia directly or indirectly, and
> > if not, is it an alternative to the wiki way of storing knowledge?
> >
> > Anders
> >
> >

___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/BWOGXLLUU6C5Q46YIDCZKW6VG655PZIQ/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org


[Wikimedia-l] Re: ChatGPT and Wikipedia

2022-12-10 Thread The Cunctator
It's trained on Wikipedia. Here's a 2020 paper from the authors. I would
argue it's violating the copyright but I'm aware the foundation isn't very
interested in defending it.

 https://arxiv.org/abs/2005.14165

On Sat, Dec 10, 2022, 10:42 AM Anders Wennersten 
wrote:

> Is this AI software using info from Wikipedia directly or indirectly, and
> if not, is it an alternative to the wiki way of storing knowledge?
>
> Anders
>
>
___
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/DQ5ZMSTQNSOYIPRCQBYEN2NFPDFIXG35/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org