[Wikimedia-l] Re: Chat GPT

Adam Sobieski Fri, 03 Feb 2023 20:24:06 -0800

Wikimedia Mailing List,


Hello. I just discovered this mailing list thread and am also interested in the 
topics of crowdsourcing and dialogue systems. I support the vision of 
man-machine collaboration and synergy indicated by Victoria Coleman.

With respect to the state of the art, modern dialogue systems include ChatGPT
by OpenAI, Sparrow by DeepMind, and TeachMe by AI2. These systems can interact
with end-users conversationally about knowledge; some can cite their sources;
and some can learn on the fly from operators in control centers,
subject-matter experts, and/or broader crowdsourced communities.

According to news reports, major search engine providers are already
integrating, or will soon integrate, modern dialogue systems. Will the
Wikimedia Search Platform be exploring conversational search features?

The user experiences through which control-center operators, or broader
communities of editors, interact with the knowledge and content used by
large-scale dialogue systems could be wiki-based.

In theory, community dashboards, potentially personalized for each editor,
could show editors which articles are popular or trending among dialogue
systems' end-users, or which have otherwise been identified as potentially
needing human review, moderation, or curation. These and related approaches to
enhancing community productivity could help amplify the performance of, and the
synergy between, communities of editors and AI systems.

In a recent bibliography [1], I list contemporary scholarly and scientific
publications which indicate that research is underway into how modern dialogue
systems could interoperate and interact with wiki systems, both reading from
and writing to them.





Best regards,

Adam

[1] http://www.phoster.com/dialogue-systems-and-information-retrieval/

________________________________
From: Raymond Leonard <raymond.f.leonard...@gmail.com>
Sent: Saturday, December 31, 2022 2:06 PM
To: Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
Subject: [Wikimedia-l] Re: Chat GPT

Of relevance to this conversation:

https://www.wired.com/story/large-language-models-artificial-intelligence/

On Fri, Dec 30, 2022 at 9:32 AM Neurodivergent Netizen 
<idoh.idreamofhor...@gmail.com> wrote:
One concern I have is that we “oldbies” have all seen bots decay once whoever
maintains them goes inactive. Of course, this could be largely remedied by
making the AI open source. That still leaves the “people” aspect: not only does
the AI need to be maintained, but interest needs to be maintained as well.

From,
I dream of horses
She/her





On Dec 30, 2022, at 8:53 AM, Victoria Coleman 
<vstavridoucole...@gmail.com> wrote:

Anne,

Interestingly enough, what these large companies have to spend a ton of money
on is creating and moderating content; in other words, people. Passionate
volunteers in large numbers are what the movement has in abundance. Imagine the
power of combining the talents and passion of our community members with the
advances offered by AI today. I was struck recently, during a visit to NVIDIA,
by how much language models have changed. Back in my day, we would have to
build one language model per domain and then load it onto the device, a
computer or a phone, to use. Now they have one massive combined language model
in a data center full of their GPUs, available as long as you are connected. My
sense is that within the guard rails offered by our volunteer community, we
could use AI to force-multiply their efforts and make knowledge even more
accessible than it is today, both for those who create and record knowledge and
for those who consume it. In the case of ChatGPT, our volunteers could use
supervised learning, for example, to narrow down the mistakes the bot makes,
which should be far fewer than in the OpenAI version, since the Wikipedia
version would be trained on good, clean Wikipedia content that is constantly
reviewed by the community.

Best regards,

Victoria Coleman

On Dec 30, 2022, at 12:21 AM, Risker 
<risker...@gmail.com> wrote:


Given what we already know about AI-like projects (think Siri, Alexa, etc.),
they're the result of work done by organizations using resources hundreds of
times greater than the resources of the entire Wikimedia movement, and they're
not all that good, if we're being honest. They're entirely dependent on
existing resources. We have seen time and again how easily they can be led
astray; ChatGPT is just the most recent example. It is full of misinformation.
Other efforts have resulted in the AI becoming radicalized. Again, it's all
about what sources the AI project uses in developing its responses, and those
underlying sources are generally completely unknown to the person asking for
the information.

Ironically, our volunteers have created software that learns pretty effectively 
(ORES, several anti-vandalism "bots").  The tough part is ensuring that there 
is continued, long-term support for these volunteer-led efforts, and the 
ability to make them effective on projects using other languages. We've had 
bots making translations of formulaic articles from one language to another for 
years; again, they depend on volunteers who can maintain and support those 
bots, and ensure continued quality of translation.

AI development is tough. It is monumentally expensive. Big players have
invested billions of US dollars trying to develop working AI, with some of the
most talented programmers and developers in the world, and they're barely
scratching the surface. I don't see this as a priority for the Wikimedia
movement, which achieves considerably higher quality with volunteers following
a fairly simple rule set that the volunteers themselves develop based on tried
and tested knowledge. Let's let those with lots of money keep working to
develop something that is useful, and then we can start seeing if it can
become feasible for our use.

I envision the AI industry being similar to the computer hardware industry. My
first computer cost about the same (in 2022 dollars) as the four computers and 
all their peripherals that I have within my reach as I write this, and had less 
than 1% of the computing power of each of them.[1]  The cost will go down once 
the technology gets better and more stable.

Risker/Anne

[1] Comparison of 1990 to 2022 dollars.



On Fri, 30 Dec 2022 at 01:40, Yaroslav Blanter 
<ymb...@gmail.com> wrote:
Hi,

just to remark that it superficially looks like a great tool for small-language
Wikipedias (for which the translation tool is typically not available). One can
train the tool on a less common language using a dictionary and some texts, and
then let it fill the project with thousands of articles. (As an aside, one
could probably even train it on soon-to-be-extinct languages and preserve them
until there is any interest in revival, but nobody seems to be interested.)
However, there is a high potential for abuse, as I can imagine people who do
not speak the language running the tool and creating thousands of substandard
articles; we have seen this done manually, and I would be very cautious about
allowing this.

Best
Yaroslav

On Fri, Dec 30, 2022 at 4:57 AM Raymond Leonard 
<raymond.f.leonard...@gmail.com> wrote:
As a friend wrote on a Slack thread about the topic, "ChatGPT can produce 
results that appear stunningly intelligent, and there are things that I’ve seen 
that really leave me scratching my head- “how on Earth did it DO that?!?”  But 
it’s important to remember that it isn’t actually intelligent.  It’s not 
“thinking.”  It’s more of a glorified version of autosuggest.  When it 
apologizes, it’s not really apologizing, it’s just finding text that fits the 
self description it was fed and that looks related to what you fed it."

The person initiating the thread had asked ChatGPT "What are the 5 biggest 
intentional communities on each continent?" (As an aside, this was as 
challenging as the question that led to Wikidata, "What are the ten largest 
cities in the world that have women mayors?") One of the answers ChatGPT gave 
for Europe was "Ikaria (Greece)". As near as I can determine, there is no 
intentional community of any size in Ikaria. However, the 
Icarians<https://en.wikipedia.org/wiki/Icarians> were a 19th-century 
intentional community in the US founded by French expatriates. It was named 
after a utopian novel, Voyage en Icarie, that was written by Étienne Cabet. He 
chose the Greek island of Icaria as the setting of his utopian vision. 
It is interesting that ChatGPT may have conflated the two.

It seems that, given a prompt, ChatGPT shuffles and regurgitates facts. Just as
a card dealer sometimes deals a good hand, ChatGPT sometimes seems to make
sense, but I think at present it really is "a glorified version of autosuggest."

Yours
Peaceray



On Thu, Dec 29, 2022 at 6:39 PM Gnangarra 
<gnanga...@gmail.com> wrote:
I think the simplest answer is yes, it's an artificial writer, but it's not
intelligent as the name implies; rather, it's just a piece of software that
gives answers according to its own methodology. It's garbage in, garbage out:
it can never be better than the programmers behind the machine.

On Fri, 30 Dec 2022 at 09:56, Victoria Coleman 
<vstavridoucole...@gmail.com> wrote:
Thank you Ziko and Steven for the thoughtful responses.

My sense is that, for a class of readers, a generative UI that returns an
answer rather than an article would be useful. It would probably put Quora out
of business. :-)

If the models are not open source, this would indeed require developing our own
models. For that kind of investment, we would probably want more application
areas: translation, which Ziko already pointed out, but also summarization.
These kinds of information retrieval queries would effectively index into
specific parts of an article rather than returning the whole thing.

Wikipedia, as we all know, is not perfect, but it’s about the best you can get,
with thousands of editors and reviewers doing quality control. If a bot were
trained exclusively on Wikipedia, my guess is that falsehood generation would
be as minimal as it can get. Garbage in, garbage out in all these models; good
stuff in, good stuff out. I guess falsehoods can also arise when no relevant
material exists in the model. So instead of making stuff up, they could default
to “I don’t know the answer to that”. Or, in our case, we could add the topic
to the list of article suggestions for editors…

I know I am almost daydreaming here, but I can’t help but think that all the
recent advances in AI could create significantly broader free-knowledge
pathways for every human being. And I don’t see us pursuing them aggressively
enough…

Best regards,

Victoria Coleman

On Dec 29, 2022, at 5:17 PM, Steven Walling 
<steven.wall...@gmail.com> wrote:




On Thu, Dec 29, 2022 at 4:09 PM Victoria Coleman 
<vstavridoucole...@gmail.com> wrote:
Hi everyone. I have seen some of the reactions to the narratives generated by
ChatGPT. There is an obvious question (to me at least) as to whether a
Wikipedia chatbot would be a legitimate UI for some users. To that end, I
would have hoped that it would have been developed by the WMF, but the
Foundation has historically massively underinvested in AI. That said, and
assuming that GPT's open-source licensing is compatible with movement norms,
should the WMF include that UI in the product?

This is a cool idea, but what would the goals of developing a Wikipedia-specific
generative AI be? IMO it would be nice to have a natural language search right 
in Wikipedia that could return factual answers not just links to our (often too 
long) articles.

OpenAI models aren’t open source, btw. Some of the products are free to use
right now, but their business model is to charge for API use, etc., so
including it directly in Wikipedia is pretty much a non-starter.

My other question is about the corpus that OpenAI is using to train the bot.
It is creating very fluid narratives that are massively false in many cases.
Are they training on Wikipedia? Something else?

They’re almost certainly using Wikipedia. The answer from ChatGPT is:

“ChatGPT is a chatbot model developed by OpenAI. It was trained on a dataset of 
human-generated text, including data from a variety of sources such as books, 
articles, and websites. It is possible that some of the data used to train 
ChatGPT may have come from Wikipedia, as Wikipedia is a widely-used source of 
information and is likely to be included in many datasets of human-generated 
text.”

And to my earlier question: if GPT were trained on Wikipedia exclusively,
would that help abate the false narratives?

Who knows, but we would have to develop our own models to test this idea.

This is a significant matter for the community, and seeing us step up to it
would be very encouraging.

Best regards,

Victoria Coleman
_______________________________________________
Wikimedia-l mailing list -- 
wikimedia-l@lists.wikimedia.org<mailto:wikimedia-l@lists.wikimedia.org>, 
guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/CYPO3PEMM4FIWPNL6MRTORHZXVTS2VNN/
To unsubscribe send an email to 
wikimedia-l-le...@lists.wikimedia.org<mailto:wikimedia-l-le...@lists.wikimedia.org>


--
Boodarwun
Gnangarra
'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar koortaboodjar'
