[Wikimedia-l] Re: Chat GPT

Raymond Leonard Sat, 31 Dec 2022 11:07:46 -0800

Of relevance to this conversation:

https://www.wired.com/story/large-language-models-artificial-intelligence/


On Fri, Dec 30, 2022 at 9:32 AM Neurodivergent Netizen <
[email protected]> wrote:

> One concern I have is that “oldbies” like myself have all seen bots
> basically decay after whoever is maintaining them goes inactive. Of course,
> this could be mostly rectified by having the AI be open source. This leaves
> the “people” aspect; that is, not only does the AI need to be maintained,
> but interest needs to be maintained as well.
>
> From,
> I dream of horses
> She/her
>
>
>
>
>
> On Dec 30, 2022, at 8:53 AM, Victoria Coleman <[email protected]>
> wrote:
>
> Anne,
>
> Interestingly enough what these large companies have to spend a ton of
> money on is creating and moderating content. In other words people.
> Passionate volunteers in large numbers are what the movement has in
> abundance. Imagine the power of combining the talents and passion of our
> community members with the advances offered by AI today. I was struck
> recently during a visit to NVIDIA how language models have changed. Back in
> my day, we would have to build one language model per domain and then load
> it onto the device, a computer or a phone, to use. Now they have one
> massive combined language model in a data center full of their GPUs which
> is there so long as you are connected. My sense is that within the guard
> rails offered by our volunteer community, we could use AI to force multiply
> their efforts and make knowledge even more accessible than it is today.
> Both for those who create and record knowledge as well as those who consume
> it. In the case of Chat GPT, our volunteers could use supervised learning
> for example to narrow down the mistakes the bot makes - which should be
> many fewer than the OpenAI version since the Wikipedia version would be
> trained on good, clean Wikipedia content which is constantly reviewed by
> the community.
>
> Best regards,
>
> Victoria Coleman
>
> On Dec 30, 2022, at 12:21 AM, Risker <[email protected]> wrote:
>
> 
> Given what we already know about AI-like projects (think Siri, Alexa,
> etc.), they're the result of work done by organizations utilizing resources
> hundreds of times greater than the resources within the entire Wikimedia
> movement, and they're not all that good if we're being honest.  They're
> entirely dependent on existing resources.  We have seen time and again how
> easily they can be led astray; ChatGPT is just the most recent example.  It
> is full of misinformation.  Other efforts have resulted in the AI becoming
> radicalized.  Again, it's all about what sources the AI project uses in
> developing its responses, and those underlying sources are generally
> completely unknown to the person asking for the information.
>
> Ironically, our volunteers have created software that learns pretty
> effectively (ORES, several anti-vandalism "bots").  The tough part is
> ensuring that there is continued, long-term support for these volunteer-led
> efforts, and the ability to make them effective on projects using other
> languages. We've had bots making translations of formulaic articles from
> one language to another for years; again, they depend on volunteers who can
> maintain and support those bots, and ensure continued quality of
> translation.
>
> AI development is tough. It is monumentally expensive. Big players have
> invested billions USD trying to develop working AI, with some of the most
> talented programmers and developers in the world, and they're barely
> scratching the surface.  I don't see this as a priority for the Wikimedia
> movement, which achieves considerably higher quality with volunteers
> following a fairly simple rule set that the volunteers themselves develop
> based on tried and tested knowledge.  Let's let those with lots of money
> keep working to develop something that is useful, and then we can start
> seeing if it can become feasible for our use.
>
>  I envision the AI industry being similar to the computer hardware
> industry. My first computer cost about the same (in 2022 dollars) as the
> four computers and all their peripherals that I have within my reach as I
> write this, and had less than 1% of the computing power of each of
> them.[1]  The cost will go down once the technology gets better and more
> stable.
>
> Risker/Anne
>
> [1] Comparison of 1990 to 2022 dollars.
>
>
>
> On Fri, 30 Dec 2022 at 01:40, Yaroslav Blanter <[email protected]> wrote:
>
>> Hi,
>>
>> just to remark that it superficially looks like a great tool for small
>> language Wikipedias (for which the translation tool is typically not
>> available). One can train the tool in some less common language using the
>> dictionary and some texts, and then let it fill the project with
>> thousands of articles. (As an aside, in fact, one probably can train it on
>> soon-to-be-extinct languages and save them until the moment there is any
>> interest in revival, but nobody seems to be interested). However, there is
>> a high potential for abuse, as I can imagine people not speaking the
>> language running the tool and creating thousands of substandard articles -
>> we have seen this done manually, and I would be very cautious allowing this.
>>
>> Best
>> Yaroslav
>>
>> On Fri, Dec 30, 2022 at 4:57 AM Raymond Leonard <
>> [email protected]> wrote:
>>
>>> As a friend wrote on a Slack thread about the topic, "ChatGPT can
>>> produce results that appear stunningly intelligent, and there are things
>>> that I’ve seen that really leave me scratching my head- “how on Earth
>>> did it DO that?!?”  But it’s important to remember that it isn’t actually
>>> intelligent.  It’s not “thinking.”  It’s more of a glorified version of
>>> autosuggest.  When it apologizes, it’s not really apologizing, it’s just
>>> finding text that fits the self description it was fed and that looks
>>> related to what you fed it."
>>>
>>> The person initiating the thread had asked ChatGPT "What are the 5
>>> biggest intentional communities on each continent?" (As an aside, this
>>> was as challenging as the question that led to Wikidata, "What are the ten
>>> largest cities in the world that have women mayors?") One of the answers
>>> ChatGPT gave for Europe was "Ikaria (Greece)". As near as I can determine,
>>> there is no intentional community of any size in Ikaria. However, the
>>> Icarians <https://en.wikipedia.org/wiki/Icarians> were a 19th-century
>>> intentional community in the US founded by French expatriates. It was named
>>> after a utopian novel, *Voyage en Icarie*, that was written by Étienne
>>> Cabet. He chose the Greek island of Icaria as the setting of his utopian
>>> vision. Interesting that ChatGPT may have conflated these.
>>>
>>> It seems that given a prompt, ChatGPT shuffles & regurgitates facts.
>>> Just as a card dealer deals a good hand, sometimes ChatGPT seems to make
>>> sense, but I think at present it really is "a glorified version of
>>> autosuggest."
>>>
>>> Yours
>>> Peaceray
>>>
>>>
>>>
>>> On Thu, Dec 29, 2022 at 6:39 PM Gnangarra <[email protected]> wrote:
>>>
>>>> I think the simplest answer is yes, it's an artificial writer, but it's
>>>> not intelligence as the name implies; rather, it's just a piece of software
>>>> that gives answers according to the methodology of that software. It's
>>>> garbage in, garbage out: it can never be better than the programmers behind
>>>> the machine.
>>>>
>>>> On Fri, 30 Dec 2022 at 09:56, Victoria Coleman <
>>>> [email protected]> wrote:
>>>>
>>>>> Thank you Ziko and Steven for the thoughtful responses.
>>>>>
>>>>> My sense is that for a class of readers having a generative UI that
>>>>> returns an answer vs an article would be useful. It would probably put
>>>>> Quora out of business. :-)
>>>>>
>>>>> If the models are not open source, this indeed would require
>>>>> developing our own models. For that kind of investment, we would probably
>>>>> want to have more application areas. Translation is one that Ziko
>>>>> already pointed out, but also summarization. These kinds of information
>>>>> retrieval queries would effectively index into specific parts of an
>>>>> article vs returning the whole thing.
>>>>>
>>>>> Wikipedia as we all know is not perfect but it’s about the best you
>>>>> can get with the thousands of editors and reviewers doing quality control.
>>>>> If a bot was exclusively trained on Wikipedia, my guess is that the
>>>>> falsehood generation would be as minimal as it can get. Garbage in garbage
>>>>> out in all these models. Good stuff in good stuff out. I guess the
>>>>> falsehoods can also come when no material exists in the model. So instead
>>>>> of making stuff up, they could default to “I don’t know the answer to
>>>>> that”. Or in our case, we could add the topic to the list of article
>>>>> suggestions to editors…
>>>>>
>>>>> I know I am almost day dreaming here but I can’t help but think that
>>>>> all the recent advances in AI could create significantly broader free
>>>>> knowledge pathways for every human being. And I don’t see us getting after
>>>>> them aggressively enough…
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Victoria Coleman
>>>>>
>>>>> On Dec 29, 2022, at 5:17 PM, Steven Walling <[email protected]>
>>>>> wrote:
>>>>>
>>>>> 
>>>>>
>>>>>
>>>>> On Thu, Dec 29, 2022 at 4:09 PM Victoria Coleman <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi everyone. I have seen some of the reactions to the narratives
>>>>>> generated by Chat GPT. There is an obvious question (to me at least) as
>>>>>> to whether a Wikipedia chat bot would be a legitimate UI for some users. To
>>>>>> that end, I would have hoped that it would have been developed by the WMF
>>>>>> but the Foundation has historically massively underinvested in AI. That
>>>>>> said, and assuming that GPT Open source licensing is compatible with the
>>>>>> movement norms, should the WMF include that UI in the product?
>>>>>
>>>>>
>>>>> This is a cool idea but what would the goals of developing a
>>>>> Wikipedia-specific generative AI be? IMO it would be nice to have a
>>>>> natural language search right in Wikipedia that could return factual
>>>>> answers not just links to our (often too long) articles.
>>>>>
>>>>> OpenAI models aren’t open source btw. Some of the products are free to
>>>>> use right now, but their business model is to charge for API use etc. so
>>>>> including it directly in Wikipedia is pretty much a non-starter.
>>>>>
>>>>>> My other question is around the corpus that OpenAI is using to train
>>>>>> the bot. It is creating very fluid narratives that are massively false in
>>>>>> many cases. Are they training on Wikipedia? Something else?
>>>>>
>>>>>
>>>>> They’re almost certainly using Wikipedia. The answer from ChatGPT is:
>>>>>
>>>>> “ChatGPT is a chatbot model developed by OpenAI. It was trained on a
>>>>> dataset of human-generated text, including data from a variety of sources
>>>>> such as books, articles, and websites. It is possible that some of the
>>>>> data used to train ChatGPT may have come from Wikipedia, as Wikipedia is a
>>>>> widely-used source of information and is likely to be included in many
>>>>> datasets of human-generated text.”
>>>>>
>>>>>> And to my earlier question, if GPT were to be trained on Wikipedia
>>>>>> exclusively, would that help abate the false narratives?
>>>>>
>>>>>
>>>>> Who knows but we would have to develop our own models to test this
>>>>> idea.
>>>>>
>>>>>>
>>>>>> This is a significant matter for the community and seeing us step to
>>>>>> it would be very encouraging.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Victoria Coleman
>>>>>> _______________________________________________
>>>>>> Wikimedia-l mailing list -- [email protected],
>>>>>> guidelines at:
>>>>>> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>>>>>> https://meta.wikimedia.org/wiki/Wikimedia-l
>>>>>> Public archives at
>>>>>> https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/CYPO3PEMM4FIWPNL6MRTORHZXVTS2VNN/
>>>>>> To unsubscribe send an email to [email protected]
>>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Boodarwun
>>>> Gnangarra
>>>> 'ngany dabakarn koorliny arn boodjera dardoon ngalang Nyungar
>>>> koortaboodjar'
>>>>
>>>
>>
>
>
>
>

