[Wikimedia-l] Re: Bing-ChatGPT

2023-03-29 Thread Lauren Worden
On Wed, Mar 29, 2023 at 1:04 PM Felipe Schenone wrote: > > FYI, there's an open letter requesting a 6-month pause on AI development, [ > https://futureoflife.org/open-letter/pause-giant-ai-experiments/ ] with > reasonable arguments (in my opinion) and signed by several big names too. First, I

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-29 Thread Felipe Schenone
FYI, there's an open letter requesting a 6-month pause on AI development, with reasonable arguments (in my opinion) and signed by several big names too. The basic rationale, as I understand it, is that similar to human cloning,

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-25 Thread Amir Sarabadani
Repeating exactly what has been in the training data is not overfitting. Overfitting is when the model fails to recognize the underlying pattern in the training data, leading to inaccurate or false results when used on new data. Getting the exact same prediction from the training data set is
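The distinction can be made concrete with a toy sketch (hypothetical, not from the thread): a lookup-table "model" that reproduces its training data verbatim, which by itself says nothing about how it behaves on new data.

```python
# Hypothetical toy illustration: a lookup-table "model" reproduces its
# training data verbatim. Perfect recall on training inputs is expected
# behaviour, not overfitting.
train = {1: 1, 2: 4, 3: 9}          # x -> x**2

def memorizing_model(x):
    # Verbatim recall of the training data, no learned pattern.
    return train.get(x)

# Exact training answers are reproduced...
train_ok = all(memorizing_model(x) == y for x, y in train.items())

# ...but overfitting is about failure on *new* data, not about recall:
test_pred = memorizing_model(4)     # the true pattern would give 16
```

The memorizer scores perfectly on its training set yet returns nothing useful for an unseen input, which is the point being made: verbatim reproduction and overfitting are separate questions.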

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-23 Thread Kimmo Virtanen
Hi, I just noticed that OpenAI has fixed the Wikidata property and item mappings, so now it can generate working SPARQL. Example: Prompt: Search Finnish female journalists using SPARQL from Wikidata? *GPT-3.5 (default)* > To search for Finnish female journalists using SPARQL from Wikidata, you
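For readers who want to try the same thing, a query along these lines would match that prompt. The Wikidata property/item IDs below are my assumptions, not taken from the thread, so verify them on Wikidata before relying on the query: P31 = instance of, Q5 = human, P106 = occupation, Q1930187 = journalist, P21 = sex or gender, Q6581072 = female, P27 = country of citizenship, Q33 = Finland.

```python
# Sketch of a SPARQL query for Finnish female journalists on Wikidata.
# All property/item IDs are assumptions -- check them against Wikidata.
QUERY = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P106 wd:Q1930187 ;
          wdt:P21 wd:Q6581072 ;
          wdt:P27 wd:Q33 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fi". }
}
LIMIT 100
"""

# Running it against the public endpoint needs network access, e.g.:
# import requests
# rows = requests.get("https://query.wikidata.org/sparql",
#                     params={"query": QUERY, "format": "json"}).json()
```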

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-23 Thread Erik Moeller
On Wed, Mar 22, 2023 at 11:53 AM Lauren Worden wrote: > BARD also produces lengthy passages from its training data verbatim > without elicitation: > https://old.reddit.com/r/Bard/comments/11xxaxj/bard_copied_user_text_from_a_forum_word_for_word/jd58764/ Very true. I tested the "Mr. Ripley"

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-22 Thread Lauren Worden
Google BARD, announced this week, also tries and fails to perform attribution and verification: https://old.reddit.com/r/Bard/comments/11yeegu/google_bard_claims_bard_has_already_been_shut/jd77wpo/ BARD also produces lengthy passages from its training data verbatim without elicitation:

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-21 Thread Lauren Worden
On Mon, Mar 20, 2023 at 9:28 PM Kim Bruning via Wikimedia-l wrote: > On Sun, Mar 19, 2023 at 02:48:12AM -0700, Lauren Worden wrote: > > > > LLMs absolutely do encode a verbatim copy of their > > training data, which can be produced intact with little effort. > > >

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-21 Thread Adam Sobieski
Subject: [Wikimedia-l] Re: Bing-ChatGPT On Sun, Mar 19, 2023 at 02:48:12AM -0700, Lauren Worden wrote: > > They have, and LLMs absolutely do encode a verbatim copy of their > training data, which can be produced intact with little effort. > https://arxiv.org/pdf/2205.10770

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-20 Thread Kim Bruning via Wikimedia-l
On Sun, Mar 19, 2023 at 02:48:12AM -0700, Lauren Worden wrote: > > They have, and LLMs absolutely do encode a verbatim copy of their > training data, which can be produced intact with little effort. > https://arxiv.org/pdf/2205.10770.pdf > https://bair.berkeley.edu/blog/2020/12/20/lmmem/ My

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-20 Thread Erik Moeller
On Sun, Mar 19, 2023 at 12:12 PM Lauren Worden wrote: > They have, and LLMs absolutely do encode a verbatim copy of their > training data, which can be produced intact with little effort. See > https://arxiv.org/pdf/2205.10770.pdf -- in particular the first > paragraph of the Background and

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-19 Thread Lauren Worden
On Sat, Mar 18, 2023 at 3:49 PM Erik Moeller wrote: > > ...With image-generating models like Stable Diffusion, it's been found > that the models sometimes generate output nearly indistinguishable > from source material [1]. I don't know if similar studies have been > undertaken for

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-19 Thread Kimmo Virtanen
> > Or, maybe just require an open disclosure of where the bot pulled from and > how much, instead of having it be a black box? "Text in this response > derived from: 17% Wikipedia article 'Example', 12% Wikipedia article > 'SomeOtherThing', 10%...". Current (i.e. ChatGPT) systems don't work

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-19 Thread Kimmo Virtanen
; be of use for amplifying these features and capabilities for end-users. >>> >>> >>> Best regards, >>> Adam Sobieski >>> >>> [1] >>> https://learn.microsoft.com/en-us/deployedge/microsoft-edge-relnote-stable-channel?ranMID=24542#version-

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-19 Thread Galder Gonzalez Larrañaga
Dear all, Your discussion and points are really interesting. I just wanted to point out that, as far as I know, the "Text in this response derived from: 17% Wikipedia article 'Example', 12% Wikipedia article 'SomeOtherThing', 10%..." idea is impossible, as generative AIs derive from all articles or texts

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-19 Thread Todd Allen
Or, maybe just require an open disclosure of where the bot pulled from and how much, instead of having it be a black box? "Text in this response derived from: 17% Wikipedia article 'Example', 12% Wikipedia article 'SomeOtherThing', 10%...". On Sat, Mar 18, 2023 at 10:17 PM Steven Walling wrote:
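Elsewhere in the thread it is pointed out that current LLMs do not track provenance internally, so exact per-source percentages are not available. The closest practical approximation is a post-hoc surface-overlap measure; a rough hypothetical sketch (my own, not from the thread) could look like this:

```python
# Hypothetical post-hoc sketch: estimate how much of a response's text
# overlaps each candidate source, via shared word 5-grams. This is NOT
# how LLMs attribute internally (they don't track sources at all); it
# only measures surface overlap after generation.

def ngrams(text, n=5):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_share(response, source, n=5):
    # Fraction of the response's n-grams that also occur in the source.
    resp = ngrams(response, n)
    if not resp:
        return 0.0
    return len(resp & ngrams(source, n)) / len(resp)

sources = {
    "Example": "the quick brown fox jumps over the lazy dog every day",
    "SomeOtherThing": "completely different text with no shared phrasing here",
}
response = "the quick brown fox jumps over the fence"
shares = {name: overlap_share(response, text) for name, text in sources.items()}
```

Here the response shares three of its four 5-grams with the first source (0.75) and none with the second (0.0), so the measure only captures verbatim-ish reuse, not paraphrase or influence.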

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-18 Thread Steven Walling
On Sat, Mar 18, 2023 at 3:49 PM Erik Moeller wrote: > On Fri, Mar 17, 2023 at 7:05 PM Steven Walling > wrote: > > > IANAL of course, but to me this implies that responsibility for the > *egregious* lack > > of attribution in models that rely substantially on Wikipedia is > violating the

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-18 Thread Erik Moeller
On Fri, Mar 17, 2023 at 7:05 PM Steven Walling wrote: > IANAL of course, but to me this implies that responsibility for the > *egregious* lack > of attribution in models that rely substantially on Wikipedia is violating > the Attribution > requirements of CC licenses. Morally, I agree that

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-18 Thread Matej Grochal
Web standards, e.g., Web schema, can >>> be of use for amplifying these features and capabilities for end-users. >>> >>> >>> Best regards, >>> Adam Sobieski >>> >>> [1] >>> https://learn.microsoft.com/en-us/deployedge/microsoft-edge

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-18 Thread Peter Southwood
“Cowen has sufficient credentials to be treated as a reliable expert” Maybe not for much longer. Cheers, P. From: The Cunctator [mailto:cuncta...@gmail.com] Sent: 17 March 2023 17:49 To: Wikimedia Mailing List Subject: [Wikimedia-l] Re: Bing-ChatGPT This is an important development

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-17 Thread Steven Walling
dge/microsoft-edge-relnote-stable-channel?ranMID=24542#version-1110166141-march-13-2023 >> [2] https://www.engadget.com/microsoft-edge-ai-copilot-184033427.html >> >> -- >> *From:* Kimmo Virtanen >> *Sent:* Friday, March 17, 2023 8:17 AM >> *To:* Wikimedia Mailing List &g

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-17 Thread The Cunctator
33427.html > > -- > *From:* Kimmo Virtanen > *Sent:* Friday, March 17, 2023 8:17 AM > *To:* Wikimedia Mailing List > *Subject:* [Wikimedia-l] Re: Bing-ChatGPT > > Hi, > > The development of open-source large language models is going f

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-17 Thread Adam Sobieski
[2] https://www.engadget.com/microsoft-edge-ai-copilot-184033427.html From: Kimmo Virtanen Sent: Friday, March 17, 2023 8:17 AM To: Wikimedia Mailing List Subject: [Wikimedia-l] Re: Bing-ChatGPT Hi, The development of open-source large language models is going forward.

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-17 Thread The Cunctator
This is an important development for editors to be aware of - we're going to have to be increasingly on the lookout for sources using ML-generated bullshit. Here are two instances I'm aware of this week: https://www.thenation.com/article/culture/internet-archive-publishers-lawsuit-chatbot/ > In

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-17 Thread Kimmo Virtanen
Hi, The development of open-source large language models is going forward. GPT-4 was released, and it seems that it passed the bar exam and tried to hire humans to solve captchas which were too complex for it. However, the development on the open-source and hacking side has been pretty fast and it

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-05 Thread Steven Walling
On Sun, Mar 5, 2023 at 8:39 PM Luis (lu.is) wrote: > On Feb 22, 2023 at 9:28 AM -0800, Sage Ross > wrote: > > Luis, > > OpenAI researchers have released some info about data sources that > trained GPT-3 (and hence ChatGPT): https://arxiv.org/abs/2005.14165 > > See section 2.2, starting on page

[Wikimedia-l] Re: Bing-ChatGPT

2023-03-05 Thread Luis (lu.is)
On Feb 22, 2023 at 9:28 AM -0800, Sage Ross wrote: > Luis, > > OpenAI researchers have released some info about data sources that > trained GPT-3 (and hence ChatGPT): https://arxiv.org/abs/2005.14165 > > See section 2.2, starting on page 8 of the PDF. > > The full text of English Wikipedia is

[Wikimedia-l] Re: Bing-ChatGPT

2023-02-23 Thread Erik Moeller
On Mon, Feb 20, 2023 at 12:33 PM Jimmy Wales wrote: > Speaking only for myself, out of curiosity, some real world examples might be > helpful here. I don't have access to Bing's > version yet, but I do have access to chat.openai.com which is very impressive > but deeply flawed. I've found

[Wikimedia-l] Re: Bing-ChatGPT

2023-02-22 Thread Anders Wennersten
I got the impression from the tech editor whose article I read that there is a big difference in how ChatGPT is used together with Bing. Jimmy Wales here describes my own experience using only ChatGPT: if you ask "who is NN", you get unusable rubbish back. But when the tech editor

[Wikimedia-l] Re: Bing-ChatGPT

2023-02-22 Thread Sage Ross
Luis, OpenAI researchers have released some info about data sources that trained GPT-3 (and hence ChatGPT): https://arxiv.org/abs/2005.14165 See section 2.2, starting on page 8 of the PDF. The full text of English Wikipedia is one of five sources, the others being CommonCrawl, a smaller subset
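For reference, the sampling mix from that section of the paper, reproduced here from memory (verify the numbers against arXiv:2005.14165, Table 2.2 before citing them):

```python
# Approximate GPT-3 training mix per the cited paper (arXiv:2005.14165,
# Table 2.2), reproduced from memory -- verify against the paper.
# "weight" = fraction of training examples drawn from each source,
# which is deliberately not proportional to raw dataset size.
mix = {
    "Common Crawl (filtered)": 0.60,
    "WebText2": 0.22,
    "Books1": 0.08,
    "Books2": 0.08,
    "Wikipedia": 0.03,
}
wiki_share = mix["Wikipedia"]
smallest = min(mix, key=mix.get)
```

Note that even at ~3% of the sampling weight, Wikipedia is upsampled relative to its token count because of its quality, which is why "only a small part" and "heavily relied upon" can both be true.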

[Wikimedia-l] Re: Bing-ChatGPT

2023-02-22 Thread Luis (lu.is)
Anders, do you have a citation for “use Wikipedia content considerably”? Lots of early-ish ML work was heavily dependent on Wikipedia, but state-of-the-art Large Language Models are trained on vast quantities of text, of which Wikipedia is only a small part. ChatGPT does not share their data

[Wikimedia-l] Re: Bing-ChatGPT

2023-02-21 Thread Ali Kia
Hi. Thanks a lot. On Tuesday, February 21, 2023, 1:30, Eduardo Testart wrote: > Hi again, > > Another potentially interesting podcast for some touching this matter > (more or less): > https://www.nytimes.com/2023/02/17/podcasts/hard-fork-bing-ai-elon.html > > Linked to the ones I sent before

[Wikimedia-l] Re: Bing-ChatGPT

2023-02-20 Thread Eduardo Testart
Hi again, Another potentially interesting podcast for some touching this matter (more or less): https://www.nytimes.com/2023/02/17/podcasts/hard-fork-bing-ai-elon.html Linked to the ones I sent before on the other thread. If this is the new Napster revolution equivalent, yeah I know... back in

[Wikimedia-l] Re: Bing-ChatGPT

2023-02-20 Thread Jimmy Wales
Speaking only for myself, out of curiosity, some real world examples might be helpful here. I don't have access to Bing's version yet, but I do have access to chat.openai.com which is very impressive but deeply flawed. I asked "Who is Kate Garvey?" (my wife, known a bit to the media, but

[Wikimedia-l] Re: Bing-ChatGPT

2023-02-20 Thread Kim Bruning via Wikimedia-l
FWIW, YMMV. Executive Summary: == * I looked into Stable Diffusion recently. BEWARE: The actual technical and legal situation on the ground with these systems is VERY different from what -say- Twitter will lead you to believe. Also: Everything you know will be wrong and out