ChatGPT gave me R code containing an error when I asked it how to predict a new data point from a linear mixed-effects model. So it is not immune to coding mistakes!
Ravi

________________________________
From: R-devel <r-devel-boun...@r-project.org> on behalf of Boris Steipe <boris.ste...@utoronto.ca>
Sent: Monday, February 13, 2023 10:54 AM
To: Duncan Murdoch <murdoch.dun...@gmail.com>
Cc: r-devel <r-devel@r-project.org>
Subject: Re: [Rd] An interesting chat with ChatGPT

Duncan -

Indeed, this has now been well documented; I have called these constructions "Schrödinger Facts", since they arise from a superposition of truths in the training data that collapse into an untruth when observed.

https://sentientsyllabus.substack.com/p/chatgpts-achilles-heel

Now, the curious question is: why can it program? Why is its production of R syntax less vulnerable to this effect than a literature quote or reference? Maybe it has to do with the fact that in producing correct syntax there is a ground truth: errors just will not run. But I'm not sure.

Regardless: it signals that we need a reinvigorated culture of validation. Actually, ChatGPT will happily make test data for you. Which, in a way, it just did ;-)

Boris

> On 2023-02-13, at 08:37, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
>
> I was wondering which anchor would match in the regular expression "(a)|(b)"
> if both "a" and "b" match the target string. What appears to happen is
> that the first match wins, and that's how it is documented in Python, but I
> was looking for docs on this in R.
>
> As far as I can see, the ?regex help page doesn't resolve this. So just for
> fun, I decided to ask ChatGPT, and got an interesting response.
> I won't show the whole conversation here, but just some extracts:
>
> ---------------------------------------
> Me: In a regular expression in R, if both "a" and "b" match, which does
> "(a)|(b)" match? Please include authoritative references in your answer.
>
> ChatGPT: [long answer about grep]
>
> Me: In grep(), it doesn't matter which of a or b matches. It's only in
> sub() or gsub() that it makes a difference. Do you know an authoritative
> reference that discusses those functions?
>
> ChatGPT: [long irrelevant answer about sub and gsub, with a reference to "R
> Cookbook" by P. Teetor.]
>
> Me: Teetor's book is not authoritative, and the other reference doesn't
> discuss ambiguous alternation.
>
> ... finally we get to the interesting part ...
>
> ChatGPT: I apologize for the mistake in my previous answer.
>
> A more authoritative reference for the behavior of regular expressions in R
> can be found in the official R documentation.
>
> Regarding the behavior of ambiguous alternation, the regex section of the
> manual "An Introduction to R" explains it as follows:
>
> "When several alternatives are possible, the first alternative is chosen, so
> it is important to specify alternatives in the order in which they should be
> tried."
>
> Here is the reference for the official R documentation:
>
> "An Introduction to R" manual:
> https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Regular-expression-facilities
>
> I hope this reference provides the information you were looking for.
> ---------------------------------------
>
>
> All right, an authoritative answer!
> The only problem is that *it doesn't exist*. There's no such anchor in that
> document. R-intro isn't a very rapidly changing document, so I could look at
> every change recorded in svn, and that section has *never* existed.
>
> I guess ChatGPT is a pattern matching machine, and if such a question had an
> answer, it would probably be in a location like that. (I would have guessed
> a help page like ?regex instead of R-intro, but I had just told it that
> answer was wrong.)
>
> Duncan Murdoch
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Boris Steipe MD, PhD

Professor em.
Department of Biochemistry
Temerty Faculty of Medicine
University of Toronto
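
For reference, the Python behavior that Duncan mentions ("the first match wins") can be checked directly. The sketch below is illustrative only: the helper name `winning_group` and the sample patterns are made up for this note, and it says nothing about R's regex engines, which is the question the thread leaves open. In keeping with the call for validation above, the claims are written as checks that can be run, not as quotes from documentation.

```python
import re


def winning_group(pattern, text):
    """Return the 1-based index of the capture group that took part in
    the first match of `pattern` in `text`, or None if nothing matched.
    (Hypothetical helper, written only for this illustration.)"""
    m = re.search(pattern, text)
    return m.lastindex if m else None


# Both alternatives can match "foo" at position 0 of "foobar".
# Python tries alternatives left to right and accepts the first that succeeds,
# so group 1 wins whichever of the two sub-patterns is written first.
print(winning_group(r"(foo)|(f.o)", "foobar"))
print(winning_group(r"(f.o)|(foo)", "foobar"))

# The same rule decides how much text is consumed in a substitution:
# the first alternative that matches is used, not the longest one.
print(re.sub(r"a|ab", "X", "abc"))
print(re.sub(r"ab|a", "X", "abc"))
```

Running the equivalent calls through sub() in R, with and without perl = TRUE, would be the corresponding experiment on the R side; the result is not asserted here.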