Re: [R] Amazing AI

2022-12-19 Thread avi.e.gross
Boris,

What you are telling us is not particularly new or spectacular in a sense.
It has often been hard to grade assignments students do when they choose an
unexpected path. I had one instructor who always graded my exams (in the
multiple courses I took with him) because unlike most of the sheep, I did
not bother memorizing the way something was done or proven and created my
own solutions on the fly during the test, often in ways he had to work hard
at following and that almost amused him as mine tended to be correct, albeit
not what he would have thought of.

Your issue is not particularly about R as similar scenarios can be found in
many languages and environments.

In programming, it is arguably worse as it is common to be able to do things
so many ways. But plain old R has so many packages available, often with
source code, that any student who finds one that does something they want,
may be able to copy and modify some of the functions involved into their own
code already and fool an evaluator into thinking they did it on their own.
That is a tad harder, as many packages improve the code efficiency by
(re)writing many parts in C/C++.

I have seen things like a GUI that lets you click on various check boxes and
other controls and then use those instructions to read in data files, do
various operations on them, and provide output. Some allow quite a bit of
functionality and also offer you the opportunity to see the R code it
generates, and let you adjust that if it does not quite meet your needs.
Much of it involves including various packages and calling functions in
them, but if your students are allowed to use such things, how would you
know how little actual work they did?

I echo what someone else wrote. Training students for jobs in an uncertain
and changing future may be more effective if it teaches them how to find
ever better or different ways to get things done, or even to switch to
growing areas near their field. All kinds of automation of jobs are
happening and will continue to happen in the knowledge professions. People
who read manuals cover to cover and keep consulting them constantly are a
rarity. Many people often first do web searches or consult experts including
online versions of the documentation. Many will happily use software that
lets them do more and more with fewer lines of code written by them and
especially when that software has been used and tested long enough to be
relatively free of bugs when used as directed. Why would anyone these days
want to constantly re-invent the wheel and write routines to read in data
from files using known formats when you can use ones that exist and, if
needed in special cases, make some tweaks such as converting a column it
made into integer, back into the floating point you want for some later
reason?

But if your students are using something that is error-prone when used the
way they are using it, that is a problem as they are not only not learning
some basics or techniques you want them to know, but relying on what may be
bad tools without taking the time and effort to check the result or make
their own tweaks. Such software may not provide a way to do something like
treat multiple entries of various kinds as being NA, as an example. So you
would need your own code to check the result between some steps and do your
own further conversions so that "." and "" and "NA" and "-" all become NA,
again, just a made up example.
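
For that made-up example, base R may already cover the case when the file is read directly, since read.table() takes an na.strings argument. A minimal sketch, in which the inline text and the column names (id, value, group) are invented for illustration and stand in for a real file:

```r
# Hypothetical tab-separated input; in practice this would be a file on disk.
raw <- "id\tvalue\tgroup\n1\t3.5\ta\n2\t.\tb\n3\t\t-\n4\tNA\tc\n5\t-\ta\n"

# Treat ".", "", "NA" and "-" all as missing while reading.
dat <- read.table(text = raw, header = TRUE, sep = "\t",
                  na.strings = c(".", "", "NA", "-"),
                  stringsAsFactors = FALSE)

# The reader guesses integer for id; convert it back to floating point
# if a later step needs doubles.
dat$id <- as.numeric(dat$id)

# Count what ended up as NA in each column.
n_missing_value <- sum(is.na(dat$value))
n_missing_group <- sum(is.na(dat$group))
```

A GUI or generated script that does not expose na.strings would need the equivalent check and conversion done by hand after the import step.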

Yes, some students will easily fool you when grading but that already
happens when someone hires out getting some work done and claims it as their
own.

-Original Message-
From: R-help  On Behalf Of Boris Steipe
Sent: Monday, December 19, 2022 3:16 PM
To: Milan Glacier 
Cc: r-help@r-project.org
Subject: Re: [R] Amazing AI

Exactly. But not just "error prone", rather: eloquently and confidently
incorrect. And that in itself is a problem. When I evaluate students' work,
I implicitly do so from a mental model of the student - aptitude, ability,
experience, language skills etc. That's useful for summative assessment,
since it helps efficiency - but that won't work anymore. I see a need to
assess much more carefully, require fine-grained referencing, check every
single fact ... and that won't scale. And then there is also the spectre of
having to decide when this crosses the line to "concoction" - i.e. an actual
academic offence ...

Best,
Boris



> On 2022-12-19, at 03:58, Milan Glacier  wrote:
> 
> On 12/18/22 19:01, Boris Steipe wrote:
>> Technically not a help question. But crucial to be aware of, especially
for those of us in academia, or otherwise teaching R. I am not aware of a
suitable alternate forum. If 

Re: [R] Amazing AI

2022-12-19 Thread Christopher W. Ryan via R-help
In clinical medicine, the question the patient asks rarely represents
their main concern. Most of what I've done in my career, and most of
what I've taught, is about how to have the back-and-forth dynamic dialog
with the patient, to help them formulate what's really on their mind,
and make sure I understand it before proceeding.

Seems to me that statistical consulting, or working IT in an
organization where one is serving "internal" customers, is similar.
Students need to learn the skills of clarifying, verifying, and
paraphrasing a client's needs.

Sure, ChatGPT may be able to generate decent code to do what a client
*says* they want to do, but is that *really* what they are looking for?

So don't retire yet :)

--Chris Ryan

Boris Steipe wrote:
> Technically not a help question. But crucial to be aware of, especially for 
> those of us in academia, or otherwise teaching R. I am not aware of a 
> suitable alternate forum. If this does not interest you, please simply ignore 
> - I already know that this may be somewhat OT.
> 
> Thanks.
> --
> 
> You very likely have heard of ChatGPT, the conversation interface on top of 
> the GPT-3 large language model and that it can generate code. I thought it 
> doesn't do R - I was wrong. Here is a little experiment:
> 
> MY PROMPT:
> ==
> Please write me some R code that will read a tsv file, remove all rows that 
> have duplicates in the column "names", and store the median values of the 
> remaining rows in a variable called "Medians". Of course, that calculation 
> should only be done with columns that are numeric. Print the last variable in 
> order from largest to smallest.
> 
> 
> AI RESPONSE:
> 
> 
> Here is some R code that should accomplish what you've described:
> 
> (Note: the code is formatted in a little terminal window, complete with a 
> button to "copy code" at the top.)
> 
> 
> # Read the tsv file
> data <- read.table("file.tsv", header = TRUE, sep = "\t")
> 
> # Remove rows with duplicates in the "names" column
> data <- data[!duplicated(data$names), ]
> 
> # Select only the numeric columns
> num_columns <- sapply(data, is.numeric)
> numeric_data <- data[, num_columns]
> 
> # Calculate the medians for each column
> Medians <- apply(numeric_data, 1, median)
> 
> # Sort Medians in descending order
> Medians <- sort(Medians, decreasing = TRUE)
> 
> # Print Medians
> print(Medians)
> 
> 
> This code assumes that the tsv file is called "file.tsv" and is stored in the 
> current working directory. It also assumes that the first row of the file 
> contains the column names.
> 
> 
> 
> MY PROMPT:
> ==
> Can you provide me with a bit of sample data that I can use to test this code?
> 
> AI RESPONSE:
> 
> Certainly! Here is some sample data that you can use to test the code I 
> provided:
> 
> 
> name  age height  weight
> Alice 23  5.6 120
> Bob   32  6.2 180
> Charlie   29  5.9 160
> Alice 23  5.6 120
> David 25  5.8 140
> Eve   27  5.5 110
> 
> 
> You can save this data as a tsv file and use it to test the code.
> 
> 
> --
> 
> Notes: 
> 1) The code, as written does not work with the test data. The reason is that 
> the test data specifies a column "name", but the code assumes "names" (which 
> I specified in the instructions). Once this is fixed in the test data, the 
> code executes.
> 
> 2) The instructions are not accurately translated: I had asked for row medians. 
> But changing the margin in the apply() statement from 2 to 1 does the correct 
> thing.
> 
> 3) Note that the test data contains both numeric and non-numeric columns. 
> Also, the name "Alice" is duplicated in the test data, which is subtle, and 
> the right thing to do.
> 
> 
> --
> 
> A similar experiment, run on the beta-site of OpenAI's Codex code-completion 
> tool gives the following code (complete with the comments.):
> 
> 
> # Read in the data
> data <- read.table("mydata.tsv", sep = "\t", header = TRUE)
> 
> # Remove all rows that have duplicates in the column "names"
> unique_names <- unique(data$names)
> data_unique <- data[data$names %in% unique_names,]
> 
> # Calculate medians 
> Medians <- sapply(data_unique[, sapply(data_unique, is.numeric)], median, 
> na.rm = TRUE)
> 
> # Print last variable in order form largest to smallest
> rev(sort(Medians[length(Medians)]))
> 
> 
> Note that the strategy is quite different (e.g using %in%, not duplicated() 
> ), the interpretation of "last variable" is technically correct but not what 
> I had in mind (ChatGPT got that right though).
> 
> 
> Changing my prompts slightly resulted in it going for a dplyr solution instead, 
> complete with %>% idioms etc ... again, syntactically correct but not giving 
> me the fully correct results. 
> 
> --

Re: [R] Amazing AI

2022-12-19 Thread Boris Steipe
Exactly. But not just "error prone", rather: eloquently and confidently 
incorrect. And that in itself is a problem. When I evaluate students' work, I 
implicitly do so from a mental model of the student - aptitude, ability, 
experience, language skills etc. That's useful for summative assessment, since 
it helps efficiency - but that won't work anymore. I see a need to assess much 
more carefully, require fine-grained referencing, check every single fact ... 
and that won't scale. And then there is also the spectre of having to decide 
when this crosses the line to "concoction" - i.e. an actual academic offence ...

Best,
Boris



> On 2022-12-19, at 03:58, Milan Glacier  wrote:
> 
> On 12/18/22 19:01, Boris Steipe wrote:
>> Technically not a help question. But crucial to be aware of, especially for 
>> those of us in academia, or otherwise teaching R. I am not aware of a 
>> suitable alternate forum. If this does not interest you, please simply 
>> ignore - I already know that this may be somewhat OT.
>> 
>> Thanks.
>> --
>> 
>> You very likely have heard of ChatGPT, the conversation interface on top of 
>> the GPT-3 large language model and that it can generate code. I thought it 
>> doesn't do R - I was wrong. Here is a little experiment:
>> Note that the strategy is quite different (e.g using %in%, not duplicated() 
>> ), the interpretation of "last variable" is technically correct but not what 
>> I had in mind (ChatGPT got that right though).
>> 
>> 
>> Changing my prompts slightly resulted in it going for a dplyr solution instead, 
>> complete with %>% idioms etc ... again, syntactically correct but not giving 
>> me the fully correct results.
>> 
>> --
>> 
>> Bottom line: The AI's ability to translate natural language instructions 
>> into code is astounding. Errors the AI makes are subtle and probably not 
>> easy to fix if you don't already know what you are doing. But the way that 
>> this can be "confidently incorrect" and plausible makes it nearly impossible 
>> to detect unless you actually run the code (you may have noticed that when 
>> you read the code).
>> 
>> Will our students use it? Absolutely.
>> 
>> Will they successfully cheat with it? That depends on the assignment. We 
>> probably need to _encourage_ them to use it rather than sanction - but 
>> require them to attribute the AI, document prompts, and identify their own, 
>> additional contributions.
>> 
>> Will it help them learn? When you are aware of the issues, it may be quite 
>> useful. It may be especially useful to teach them to specify their code 
>> carefully and completely, and to ask questions in the right way. Test cases 
>> are crucial.
>> 
>> How will it affect what we do as instructors? I don't know. Really.
>> 
>> And the future? I am not pleased to extrapolate to a job market in which 
>> they compete with knowledge workers who work 24/7 without benefits, vacation 
>> pay, or even a salary. They'll need to rethink the value of their investment 
>> in an academic education. We'll need to rethink what we do to provide value 
>> above and beyond what AI's can do. (Nb. all of the arguments I hear about 
>> why humans will always be better etc. are easily debunked, but that's even 
>> more OT :-)
>> 
>> 
>> 
>> If you have thoughts to share how your institution is thinking about 
>> academic integrity in this situation, or creative ideas how to integrate 
>> this into teaching, I'd love to hear from you.
> 
> *NEVER* let the AI mislead the students! ChatGPT gives you seemingly
> sound but actually *wrong* code!
> 
> ChatGPT never understands the formal abstraction behind the code; it
> just understands the shallow text patterns (and the syntax rules) in the
> code. And it often gives you code that seems correct but produces
> wrong output. If it is used for code completion, then it is okay
> (just like GitHub Copilot), since the coder needs to modify the code
> after getting the completion. But if you want students to use ChatGPT
> to query information or write code, it is error-prone!

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Amazing AI

2022-12-19 Thread Boris Steipe
Yes ... but that "1" is my correction to the AI's suggested "2".
Sorry if that was confusing.

:-)



> On 2022-12-19, at 14:10, John Kane  wrote:
> 
> Does not Medians <- apply(numeric_data, 1, median) give us the row medians?
> 
> On Mon, 19 Dec 2022 at 05:52, Milan Glacier  wrote:
> On 12/18/22 19:01, Boris Steipe wrote:
> >Technically not a help question. But crucial to be aware of, especially for 
> >those of us in academia, or otherwise teaching R. I am not aware of a 
> >suitable alternate forum. If this does not interest you, please simply 
> >ignore - I already know that this may be somewhat OT.
> >
> >Thanks.
> >--
> >
> >You very likely have heard of ChatGPT, the conversation interface on top of 
> >the GPT-3 large language model and that it can generate code. I thought it 
> >doesn't do R - I was wrong. Here is a little experiment:
> >Note that the strategy is quite different (e.g using %in%, not duplicated() 
> >), the interpretation of "last variable" is technically correct but not what 
> >I had in mind (ChatGPT got that right though).
> >
> >
> >Changing my prompts slightly resulted in it going for a dplyr solution instead, 
> >complete with %>% idioms etc ... again, syntactically correct but not giving 
> >me the fully correct results.
> >
> >--
> >
> >Bottom line: The AI's ability to translate natural language instructions 
> >into code is astounding. Errors the AI makes are subtle and probably not 
> >easy to fix if you don't already know what you are doing. But the way that 
> >this can be "confidently incorrect" and plausible makes it nearly impossible 
> >to detect unless you actually run the code (you may have noticed that when 
> >you read the code).
> >
> >Will our students use it? Absolutely.
> >
> >Will they successfully cheat with it? That depends on the assignment. We 
> >probably need to _encourage_ them to use it rather than sanction - but 
> >require them to attribute the AI, document prompts, and identify their own, 
> >additional contributions.
> >
> >Will it help them learn? When you are aware of the issues, it may be quite 
> >useful. It may be especially useful to teach them to specify their code 
> >carefully and completely, and to ask questions in the right way. Test cases 
> >are crucial.
> >
> >How will it affect what we do as instructors? I don't know. Really.
> >
> >And the future? I am not pleased to extrapolate to a job market in which 
> >they compete with knowledge workers who work 24/7 without benefits, vacation 
> >pay, or even a salary. They'll need to rethink the value of their investment 
> >in an academic education. We'll need to rethink what we do to provide value 
> >above and beyond what AI's can do. (Nb. all of the arguments I hear about 
> >why humans will always be better etc. are easily debunked, but that's even 
> >more OT :-)
> >
> >
> >
> >If you have thoughts to share how your institution is thinking about 
> >academic integrity in this situation, or creative ideas how to integrate 
> >this into teaching, I'd love to hear from you.
> 
> *NEVER* let the AI mislead the students! ChatGPT gives you seemingly
> sound but actually *wrong* code!
> 
> ChatGPT never understands the formal abstraction behind the code; it
> just understands the shallow text patterns (and the syntax rules) in the
> code. And it often gives you code that seems correct but produces
> wrong output. If it is used for code completion, then it is okay
> (just like GitHub Copilot), since the coder needs to modify the code
> after getting the completion. But if you want students to use ChatGPT
> to query information or write code, it is error-prone!
> 
> -- 
> John Kane
> Kingston ON Canada


Re: [R] Amazing AI

2022-12-19 Thread John Kane
Does not Medians <- apply(numeric_data, 1, median) give us the row medians?

On Mon, 19 Dec 2022 at 05:52, Milan Glacier  wrote:

> On 12/18/22 19:01, Boris Steipe wrote:
> >Technically not a help question. But crucial to be aware of, especially
> for those of us in academia, or otherwise teaching R. I am not aware of a
> suitable alternate forum. If this does not interest you, please simply
> ignore - I already know that this may be somewhat OT.
> >
> >Thanks.
> >--
> >
> >You very likely have heard of ChatGPT, the conversation interface on top
> of the GPT-3 large language model and that it can generate code. I thought
> it doesn't do R - I was wrong. Here is a little experiment:
> >Note that the strategy is quite different (e.g using %in%, not
> duplicated() ), the interpretation of "last variable" is technically
> correct but not what I had in mind (ChatGPT got that right though).
> >
> >
> >Changing my prompts slightly resulted in it going for a dplyr solution
> instead, complete with %>% idioms etc ... again, syntactically correct but
> not giving me the fully correct results.
> >
> >--
> >
> >Bottom line: The AI's ability to translate natural language instructions
> into code is astounding. Errors the AI makes are subtle and probably not
> easy to fix if you don't already know what you are doing. But the way that
> this can be "confidently incorrect" and plausible makes it nearly
> impossible to detect unless you actually run the code (you may have noticed
> that when you read the code).
> >
> >Will our students use it? Absolutely.
> >
> >Will they successfully cheat with it? That depends on the assignment. We
> probably need to _encourage_ them to use it rather than sanction - but
> require them to attribute the AI, document prompts, and identify their own,
> additional contributions.
> >
> >Will it help them learn? When you are aware of the issues, it may be
> quite useful. It may be especially useful to teach them to specify their
> code carefully and completely, and to ask questions in the right way. Test
> cases are crucial.
> >
> >How will it affect what we do as instructors? I don't know. Really.
> >
> >And the future? I am not pleased to extrapolate to a job market in which
> they compete with knowledge workers who work 24/7 without benefits,
> vacation pay, or even a salary. They'll need to rethink the value of their
> investment in an academic education. We'll need to rethink what we do to
> provide value above and beyond what AI's can do. (Nb. all of the arguments
> I hear about why humans will always be better etc. are easily debunked, but
> that's even more OT :-)
> >
> >
> >
> >If you have thoughts to share how your institution is thinking about
> academic integrity in this situation, or creative ideas how to integrate
> this into teaching, I'd love to hear from you.
>
> *NEVER* let the AI mislead the students! ChatGPT gives you seemingly
> sound but actually *wrong* code!
>
> ChatGPT never understands the formal abstraction behind the code; it
> just understands the shallow text patterns (and the syntax rules) in the
> code. And it often gives you code that seems correct but produces
> wrong output. If it is used for code completion, then it is okay
> (just like GitHub Copilot), since the coder needs to modify the code
> after getting the completion. But if you want students to use ChatGPT
> to query information or write code, it is error-prone!
>


-- 
John Kane
Kingston ON Canada


Re: [R] Amazing AI

2022-12-19 Thread Milan Glacier

On 12/18/22 19:01, Boris Steipe wrote:

Technically not a help question. But crucial to be aware of, especially for 
those of us in academia, or otherwise teaching R. I am not aware of a suitable 
alternate forum. If this does not interest you, please simply ignore - I 
already know that this may be somewhat OT.

Thanks.
--

You very likely have heard of ChatGPT, the conversation interface on top of the 
GPT-3 large language model and that it can generate code. I thought it doesn't 
do R - I was wrong. Here is a little experiment:
Note that the strategy is quite different (e.g using %in%, not duplicated() ), the 
interpretation of "last variable" is technically correct but not what I had in 
mind (ChatGPT got that right though).


Changing my prompts slightly resulted in it going for a dplyr solution instead, 
complete with %>% idioms etc ... again, syntactically correct but not giving me 
the fully correct results.

--

Bottom line: The AI's ability to translate natural language instructions into code is 
astounding. Errors the AI makes are subtle and probably not easy to fix if you don't 
already know what you are doing. But the way that this can be "confidently 
incorrect" and plausible makes it nearly impossible to detect unless you actually 
run the code (you may have noticed that when you read the code).

Will our students use it? Absolutely.

Will they successfully cheat with it? That depends on the assignment. We 
probably need to _encourage_ them to use it rather than sanction - but require 
them to attribute the AI, document prompts, and identify their own, additional 
contributions.

Will it help them learn? When you are aware of the issues, it may be quite 
useful. It may be especially useful to teach them to specify their code 
carefully and completely, and to ask questions in the right way. Test cases are 
crucial.

How will it affect what we do as instructors? I don't know. Really.

And the future? I am not pleased to extrapolate to a job market in which they 
compete with knowledge workers who work 24/7 without benefits, vacation pay, or 
even a salary. They'll need to rethink the value of their investment in an 
academic education. We'll need to rethink what we do to provide value above and 
beyond what AI's can do. (Nb. all of the arguments I hear about why humans will 
always be better etc. are easily debunked, but that's even more OT :-)



If you have thoughts to share how your institution is thinking about academic 
integrity in this situation, or creative ideas how to integrate this into 
teaching, I'd love to hear from you.


*NEVER* let the AI mislead the students! ChatGPT gives you seemingly
sound but actually *wrong* code!

ChatGPT never understands the formal abstraction behind the code; it
just understands the shallow text patterns (and the syntax rules) in the
code. And it often gives you code that seems correct but produces
wrong output. If it is used for code completion, then it is okay
(just like GitHub Copilot), since the coder needs to modify the code
after getting the completion. But if you want students to use ChatGPT
to query information or write code, it is error-prone!
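
One way to catch code that runs but silently produces the wrong thing is an explicit check before trusting the result. A minimal sketch, assuming a cut-down version of the sample data from the original post (which has a column "name", while the generated code refers to "names"); the variable names here are invented for illustration:

```r
# Cut-down copy of the thread's sample data; note the column is "name".
sample_data <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Alice", "David", "Eve"),
  age  = c(23, 32, 29, 23, 25, 27)
)

# `$` on a data frame returns NULL for a column that does not exist,
# without raising an error.
missing_col <- sample_data$names

# The de-duplication step then "works" but keeps no rows at all.
filtered <- sample_data[!duplicated(missing_col), ]

# An explicit test makes the mismatch visible before any medians are computed.
has_column <- "names" %in% colnames(sample_data)
if (!has_column) {
  message("Column 'names' not found; available: ",
          paste(colnames(sample_data), collapse = ", "))
}
```

In an assignment or a script, the same condition could be wrapped in stopifnot() so that the run halts instead of continuing with an empty data frame.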


Re: [R] Amazing AI

2022-12-19 Thread Eric Berger
Hi Boris,
I think these are good questions.
Some initial reactions:
1. with better tools available to the students, you can cover more material
at a faster pace 🙂
2. For years, it has been possible for students to find "answers" online
(e.g. Google search). Most programmers would regard this as an essential
part of their work - looking to overcome some obstacle via searching on
StackOverflow, Google, R-Help 🙂 etc.* I consider the ability to do such
searches something worth teaching. In this spirit, you could teach
effective ways to use ChatGPT.
3. Finally, as you observed, some of the ChatGPT responses have
bugs/deficiencies. For example, the Codex code does *not* remove
duplicates. Students need knowledge of and skill with the material you are
teaching to be able to spot and fix the bugs. In which case, what's your
problem? 😁
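
The point in item 3 can be checked directly. A small sketch using the names from the sample data in the original post; the variable names are made up for illustration:

```r
names_col <- c("Alice", "Bob", "Charlie", "Alice", "David", "Eve")

# Codex-style filter: every value is trivially in its own set of unique
# values, so this keeps all rows, including both "Alice" entries.
keep_in <- names_col %in% unique(names_col)

# ChatGPT-style filter: drops the second and later occurrences.
keep_first <- !duplicated(names_col)

# A stricter reading of the prompt: drop every row whose name occurs
# more than once, including the first occurrence.
keep_unique_only <- !(duplicated(names_col) |
                      duplicated(names_col, fromLast = TRUE))

kept_in          <- names_col[keep_in]
kept_first       <- names_col[keep_first]
kept_unique_only <- names_col[keep_unique_only]
```

Which of the last two matches "remove all rows that have duplicates" depends on what the person asking had in mind; the thread does not settle it, so the stricter variant is shown only as one possible interpretation.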

HTH,
Eric

* and as a shout-out to Bert, they should also read the documentation -
e.g. ?subset

On Mon, Dec 19, 2022 at 12:47 AM Ebert,Timothy Aaron  wrote:

> It would help students formulate a plan for coding. Successful students
> will be able to give good directions that the AI can turn into good code.
> This skill is essential no matter who writes the program.
> In more advanced classes I might collect some data sets designed to cause
> the AI problems. Another option is to make tests where students have to
> write code on paper or multiple guess where students must choose between
> similar code snips.
>
>
> -Original Message-
> From: R-help  On Behalf Of Boris Steipe
> Sent: Sunday, December 18, 2022 2:01 PM
> To: r-help@r-project.org
> Subject: [R] Amazing AI
>
> Technically not a help question. But crucial to be aware of, especially
> for those of us in academia, or otherwise teaching R. I am not aware of a
> suitable alternate forum. If this does not interest you, please simply
> ignore - I already know that this may be somewhat OT.
>
> Thanks.
> --
>
> You very likely have heard of ChatGPT, the conversation interface on top
> of the GPT-3 large language model and that it can generate code. I thought
> it doesn't do R - I was wrong. Here is a little experiment:
>
> MY PROMPT:
> ==
> Please write me some R code that will read a tsv file, remove all rows
> that have duplicates in the column "names", and store the median values of
> the remaining rows in a variable called "Medians". Of course, that
> calculation should only be done with columns that are numeric. Print the
> last variable in order from largest to smallest.
>
>
> AI RESPONSE:
> 
>
> Here is some R code that should accomplish what you've described:
>
> (Note: the code is formatted in a little terminal window, complete with a
> button to "copy code" at the top.)
>
> 
> # Read the tsv file
> data <- read.table("file.tsv", header = TRUE, sep = "\t")
>
> # Remove rows with duplicates in the "names" column
> data <- data[!duplicated(data$names), ]
>
> # Select only the numeric columns
> num_columns <- sapply(data, is.numeric)
> numeric_data <- data[, num_columns]
>
> # Calculate the medians for each column
> Medians <- apply(numeric_data, 1, median)
>
> # Sort Medians in descending order
> Medians <- sort(Medians, decreasing = TRUE)
>
> # Print Medians
> print(Medians)
> 
>
> This code assumes that the tsv file is called "file.tsv" and is stored in
> the current working directory. It also assumes that the first row of the
> file contains the column names.
>
>
>
> MY PROMPT:
> ==
> Can you provide me with a bit of sample data that I can use to test this
> code?
>
> AI RESPONSE:
> 
> Certainly! Here is some sample data that you can use to test the code I
> provided:
>
> 
> name    age height  weight
> Alice   23  5.6 120
> Bob 32  6.2 180
> Charlie 29  5.9 160
> Alice   23  5.6 120
> David   25  5.8 140
> Eve 27  5.5 110
> 
>
> You can save this data as a tsv file and use it to test the code.
>
>
> --
>
> Notes:
> 1) The code, as written does not work with the test data. The reason is
> that the test data specifies a column "name", but the code assumes "names"
> (which I specified in the instructions). Once this is fixed in the test
> data, the code executes.
>
> 2) The instructions are not accurately translated: I had asked for row
> medians. But changing the margin in the apply() statement from 2 to 1 does
> t

Re: [R] Amazing AI

2022-12-18 Thread Ebert,Timothy Aaron
It would help students formulate a plan for coding. Successful students will be 
able to give good directions that the AI can turn into good code. This skill is 
essential no matter who writes the program. 
In more advanced classes I might collect some data sets designed to cause the 
AI problems. Another option is to make tests where students have to write code 
on paper or multiple guess where students must choose between similar code 
snips.


-Original Message-
From: R-help  On Behalf Of Boris Steipe
Sent: Sunday, December 18, 2022 2:01 PM
To: r-help@r-project.org
Subject: [R] Amazing AI

Technically not a help question. But crucial to be aware of, especially for 
those of us in academia, or otherwise teaching R. I am not aware of a suitable 
alternate forum. If this does not interest you, please simply ignore - I 
already know that this may be somewhat OT.

Thanks.
--

You very likely have heard of ChatGPT, the conversation interface on top of the 
GPT-3 large language model and that it can generate code. I thought it doesn't 
do R - I was wrong. Here is a little experiment:

MY PROMPT:
==
Please write me some R code that will read a tsv file, remove all rows that 
have duplicates in the column "names", and store the median values of the 
remaining rows in a variable called "Medians". Of course, that calculation 
should only be done with columns that are numeric. Print the last variable in 
order from largest to smallest.


AI RESPONSE:


Here is some R code that should accomplish what you've described:

(Note: the code is formatted in a little terminal window, complete with a 
button to "copy code" at the top.)


# Read the tsv file
data <- read.table("file.tsv", header = TRUE, sep = "\t")

# Remove rows with duplicates in the "names" column data <- 
data[!duplicated(data$names), ]

# Select only the numeric columns
num_columns <- sapply(data, is.numeric)
numeric_data <- data[, num_columns]

# Calculate the medians for each column
Medians <- apply(numeric_data, 1, median)

# Sort Medians in descending order
Medians <- sort(Medians, decreasing = TRUE)

# Print Medians
print(Medians)


This code assumes that the tsv file is called "file.tsv" and is stored in the 
current working directory. It also assumes that the first row of the file 
contains the column names.



MY PROMPT:
==
Can you provide me with a bit of sample data that I can use to test this code?

AI RESPONSE:

Certainly! Here is some sample data that you can use to test the code I 
provided:


nameage height  weight
Alice   23  5.6 120
Bob 32  6.2 180
Charlie 29  5.9 160
Alice   23  5.6 120
David   25  5.8 140
Eve 27  5.5 110


You can save this data as a tsv file and use it to test the code.


--

Notes:
1) The code, as written does not work with the test data. The reason is that 
the test data specifies a column "name", but the code assumes "names" (which I 
specified in the instructions). Once this is fixed in the test data, the code 
executes.

2) The instructions are not accurately translated: I had asked for row means. 
But changing the margin in the apply() statement from 2 to 1 does the correct 
thing.

3) Note that the test data contains both numeric and non-numeric columns. Also, 
the name "Alice" is duplicated in the test data, which is subtle, and the right 
thing to do.


--

A similar experiment, run on the beta-site of OpenAI's Codex code-completion 
tool gives the following code (complete with the comments.):


# Read in the data
data <- read.table("mydata.tsv", sep = "\t", header = TRUE)

# Remove all rows that have duplicates in the column "names"
unique_names <- unique(data$names)
data_unique <- data[data$names %in% unique_names,]

# Calculate medians
Medians <- sapply(data_unique[, sapply(data_unique, is.numeric)], median, na.rm 
= TRUE)

# Print last variable in order form largest to smallest
rev(sort(Medians[length(Medians)]))


Note that the strategy is quite different (e.g using %in%, not duplicated() ), 
the interpretation of "last variable" is technically correct but not what I had 
in mind (ChatGPT got that right though).


Changing my prompts slightly resulted it going for a dplyr solution instead, 
complete with %>% idioms etc ... again, syntactically correct but not giving me 
the fully correct results.

--

Bottom line: The AI's ability to translate natural language instructions into 
code is astounding. Errors the AI makes are subtle and probably not easy to fix 

[R] Amazing AI

2022-12-18 Thread Boris Steipe
Technically not a help question. But crucial to be aware of, especially for 
those of us in academia, or otherwise teaching R. I am not aware of a suitable 
alternate forum. If this does not interest you, please simply ignore - I 
already know that this may be somewhat OT.

Thanks.
--

You have very likely heard of ChatGPT, the conversational interface on top of 
the GPT-3 large language model, and that it can generate code. I thought it 
didn't do R - I was wrong. Here is a little experiment:

MY PROMPT:
==
Please write me some R code that will read a tsv file, remove all rows that 
have duplicates in the column "names", and store the median values of the 
remaining rows in a variable called "Medians". Of course, that calculation 
should only be done with columns that are numeric. Print the last variable in 
order from largest to smallest.


AI RESPONSE:


Here is some R code that should accomplish what you've described:

(Note: the code is formatted in a little terminal window, complete with a 
button to "copy code" at the top.)


# Read the tsv file
data <- read.table("file.tsv", header = TRUE, sep = "\t")

# Remove rows with duplicates in the "names" column
data <- data[!duplicated(data$names), ]

# Select only the numeric columns
num_columns <- sapply(data, is.numeric)
numeric_data <- data[, num_columns]

# Calculate the medians for each column
Medians <- apply(numeric_data, 1, median)

# Sort Medians in descending order
Medians <- sort(Medians, decreasing = TRUE)

# Print Medians
print(Medians)


This code assumes that the tsv file is called "file.tsv" and is stored in the 
current working directory. It also assumes that the first row of the file 
contains the column names.



MY PROMPT:
==
Can you provide me with a bit of sample data that I can use to test this code?

AI RESPONSE:

Certainly! Here is some sample data that you can use to test the code I 
provided:


name     age  height  weight
Alice    23   5.6     120
Bob      32   6.2     180
Charlie  29   5.9     160
Alice    23   5.6     120
David    25   5.8     140
Eve      27   5.5     110


You can save this data as a tsv file and use it to test the code.


--

Notes: 
1) The code, as written, does not work with the test data. The reason is that 
the test data specifies a column "name", but the code assumes "names" (which I 
specified in the instructions). Once this is fixed in the test data, the code 
executes.

2) The instructions are not accurately translated: I had asked for row medians. 
But changing the margin in the apply() statement from 2 to 1 does the correct 
thing.

3) Note that the test data contains both numeric and non-numeric columns. Also, 
the name "Alice" is duplicated in the test data, which is subtle, and the right 
thing to do.
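
For reference, here is a minimal sketch of one reading of the prompt: keep the first occurrence of each name, take row medians over the numeric columns, and sort from largest to smallest. The inline data frame stands in for the tsv file and uses "names" as the column name; both are assumptions made for illustration and are not part of the AI output.

```r
# Hypothetical test data, built inline instead of being read from a tsv file.
# The column is called "names" here, matching the prompt.
dat <- data.frame(
  names  = c("Alice", "Bob", "Charlie", "Alice", "David", "Eve"),
  age    = c(23, 32, 29, 23, 25, 27),
  height = c(5.6, 6.2, 5.9, 5.6, 5.8, 5.5),
  weight = c(120, 180, 160, 120, 140, 110)
)

# Keep the first occurrence of each name; duplicated() flags only the repeats.
dedup <- dat[!duplicated(dat$names), ]

# Keep the numeric columns only; drop = FALSE guards the single-column case.
num <- dedup[, sapply(dedup, is.numeric), drop = FALSE]

# MARGIN = 1 applies median() across each row.
Medians <- apply(num, 1, median)
names(Medians) <- dedup$names

# Largest to smallest.
Medians <- sort(Medians, decreasing = TRUE)
print(Medians)
```

If "remove all rows that have duplicates" is read strictly, so that every copy of a repeated name is dropped, the filter could instead be `!(duplicated(dat$names) | duplicated(dat$names, fromLast = TRUE))`. The prompt does not say which reading is intended.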


--

A similar experiment, run on the beta site of OpenAI's Codex code-completion 
tool, gives the following code (complete with the comments):


# Read in the data
data <- read.table("mydata.tsv", sep = "\t", header = TRUE)

# Remove all rows that have duplicates in the column "names"
unique_names <- unique(data$names)
data_unique <- data[data$names %in% unique_names,]

# Calculate medians 
Medians <- sapply(data_unique[, sapply(data_unique, is.numeric)], median,
                  na.rm = TRUE)

# Print last variable in order form largest to smallest
rev(sort(Medians[length(Medians)]))


Note that the strategy is quite different (e.g. using %in%, not duplicated()), 
and the interpretation of "last variable" is technically correct but not what 
I had in mind (ChatGPT got that right though).
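
A quick check on toy data suggests where the Codex version goes wrong. The data frame below is hypothetical and not from the original experiment. Every value in a column is trivially a member of unique() of that column, so the %in% filter keeps all rows, and indexing with length() selects a single element rather than the whole vector.

```r
# Toy data with one repeated name (hypothetical values).
dat <- data.frame(names  = c("Alice", "Bob", "Alice"),
                  age    = c(23, 32, 23),
                  weight = c(120, 180, 120))

# Every name is a member of the set of unique names,
# so this filter keeps all rows, duplicates included.
unique_names <- unique(dat$names)
kept <- dat[dat$names %in% unique_names, ]
print(nrow(kept) == nrow(dat))

# These are column medians (one value per numeric column), not row medians.
Medians <- sapply(kept[, sapply(kept, is.numeric)], median, na.rm = TRUE)

# Indexing with length() picks only the final element of the vector.
last_only <- Medians[length(Medians)]
print(length(last_only))
```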


Changing my prompts slightly resulted in it going for a dplyr solution instead, 
complete with %>% idioms etc ... again, syntactically correct but not giving me 
the fully correct results.

--

Bottom line: The AI's ability to translate natural language instructions into 
code is astounding. Errors the AI makes are subtle and probably not easy to fix 
if you don't already know what you are doing. And because the output can be 
"confidently incorrect" yet plausible, those errors are nearly impossible to 
detect unless you actually run the code (you may have noticed that when you 
read the code).

Will our students use it? Absolutely.

Will they successfully cheat with it? That depends on the assignment. We 
probably need to _encourage_ them to use it rather than sanction it - but 
require them to attribute the AI, document prompts, and identify their own, 
additional contributions.

Will it help them learn? When you are aware of the issues, it may be quite 
useful. It may be especially useful to teach them to specify their code 
carefully and completely, and to ask questions in the right way. Test cases are 
crucial.
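
As an illustration of the kind of test case meant here, the sketch below wraps the task in a function and checks it against a result worked out by hand. The function name row_medians_unique() and the test data are hypothetical, and the function assumes the "keep the first occurrence" reading of the prompt.

```r
# Hypothetical helper wrapping the task from the prompt.
row_medians_unique <- function(df, key = "names") {
  df  <- df[!duplicated(df[[key]]), ]               # drop repeated keys
  num <- df[, sapply(df, is.numeric), drop = FALSE] # numeric columns only
  out <- apply(num, 1, median)                      # MARGIN = 1: per row
  names(out) <- df[[key]]
  sort(out, decreasing = TRUE)
}

# A small case where the answer can be worked out by hand.
test <- data.frame(names = c("a", "b", "a"),
                   x = c(1, 10, 1),
                   y = c(2, 20, 2),
                   z = c(3, 30, 3))
res <- row_medians_unique(test)

stopifnot(
  length(res) == 2,                    # the repeated "a" row is gone
  identical(names(res), c("b", "a")),  # sorted largest first
  all(res == c(20, 2))                 # medians of (10, 20, 30) and (1, 2, 3)
)
```

A check like this would have flagged a swapped apply() margin, since column medians would give three values instead of two.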

How will it affect what we do as instructors? I do