Re: [R] 'R' Software Output Plagiarism
On 23 Sep 2015, at 02:33, Duncan Murdoch wrote:

> I don't see why this puzzles you. A simple explanation is that Urkund
> is incompetent.

That much I figured. What I was puzzled about was _how_ it was being incompetent. Also how it could be so in a way that wouldn't be obvious to the professor in question.

I haven't used Urkund, but people are pushing it here at CBS too. If they start automating it on electronic submissions in maths and stats, some interesting things could happen. I have used iThenticate as a journal editor, and that one will tell you word for word which sections of text are identical to which sections of text in which other documents.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] 'R' Software Output Plagiarism
Your professor should immediately recognize that the quoted code is standard regression input/output and that the Urkund results in this case are without merit.

> On Sep 22, 2015, at 7:27 AM, BARRETT, Oliver wrote:
>
> Dear 'R' community support,
>
> I am a student at Skema business school and I have recently submitted my MSc
> thesis/dissertation. This has been passed on to an external plagiarism
> service provider, Urkund, who have scanned my document and returned a
> plagiarism report to my professor having detected 32% plagiarism.
>
> I have contacted Urkund regarding this issue having committed no such
> plagiarism and they have told me that all the plagiarism detected in my
> document comes from the last 25% which consists only of 'R' regressions like
> the one I have pasted below:
>
> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>     Fed.t.4., data = OLS_CAR, x = TRUE)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.154587 -0.015961  0.001429  0.017196  0.110907
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.001630   0.001763  -0.925   0.3559
> Fed         -0.121595   0.165359  -0.735   0.4627
> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
> Fed.t.2.     0.026529   0.143648   0.185   0.8536
> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.0293 on 304 degrees of freedom
>   (20 observations deleted due to missingness)
> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>
> I have produced all of these regressions myself and pasted them directly from
> the 'R' software package. My regression methodology is entirely my own along
> with the sourcing and preparation of the data used to produce these
> statistics.
>
> I would be very grateful if you could provide me with some clarity as to why
> this output from 'R' is reading as plagiarism.
>
> I would like to thank you in advance,
>
> Kind regards,
>
> Oliver Barrett
> (+44) 7341 834 217
Re: [R] 'R' Software Output Plagiarism
> -Original Message-
> From: pda...@gmail.com
> Sent: Thu, 24 Sep 2015 11:41:19 +0200
> To: murdoch.dun...@gmail.com
> Subject: Re: [R] 'R' Software Output Plagiarism
>
> On 23 Sep 2015, at 02:33, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
>
>> I don't see why this puzzles you. A simple explanation is that Urkund
>> is incompetent.
>
> That much I figured. What I was puzzled about was _how_ it was being
> incompetent. Also how it could be so in a way that wouldn't be obvious to
> the professor in question.

My guess is that it just ran the scan and emailed back the results with no analysis. It sounds like the software is functioning 'properly' but the humanware is not. But then, there is no reason to expect them to be subject matter experts, either. One should just hope they would supply better reports than it appears Oliver and his professor received.

Re the professor, he/she may have just tossed the report to Oliver and said, "Explain this". Once Oliver has discussed the issue with Urkund, it should be blindingly obvious to the professor.
Re: [R] 'R' Software Output Plagiarism
Marc,

I don't think Copyright/Intellectual property issues factor into this. Urkund and similar tools are to my knowledge entirely about plagiarism. So the issue would seem to be that the R output is considered identical or nearly identical to R output in other published or otherwise submitted material.

What puzzles me (except for how a document can be deemed 32% plagiarized in 25% of the text) is whether this includes the numbers and variable names. If those are somehow factored out, then any R regression could be pretty much identical to any other R regression. However, two analyses with similar variable names could happen if they are based on the same cookbook recipe, and analyses with similar numerical output come from analyzing the same standard data. Such situations would not necessarily be considered plagiarism (I mean: If you claim that you are analyzing data from experiments that you yourself have performed, and your numbers are exactly identical to something that has been previously published, then it would be suspect. If you analyze something from public sources, someone else might well have done the same thing.).

Similarly to John Kane, I think it is necessary to know exactly what sources the text is claimed to be plagiarized from and/or what parts of the text are being matched by Urkund. If it turns out that Urkund is generating false positives, then this needs to be pointed out to them and to the people basing decisions on it.

-pd
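The question of what is left once the numbers are factored out can be probed directly. What follows is only a minimal sketch on simulated data: the data frames `d1` and `d2`, the helper `mask_numbers()` and the "#" placeholder are made up for illustration, and nothing here is known about how Urkund actually matches text.

```r
# Hypothetical illustration on simulated data: two unrelated regressions.
set.seed(1)
d1 <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50))
d2 <- data.frame(sales = rnorm(80), price = rnorm(80), adverts = rnorm(80))

out1 <- capture.output(print(summary(lm(y ~ x1 + x2, data = d1))))
out2 <- capture.output(print(summary(lm(sales ~ price + adverts, data = d2))))

# Crude "factoring out" of numbers: replace every numeric token with "#",
# squeeze whitespace, and drop empty lines. Variable names are left alone.
mask_numbers <- function(lines) {
  x <- gsub("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?", "#", lines)
  x <- trimws(gsub("[[:space:]]+", " ", x))
  x[nzchar(x)]
}

m1 <- mask_numbers(out1)
m2 <- mask_numbers(out2)

# Share of masked lines in the first output that also occur in the second
shared <- mean(m1 %in% m2)
shared

# Lines that still differ: mainly the call and the rows naming the covariates
setdiff(m1, m2)
```

With the numbers masked, the section labels, column headings and summary lines coincide, so a matcher that ignores numbers would likely see most of any `summary(lm(...))` printout as a repeat of any other.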
Re: [R] 'R' Software Output Plagiarism
Hi,

With the usual caveat that I Am Not A Lawyer and that I am not speaking on behalf of any organization...

My guess is that they are claiming that the output of R, simply being copied and pasted verbatim into your thesis, constitutes the use of copyrighted output from the software.

It is not clear to me that R's output is copyrighted by the R Foundation (or by other parties for CRAN packages), although the source code underlying R is, along with other copyright owners as apropos. There is some case law to support the notion that the output alone is not protected in a similar manner, but that may be country specific.

Did you provide any credit to R (see the output of citation() ) in your thesis and indicate that your analyses were performed using R?

If R is uncredited, I could see them raising the issue.

You might check with your institution's legal/policy folks to see if there is any guidance provided for students regarding the crediting of software used in this manner, especially if that guidance is at no cost to you.

Regards,

Marc Schwartz

> On Sep 22, 2015, at 11:01 AM, Bert Gunter wrote:
>
> 1. It is highly unlikely that we could be of help (unless someone else
> has experienced this and knows what happened). You will have to
> contact the Urkund people and ask them why their algorithms raised the
> flags.
>
> 2. But of course, the regression methodology is not "your own" -- it's
> just a standard tool that you used in your work, which is entirely
> legitimate of course.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> -- Clifford Stoll
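On the point about crediting R: the citation() function mentioned above ships with base R (package utils), and its result can be converted to BibTeX with toBibtex(). A short sketch follows; the variable name `cit` is just an illustrative choice, and the exact text printed depends on the installed R version.

```r
# Recommended citation for R itself
cit <- citation()
print(cit)

# BibTeX form, convenient for a thesis bibliography
bib <- toBibtex(cit)
print(bib)

# Version string worth quoting alongside the citation
R.version.string
```

Add-on packages can be cited the same way by passing the package name, for example `citation("stats")`.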
Re: [R] 'R' Software Output Plagiarism
Very good point about the referencing. I wonder if this is happening to users of Stata or SAS as well?

John Kane
Kingston ON Canada
Re: [R] 'R' Software Output Plagiarism
This is just guessing, but the reason is probably that the regression output (not including the specific numbers and your variable names) is standard R output, as already noted.

It probably appears in many other theses and dissertations, in books on R, and possibly in appendices in published books and papers reporting research findings. It, or parts of it, may occur thousands of times on R-help and in R-oriented blogs and other documents on the Web. It quite likely shows up on Stack Overflow.

Here is one example it took me about 2 minutes to find. http://www.montefiore.ulg.ac.be/~kvansteen/GBIO0009-1/ac20092010/Class8/Using%20R%20for%20linear%20regression.pdf. And here's another http://www.princeton.edu/~otorres/Regression101R.pdf.

Have a look at Julian Faraway's PDF book "Practical Regression and Anova using R" in the Contributed section of the R home site, at pp. 23-24. There it is again.

I think you probably should do a bit of on-line searching and a sweep of some of the Manuals and Contributed materials on the R site and point out to the powers that be that it is not plagiarism, it's just standard R reporting of regression results.

John Kane
Kingston ON Canada
Re: [R] 'R' Software Output Plagiarism
Isn't plagiarism detection based on overlaps with sentence structure? That way, it would catch plagiarism if someone simply did a find-and-replace. But that would also catch regressions with the same output format.

How long was the original thesis? If 25% of it was all regression output, sounds like a lot of regressions.
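The structure-overlap idea can be illustrated with word n-grams ("shingles") and a Jaccard similarity. This is only a sketch of one common text-matching technique, not a description of Urkund's method: the helpers `shingles()` and `jaccard()`, the trigram size and the "#" placeholder are all assumptions, and line `b` is an invented example.

```r
# Word n-grams ("shingles") of a text. With mask = TRUE, numeric tokens are
# replaced by "#" first, mimicking a matcher that ignores the actual numbers.
shingles <- function(text, n = 3, mask = FALSE) {
  words <- strsplit(tolower(text), "[^a-z0-9.]+")[[1]]
  words <- words[nzchar(words)]
  if (mask) words[grepl("^[0-9]+(\\.[0-9]+)?$", words)] <- "#"
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  unique(vapply(starts,
                function(i) paste(words[i:(i + n - 1)], collapse = " "),
                character(1)))
}

# Jaccard similarity: shared shingles divided by all distinct shingles
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

a <- "Residual standard error: 0.0293 on 304 degrees of freedom"  # from the thread
b <- "Residual standard error: 1.207 on 47 degrees of freedom"    # invented line

raw    <- jaccard(shingles(a), shingles(b))                            # 2 of 12 trigrams shared
masked <- jaccard(shingles(a, mask = TRUE), shingles(b, mask = TRUE))  # all trigrams shared
c(raw = raw, masked = masked)
```

Under these assumptions the two lines look mostly different when the numbers are kept and identical once they are masked, which is the find-and-replace behaviour described above applied to regression output.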
Re: [R] 'R' Software Output Plagiarism
Peter,

Great distinction. I was leaning in the direction that the "look and feel" of the output (standard wording, table structure, column headings, significance stars and so forth in the output) is similar to whatever Urkund is using as the basis for the comparison and less so on an exact replication (covariates, coefficients, etc.), or nearly so, of prior work.

Thanks,

Marc
Re: [R] 'R' Software Output Plagiarism
Hi all,

Thank you so much for your input. Just to clarify: of the 32% plagiarism detected, only 27-28% has come from the regressions, but this is expected, as the appendix where the regressions are contained is much denser with text and numbers than the rest of the document. The other 5% will be my quotations and references, but that's normal.

Thanks again, I will be sharing your thoughts with my thesis supervisor.

Cheers,
Oliver

From: Marc Schwartz <marc_schwa...@me.com>
Sent: 22 September 2015 22:27
To: peter dalgaard
Cc: Bert Gunter; BARRETT, Oliver; R-help
Subject: Re: [R] 'R' Software Output Plagiarism

Peter,

Great distinction. I was leaning in the direction that the "look and feel" of the output (standard wording, table structure, column headings, significance stars and so forth) is similar to whatever Urkund is using as the basis for the comparison, and less so toward an exact or near-exact replication (covariates, coefficients, etc.) of prior work.

Thanks,
Marc

> On Sep 22, 2015, at 3:06 PM, peter dalgaard <pda...@gmail.com> wrote:
>
> Marc,
>
> I don't think Copyright/Intellectual property issues factor into this. Urkund
> and similar tools are to my knowledge entirely about plagiarism. So the issue
> would seem to be that the R output is considered identical or nearly
> identical to R output in other published or otherwise submitted material.
>
> What puzzles me (except for how a document can be deemed 32% plagiarized in
> 25% of the text) is whether this includes the numbers and variable names. If
> those are somehow factored out, then any R regression could be pretty much
> identical to any other R regression. However, two analyses with similar
> variable names could happen if they are based on the same cookbook recipe and
> analyses with similar numerical output come from analyzing the same standard
> data. Such situations would not necessarily be considered plagiarism (I mean:
> If you claim that you are analyzing data from experiments that you yourself
> have performed, and your numbers are exactly identical to something that has
> been previously published, then it would be suspect. If you analyze something
> from public sources, someone else might well have done the same thing.).
>
> Similarly to John Kane, I think it is necessary to know exactly what sources
> the text is claimed to be plagiarized from and/or what parts of the text
> are being matched by Urkund. If it turns out that Urkund is generating false
> positives, then this needs to be pointed out to them and to the people basing
> decisions on it.
>
> -pd
>
>> On 22 Sep 2015, at 18:24 , Marc Schwartz <marc_schwa...@me.com> wrote:
>>
>> Hi,
>>
>> With the usual caveat that I Am Not A Lawyer and that I am not speaking
>> on behalf of any organization...
>>
>> My guess is that they are claiming that the output of R, simply being copied
>> and pasted verbatim into your thesis constitutes the use of copyrighted
>> output from the software.
>>
>> It is not clear to me that R's output is copyrighted by the R Foundation (or
>> by other parties for CRAN packages), albeit, the source code underlying R
>> is, along with other copyright owners as apropos. There is some caselaw to
>> support the notion that the output alone is not protected in a similar
>> manner, but that may be country specific.
>>
>> Did you provide any credit to R (see the output of citation() ) in your
>> thesis and indicate that your analyses were performed using R?
>>
>> If R is uncredited, I could see them raising the issue.
>>
>> You might check with your institution's legal/policy folks to see if there
>> is any guidance provided for students regarding the crediting of software
>> used in this manner, especially if that guidance is at no cost to you.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>>
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And
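Peter's question in the message above (what happens if the numbers and variable names are factored out) can be illustrated with a minimal sketch in R. This is only an assumption about how a naive text matcher might behave, not a description of Urkund's actual algorithm. The built-in datasets `mtcars` and `iris`, the helper `boilerplate_tokens()`, and the token-overlap measure are all chosen purely for illustration.

```r
# Hypothetical helper: reduce a printed lm summary to its fixed "boilerplate"
# by dropping every token that contains a digit or names a model variable.
boilerplate_tokens <- function(fit) {
  lines  <- capture.output(print(summary(fit)))
  tokens <- unlist(strsplit(lines, "[[:space:]]+"))
  tokens <- tokens[nzchar(tokens)]
  # Drop numeric-looking tokens (this also drops labels such as "1Q" and "3Q").
  tokens <- tokens[!grepl("[0-9]", tokens)]
  # Drop tokens that are variable names once commas/parentheses are stripped.
  vars  <- all.vars(formula(fit))
  clean <- gsub("[,()]", "", tokens)
  unique(tokens[!(clean %in% vars)])
}

# Two unrelated regressions on unrelated data.
fit_a <- lm(mpg ~ wt + hp, data = mtcars)
fit_b <- lm(Sepal.Length ~ Petal.Width, data = iris)

tok_a <- boilerplate_tokens(fit_a)
tok_b <- boilerplate_tokens(fit_b)

# Share of distinct tokens that the two outputs have in common (Jaccard index).
overlap <- length(intersect(tok_a, tok_b)) / length(union(tok_a, tok_b))
print(overlap)
```

Under these assumptions most of what remains is the fixed wording of `summary.lm` ("Residuals:", "Coefficients:", "Signif. codes:", and so on), so two unrelated analyses share most of their remaining tokens. A matcher that ignored the numbers would therefore see any appendix of regression output as close to any other.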
Re: [R] 'R' Software Output Plagiarism
On 22/09/2015 4:06 PM, peter dalgaard wrote:
> Marc,
>
> I don't think Copyright/Intellectual property issues factor into this. Urkund
> and similar tools are to my knowledge entirely about plagiarism. So the issue
> would seem to be that the R output is considered identical or nearly
> identical to R output in other published or otherwise submitted material.
>
> What puzzles me (except for how a document can be deemed 32% plagiarized in
> 25% of the text) is whether this includes the numbers and variable names. If
> those are somehow factored out, then any R regression could be pretty much
> identical to any other R regression. However, two analyses with similar
> variable names could happen if they are based on the same cookbook recipe and
> analyses with similar numerical output come from analyzing the same standard
> data. Such situations would not necessarily be considered plagiarism (I mean:
> If you claim that you are analyzing data from experiments that you yourself
> have performed, and your numbers are exactly identical to something that has
> been previously published, then it would be suspect. If you analyze something
> from public sources, someone else might well have done the same thing.).

I don't see why this puzzles you. A simple explanation is that Urkund is incompetent. Many companies that sell software to university administrations are incompetent, because the buyers have been promoted so far beyond their competence that they'll buy anything if it is expensive enough. This isn't uncommon.

Duncan Murdoch

>
> Similarly to John Kane, I think it is necessary to know exactly what sources
> the text is claimed to be plagiarized from and/or what parts of the text
> are being matched by Urkund. If it turns out that Urkund is generating false
> positives, then this needs to be pointed out to them and to the people basing
> decisions on it.
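Marc's suggestion in the quoted message, to credit R using the output of citation(), could look roughly like the following. This is a sketch only: the wording of the acknowledgement sentence is an assumption, and each institution may have its own rules for crediting software.

```r
# Recommended reference for R itself, as supplied by the running installation.
ref <- citation()
print(ref)

# The same reference as a BibTeX entry, for a thesis bibliography.
bib <- toBibtex(ref)
print(bib)

# An illustrative sentence for the methods section (wording is an assumption).
credit <- paste0("All analyses were performed using ", R.version.string, ".")
cat(credit, "\n")
```

Packages can be credited the same way with `citation("packagename")`, where the package name is whichever package was actually used in the analysis.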
[R] 'R' Software Output Plagiarism
Dear 'R' community support,

I am a student at Skema business school and I have recently submitted my MSc thesis/dissertation. This has been passed on to an external plagiarism service provider, Urkund, who have scanned my document and returned a plagiarism report to my professor having detected 32% plagiarism.

I have contacted Urkund regarding this issue, having committed no such plagiarism, and they have told me that all the plagiarism detected in my document comes from the last 25%, which consists only of 'R' regressions like the one I have pasted below:

lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
    Fed.t.4., data = OLS_CAR, x = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max
-0.154587 -0.015961  0.001429  0.017196  0.110907

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.001630   0.001763  -0.925   0.3559
Fed         -0.121595   0.165359  -0.735   0.4627
Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
Fed.t.2.     0.026529   0.143648   0.185   0.8536
Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0293 on 304 degrees of freedom
  (20 observations deleted due to missingness)
Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05

I have produced all of these regressions myself and pasted them directly from the 'R' software package. My regression methodology is entirely my own, along with the sourcing and preparation of the data used to produce these statistics.

I would be very grateful if you could provide me with some clarity as to why this output from 'R' is reading as plagiarism.

I would like to thank you in advance,

Kind regards,

Oliver Barrett
(+44) 7341 834 217
Re: [R] 'R' Software Output Plagiarism
1. It is highly unlikely that we could be of help (unless someone else has experienced this and knows what happened). You will have to contact the Urkund people and ask them why their algorithms raised the flags.

2. But of course, the regression methodology is not "your own" -- it's just a standard tool that you used in your work, which is entirely legitimate of course.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll