Re: [R] 'R' Software Output Plagiarism

2015-09-24 Thread peter dalgaard

On 23 Sep 2015, at 02:33 , Duncan Murdoch  wrote:

> I don't see why this puzzles you.  A simple explanation is that Urkund
> is incompetent.

That much I figured. What I was puzzled about was _how_ it was being 
incompetent. Also how it could be so in a way that wouldn't be obvious to the 
professor in question. 

I haven't used Urkund, but people are pushing it here at CBS too. If they start 
automating it on electronic submissions in maths and stats, some interesting 
things could happen. I have used iThenticate as a journal editor, and that one 
will tell you word for word which sections of text are identical to which 
sections of text in which other documents. 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 'R' Software Output Plagiarism

2015-09-24 Thread Vik Rubenfeld
Your professor should immediately recognize that the quoted code is standard 
regression input/output and that the Urkund results in this case are without 
merit.


> On Sep 22, 2015, at 7:27 AM, BARRETT, Oliver  wrote:
> 
> 
> Dear 'R' community support,
> 
> 
> I am a student at Skema business school and I have recently submitted my MSc 
> thesis/dissertation. This has been passed on to an external plagiarism 
> service provider, Urkund, who have scanned my document and returned a 
> plagiarism report to my professor having detected 32% plagiarism.
> 
> 
> I have contacted Urkund regarding this issue having committed no such 
> plagiarism and they have told me that all the plagiarism detected in my 
> document comes from the last 25% which consists only of 'R' regressions like 
> the one I have pasted below:
> 
> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>     Fed.t.4., data = OLS_CAR, x = TRUE)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.154587 -0.015961  0.001429  0.017196  0.110907
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.001630   0.001763  -0.925   0.3559
> Fed         -0.121595   0.165359  -0.735   0.4627
> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
> Fed.t.2.     0.026529   0.143648   0.185   0.8536
> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.0293 on 304 degrees of freedom
>   (20 observations deleted due to missingness)
> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
> 
> I have produced all of these regressions myself and pasted them directly from 
> the 'R' software package. My regression methodology is entirely my own along 
> with the sourcing and preparation of the data used to produce these 
> statistics.
> 
> I would be very grateful if you could provide me with some clarity as to why 
> this output from 'R' is reading as plagiarism.
> 
> I would like to thank you in advance,
> 
> Kind regards,
> 
> Oliver Barrett
> (+44) 7341 834 217
> 


Re: [R] 'R' Software Output Plagiarism

2015-09-24 Thread John Kane

> -Original Message-
> From: pda...@gmail.com
> Sent: Thu, 24 Sep 2015 11:41:19 +0200
> To: murdoch.dun...@gmail.com
> Subject: Re: [R] 'R' Software Output Plagiarism
> 
> 
> On 23 Sep 2015, at 02:33 , Duncan Murdoch <murdoch.dun...@gmail.com>
> wrote:
> 
>> I don't see why this puzzles you.  A simple explanation is that Urkund
>> is incompetent.
> 
> That much I figured. What I was puzzled about was _how_ it was being
> incompetent. Also how it could be so in a way that wouldn't be obvious to
> the professor in question.

My guess is that it just ran the scan and emailed back the results with no 
analysis. It sounds like the software is functioning 'properly' but the 
humanware is not. But then, there is no reason to expect them to be subject 
matter experts, either. One should just hope they would supply better reports 
than it appears Oliver and his professor received.

Re the professor, he/she may have just tossed the report to Oliver and said, 
"Explain this".  Once Oliver discussed the issue with Urkund it should be 
blindingly obvious to the professor.




Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread peter dalgaard
Marc,

I don't think Copyright/Intellectual property issues factor into this. Urkund 
and similar tools are to my knowledge entirely about plagiarism. So the issue 
would seem to be that the R output is considered identical or nearly identical 
to R output in other published or otherwise submitted material.

What puzzles me (except for how a document can be deemed 32% plagiarized in 25% 
of the text) is whether this includes the numbers and variable names. If those 
are somehow factored out, then any R regression could be pretty much identical 
to any other R regression. However, two analyses with similar variable names 
could happen if they are based on the same cookbook recipe and analyses with 
similar numerical output come from analyzing the same standard data. Such 
situations would not necessarily be considered plagiarism (I mean: If you claim 
that you are analyzing data from experiments that you yourself have performed, 
and your numbers are exactly identical to something that has been previously 
published, then it would be suspect. If you analyze something from public 
sources, someone else might well have done the same thing.). 
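
A rough R sketch of the "factored out" scenario, for illustration only. It assumes a matcher that replaces every number and every variable name with a placeholder before comparing; nothing in this thread says Urkund actually does that. The helper names `summary_tokens` and `overlap`, and the use of the bundled `mtcars` and `cars` datasets, are illustrative choices.

```r
# Hypothetical helper: split a printed summary() into word tokens and,
# optionally, replace numbers and variable names with placeholders.
summary_tokens <- function(fit, mask = FALSE) {
  tokens <- unlist(strsplit(capture.output(summary(fit)), "[[:space:],]+"))
  tokens <- tokens[nzchar(tokens)]
  if (mask) {
    is_num <- !is.na(suppressWarnings(as.numeric(tokens)))
    is_var <- tokens %in% all.vars(formula(fit))
    tokens[is_num] <- "<num>"
    tokens[is_var] <- "<var>"
  }
  tokens
}

# Share of the distinct tokens in `a` that also occur somewhere in `b`.
overlap <- function(a, b) mean(unique(a) %in% b)

# Two unrelated regressions on datasets that ship with R.
fit1 <- lm(mpg ~ wt + hp, data = mtcars)
fit2 <- lm(dist ~ speed, data = cars)

raw    <- overlap(summary_tokens(fit1), summary_tokens(fit2))
masked <- overlap(summary_tokens(fit1, mask = TRUE),
                  summary_tokens(fit2, mask = TRUE))
cat("token overlap, raw:", round(raw, 2), " masked:", round(masked, 2), "\n")
```

With masking switched on, what remains is mostly the fixed wording of the printout, so the overlap between the two unrelated models rises sharply compared with the raw text.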

Similarly to John Kane, I think it is necessary to know exactly what sources 
the text is claimed to be plagiarized from and/or which parts of the text are 
being matched by Urkund. If it turns out that Urkund is generating false 
positives, then this needs to be pointed out to them and to the people basing 
decisions on it.

-pd

> On 22 Sep 2015, at 18:24 , Marc Schwartz  wrote:
> 
> Hi,
> 
> With the usual caveat that I Am Not A Lawyer and that I am not speaking on 
> behalf of any organization...
> 
> My guess is that they are claiming that the output of R, simply being copied 
> and pasted verbatim into your thesis constitutes the use of copyrighted 
> output from the software.
> 
> It is not clear to me that R's output is copyrighted by the R Foundation (or 
> by other parties for CRAN packages), although the source code underlying R is, 
> along with that of other copyright owners as appropriate. There is some case 
> law to support the notion that the output alone is not protected in a similar 
> manner, but that may be country specific.
> 
> Did you provide any credit to R (see the output of citation() ) in your 
> thesis and indicate that your analyses were performed using R?
> 
> If R is uncredited, I could see them raising the issue.
> 
> You might check with your institution's legal/policy folks to see if there is 
> any guidance provided for students regarding the crediting of software used 
> in this manner, especially if that guidance is at no cost to you.
> 
> Regards,
> 
> Marc Schwartz
> 
> 

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread Marc Schwartz
Hi,

With the usual caveat that I Am Not A Lawyer and that I am not speaking on 
behalf of any organization...

My guess is that they are claiming that the output of R, simply being copied 
and pasted verbatim into your thesis constitutes the use of copyrighted output 
from the software.

It is not clear to me that R's output is copyrighted by the R Foundation (or by 
other parties for CRAN packages), although the source code underlying R is, 
along with that of other copyright owners as appropriate. There is some case 
law to support the notion that the output alone is not protected in a similar 
manner, but that may be country specific.

Did you provide any credit to R (see the output of citation() ) in your thesis 
and indicate that your analyses were performed using R?
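
For reference, a minimal sketch of how that credit can be pulled from R itself. `citation()` and `toBibtex()` are part of the standard `utils` package; the exact text printed depends on the installed R version, so no output is shown here.

```r
# Print the recommended way to cite R in a thesis or paper.
cit <- citation()
print(cit)

# The same entry in BibTeX form, ready to paste into a .bib file.
bib <- toBibtex(cit)
print(bib)

# Packages carry their own entries; base packages such as 'stats'
# simply point back to the citation for R itself.
print(citation("stats"))
```

For add-on packages from CRAN, `citation("pkgname")` returns whatever entry the package authors supplied.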

If R is uncredited, I could see them raising the issue.

You might check with your institution's legal/policy folks to see if there is 
any guidance provided for students regarding the crediting of software used in 
this manner, especially if that guidance is at no cost to you.

Regards,

Marc Schwartz


> On Sep 22, 2015, at 11:01 AM, Bert Gunter  wrote:
> 
> 1. It is highly unlikely that we could be of help (unless someone else
> has experienced this and knows what happened). You will have to
> contact the Urkund people and ask them why their algorithms raised the
> flags.
> 
> 2. But of course, the regression methodology is not "your own" -- it's
> just a standard tool that you used in your work, which is entirely
> legitimate of course.
> 
> Cheers,
> Bert
> 
> 
> Bert Gunter
> 
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>   -- Clifford Stoll
> 
> 


Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread John Kane
Very good point about the referencing. 

I wonder if this is happening to users of Stata or SAS as well?

John Kane
Kingston ON Canada


> -Original Message-
> From: marc_schwa...@me.com
> Sent: Tue, 22 Sep 2015 11:24:13 -0500
> To: bgunter.4...@gmail.com
> Subject: Re: [R] 'R' Software Output Plagiarism
> 
> Hi,
> 
> With the usual caveat that I Am Not A Lawyer and that I am not
> speaking on behalf of any organization...
> 
> My guess is that they are claiming that the output of R, simply being
> copied and pasted verbatim into your thesis constitutes the use of
> copyrighted output from the software.
> 
> It is not clear to me that R's output is copyrighted by the R Foundation
> (or by other parties for CRAN packages), although the source code
> underlying R is, along with that of other copyright owners as
> appropriate. There is some case law to support the notion that the output
> alone is not protected in a similar manner, but that may be country specific.
> 
> Did you provide any credit to R (see the output of citation() ) in your
> thesis and indicate that your analyses were performed using R?
> 
> If R is uncredited, I could see them raising the issue.
> 
> You might check with your institution's legal/policy folks to see if
> there is any guidance provided for students regarding the crediting of
> software used in this manner, especially if that guidance is at no cost
> to you.
> 
> Regards,
> 
> Marc Schwartz
> 
> 

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread John Kane
This is just guessing but the reason is probably that the regression output 
(not including the specific numbers, and your variable names) is standard R 
output as already noted.

It probably appears in many other theses and dissertations, in books on R, and 
possibly in appendices in published books and papers reporting research 
findings. 

It, or parts of it, may occur thousands of times on R-help and in R oriented 
blogs and other documents on the Web.  It quite likely shows up on Stack 
Overflow.
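
As a rough illustration of that guess (not a description of how Urkund works), the sketch below fits two unrelated models on datasets that ship with R, `mtcars` and `cars`, and lists the lines of printed `summary()` output that are identical in both. The choice of models is arbitrary.

```r
# Two unrelated regressions on datasets bundled with R.
fit1 <- lm(mpg ~ wt + hp, data = mtcars)
fit2 <- lm(dist ~ speed, data = cars)

# Capture the printed summaries as character vectors, one element per line,
# with leading and trailing whitespace removed.
out1 <- trimws(capture.output(summary(fit1)))
out2 <- trimws(capture.output(summary(fit2)))

# Non-empty lines that appear verbatim in both outputs.
shared <- intersect(out1[nzchar(out1)], out2[nzchar(out2)])
print(shared)
```

Section headings such as "Residuals:" and "Coefficients:" are printed for every `lm` summary, so they turn up in `shared` whatever data went into the models.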

Here is one example it took me about 2 minutes to find:
http://www.montefiore.ulg.ac.be/~kvansteen/GBIO0009-1/ac20092010/Class8/Using%20R%20for%20linear%20regression.pdf
And here's another: http://www.princeton.edu/~otorres/Regression101R.pdf

Have a look at Julian Faraway's PDF book "Practical Regression and Anova using 
R" in the Contributed section of the R home site at pp. 23-24. There it is 
again.

I think you probably should do a bit of on-line searching and a sweep of some 
of the Manuals and Contributed materials on the R site and point out to the 
powers that be that it is not plagiarism, it's just standard R reporting of 
regression results.


John Kane
Kingston ON Canada


> -Original Message-
> From: oliver.barr...@skema.edu
> Sent: Tue, 22 Sep 2015 14:27:03 +
> To: r-help@r-project.org
> Subject: [R] 'R' Software Output Plagiarism
> 
> 
> Dear 'R' community support,
> 
> 
> I am a student at Skema business school and I have recently submitted my
> MSc thesis/dissertation. This has been passed on to an external
> plagiarism service provider, Urkund, who have scanned my document and
> returned a plagiarism report to my professor having detected 32%
> plagiarism.
> 
> 
> I have contacted Urkund regarding this issue having committed no such
> plagiarism and they have told me that all the plagiarism detected in my
> document comes from the last 25% which consists only of 'R' regressions
> like the one I have pasted below:
> 
> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>     Fed.t.4., data = OLS_CAR, x = TRUE)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.154587 -0.015961  0.001429  0.017196  0.110907
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.001630   0.001763  -0.925   0.3559
> Fed         -0.121595   0.165359  -0.735   0.4627
> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
> Fed.t.2.     0.026529   0.143648   0.185   0.8536
> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.0293 on 304 degrees of freedom
>   (20 observations deleted due to missingness)
> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
> 
> I have produced all of these regressions myself and pasted them directly
> from the 'R' software package. My regression methodology is entirely my
> own along with the sourcing and preparation of the data used to produce
> these statistics.
> 
> I would be very grateful if you could provide me with some clarity as to
> why this output from 'R' is reading as plagiarism.
> 
> I would like to thank you in advance,
> 
> Kind regards,
> 
> Oliver Barrett
> (+44) 7341 834 217
> 


Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread Mitchell Maltenfort
Isn't plagiarism detection based on overlaps with sentence structure?
That way, it would catch plagiarism if someone simply did a
find-and-replace. But that would also catch regressions with the same
output format.

How long was the original thesis?  If 25% of it was all regression
output, sounds like a lot of regressions.



On Tue, Sep 22, 2015 at 4:06 PM, peter dalgaard  wrote:
> Marc,
>
> I don't think Copyright/Intellectual property issues factor into this. Urkund 
> and similar tools are to my knowledge entirely about plagiarism. So the issue 
> would seem to be that the R output is considered identical or nearly 
> identical to R output in other published or otherwise submitted material.
>
> What puzzles me (except for how a document can be deemed 32% plagiarized in 
> 25% of the text) is whether this includes the numbers and variable names. If 
> those are somehow factored out, then any R regression could be pretty much 
> identical to any other R regression. However, two analyses with similar 
> variable names could happen if they are based on the same cookbook recipe and 
> analyses with similar numerical output come from analyzing the same standard 
> data. Such situations would not necessarily be considered plagiarism (I mean: 
> If you claim that you are analyzing data from experiments that you yourself 
> have performed, and your numbers are exactly identical to something that has 
> been previously published, then it would be suspect. If you analyze something 
> from public sources, someone else might well have done the same thing.).
>
> Similarly to John Kane, I think it is necessary to know exactly what sources 
> the text is claimed to be plagiarized from and/or which parts of the text are 
> being matched by Urkund. If it turns out that Urkund is generating false 
> positives, then this needs to be pointed out to them and to the people basing 
> decisions on it.
>
> -pd
>

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread Marc Schwartz
Peter,

Great distinction. 

I was leaning in the direction that the "look and feel" of the output (standard 
wording, table structure, column headings, significance stars and so forth in 
the output) is similar to whatever Urkund is using as the basis for the 
comparison and less so on an exact replication (covariates, coefficients, 
etc.), or nearly so, of prior work.

Thanks,

Marc


> On Sep 22, 2015, at 3:06 PM, peter dalgaard  wrote:
> 
> Marc,
> 
> I don't think Copyright/Intellectual property issues factor into this. Urkund 
> and similar tools are to my knowledge entirely about plagiarism. So the issue 
> would seem to be that the R output is considered identical or nearly 
> identical to R output in other published or otherwise submitted material.
> 
> What puzzles me (except for how a document can be deemed 32% plagiarized in 
> 25% of the text) is whether this includes the numbers and variable names. If 
> those are somehow factored out, then any R regression could be pretty much 
> identical to any other R regression. However, two analyses with similar 
> variable names could happen if they are based on the same cookbook recipe and 
> analyses with similar numerical output come from analyzing the same standard 
> data. Such situations would not necessarily be considered plagiarism (I mean: 
> If you claim that you are analyzing data from experiments that you yourself 
> have performed, and your numbers are exactly identical to something that has 
> been previously published, then it would be suspect. If you analyze something 
> from public sources, someone else might well have done the same thing.). 
> 
> Similarly to John Kane, I think it is necessary to know exactly what sources 
> the text is claimed to be plagiarized from and/or which parts of the text are 
> being matched by Urkund. If it turns out that Urkund is generating false 
> positives, then this needs to be pointed out to them and to the people basing 
> decisions on it.
> 
> -pd
> 
>> On 22 Sep 2015, at 18:24 , Marc Schwartz  wrote:
>> 
>> Hi,
>> 
>> With the usual caveat that I Am Not A Lawyerand that I am not speaking 
>> on behalf of any organization...
>> 
>> My guess is that they are claiming that the output of R, simply being copied 
>> and pasted verbatim into your thesis constitutes the use of copyrighted 
>> output from the software.
>> 
>> It is not clear to me that R's output is copyrighted by the R Foundation (or 
>> by other parties for CRAN packages), albeit, the source code underlying R 
>> is, along with other copyright owner's as apropos. There is some caselaw to 
>> support the notion that the output alone is not protected in a similar 
>> manner, but that may be country specific.
>> 
>> Did you provide any credit to R (see the output of citation() ) in your 
>> thesis and indicate that your analyses were performed using R?
>> 
>> If R is uncredited, I could see them raising the issue.
>> 
>> You might check with your institution's legal/policy folks to see if there 
>> is any guidance provided for students regarding the crediting of software 
>> used in this manner, especially if that guidance is at no cost to you.
>> 
>> Regards,
>> 
>> Marc Schwartz
>> 
>> 
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter  wrote:
>>> 
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>> 
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>> 
>>> Cheers,
>>> Bert
>>> 
>>> 
>>> Bert Gunter
>>> 
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>> -- Clifford Stoll
>>> 
>>> 
>>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>>  wrote:
 
>>>> Dear 'R' community support,
>>>>
>>>>
>>>> I am a student at Skema business school and I have recently submitted my 
>>>> MSc thesis/dissertation. This has been passed on to an external plagiarism 
>>>> service provider, Urkund, who have scanned my document and returned a 
>>>> plagiarism report to my professor having detected 32% plagiarism.
>>>>
>>>>
>>>> I have contacted Urkund regarding this issue having committed no such 
>>>> plagiarism and they have told me that all the plagiarism detected in my 
>>>> document comes from the last 25% which consists only of 'R' regressions 
>>>> like the one I have pasted below:
>>>>
>>>> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>>>>     Fed.t.4., data = OLS_CAR, x = TRUE)
>>>>
>>>> Residuals:
>>>>       Min        1Q    Median        3Q       Max
>>>> -0.154587 -0.015961  0.001429  0.017196  0.110907
>>>>
>>>> Coefficients:
>>>>              Estimate Std. Error t value Pr(>|t|)

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread BARRETT, Oliver
Hi all,

Thank you so much for your input.

Just to clarify, of the 32% plagiarism detected only 27-28% has come from the 
regressions but this is expected as the appendix where the regressions are 
contained is much more dense with text and numbers than the rest of the 
document.

The other 5% will be my quotations and references but that's normal.

Thanks again, I will be sharing your thoughts with my thesis supervisor.

Cheers,

Oliver


From: Marc Schwartz <marc_schwa...@me.com>
Sent: 22 September 2015 22:27
To: peter dalgaard
Cc: Bert Gunter; BARRETT, Oliver; R-help
Subject: Re: [R] 'R' Software Output Plagiarism

Peter,

Great distinction.

I was leaning in the direction that the "look and feel" of the output (standard 
wording, table structure, column headings, significance stars and so forth in 
the output) is similar to whatever Urkund is using as the basis for the 
comparison and less so on an exact replication (covariates, coefficients, 
etc.), or nearly so, of prior work.
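
To illustrate the "look and feel" point, here is a minimal Python sketch. It assumes, purely hypothetically, a checker that ignores the digits and compares only the remaining wording; nothing in this thread establishes how Urkund actually matches text. The second block of output uses made-up numbers, and mask_output and similarity are names invented for the sketch.

```python
import re
from difflib import SequenceMatcher

# Matches plain, decimal and scientific-notation numbers such as 304, 0.0293, 4.422e-05
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def mask_output(text: str) -> str:
    """Replace every number with the placeholder N and collapse whitespace."""
    return " ".join(NUMBER.sub("N", text).split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings."""
    return SequenceMatcher(None, a, b).ratio()


# Footer of the regression summary quoted in the original post
thesis_footer = """Residual standard error: 0.0293 on 304 degrees of freedom
Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05"""

# Footer of an unrelated regression (numbers invented for illustration)
other_footer = """Residual standard error: 1.87 on 48 degrees of freedom
Multiple R-squared:  0.6511,  Adjusted R-squared:  0.6438
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12"""

raw_score = similarity(thesis_footer, other_footer)
masked_score = similarity(mask_output(thesis_footer), mask_output(other_footer))

print(f"similarity with numbers kept:   {raw_score:.2f}")
print(f"similarity with numbers masked: {masked_score:.2f}")
```

Once the numbers are masked, the two footers are the same string, so under this assumption the standard wording alone would register as a complete match on those lines, whatever the data or the model.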

Thanks,

Marc


> On Sep 22, 2015, at 3:06 PM, peter dalgaard <pda...@gmail.com> wrote:
>
> Marc,
>
> I don't think Copyright/Intellectual property issues factor into this. Urkund 
> and similar tools are to my knowledge entirely about plagiarism. So the issue 
> would seem to be that the R output is considered identical or nearly 
> identical to R output in other published or otherwise submitted material.
>
> What puzzles me (except for how a document can be deemed 32% plagiarized in 
> 25% of the text) is whether this includes the numbers and variable names. If 
> those are somehow factored out, then any R regression could be pretty much 
> identical to any other R regression. However, two analyses with similar 
> variable names could happen if they are based on the same cookbook recipe and 
> analyses with similar numerical output come from analyzing the same standard 
> data. Such situations would not necessarily be considered plagiarism (I mean: 
> If you claim that you are analyzing data from experiments that you yourself 
> have performed, and your numbers are exactly identical to something that has 
> been previously published, then it would be suspect. If you analyze something 
> from public sources, someone else might well have done the same thing.).
>
> Similarly to John Kane, I think it is necessary to know exactly what sources 
> the text is claimed to be plagiarized from and/or what parts of the text that 
> are being matched by Urkund. If it turns out that Urkund is generating false 
> positives, then this needs to be pointed out to them and to the people basing 
> decisions on it.
>
> -pd
>
>> On 22 Sep 2015, at 18:24 , Marc Schwartz <marc_schwa...@me.com> wrote:
>>
>> Hi,
>>
>> With the usual caveat that I Am Not A Lawyer and that I am not speaking 
>> on behalf of any organization...
>>
>> My guess is that they are claiming that the output of R, simply being copied 
>> and pasted verbatim into your thesis constitutes the use of copyrighted 
>> output from the software.
>>
>> It is not clear to me that R's output is copyrighted by the R Foundation (or 
>> by other parties for CRAN packages), albeit, the source code underlying R 
>> is, along with other copyright owners' as apropos. There is some case law to 
>> support the notion that the output alone is not protected in a similar 
>> manner, but that may be country specific.
>>
>> Did you provide any credit to R (see the output of citation() ) in your 
>> thesis and indicate that your analyses were performed using R?
>>
>> If R is uncredited, I could see them raising the issue.
>>
>> You might check with your institution's legal/policy folks to see if there 
>> is any guidance provided for students regarding the crediting of software 
>> used in this manner, especially if that guidance is at no cost to you.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>>
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And

Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread Duncan Murdoch
On 22/09/2015 4:06 PM, peter dalgaard wrote:
> Marc,
> 
> I don't think Copyright/Intellectual property issues factor into this. Urkund 
> and similar tools are to my knowledge entirely about plagiarism. So the issue 
> would seem to be that the R output is considered identical or nearly 
> identical to R output in other published or otherwise submitted material.
> 
> What puzzles me (except for how a document can be deemed 32% plagiarized in 
> 25% of the text) is whether this includes the numbers and variable names. If 
> those are somehow factored out, then any R regression could be pretty much 
> identical to any other R regression. However, two analyses with similar 
> variable names could happen if they are based on the same cookbook recipe and 
> analyses with similar numerical output come from analyzing the same standard 
> data. Such situations would not necessarily be considered plagiarism (I mean: 
> If you claim that you are analyzing data from experiments that you yourself 
> have performed, and your numbers are exactly identical to something that has 
> been previously published, then it would be suspect. If you analyze something 
> from public sources, someone else might well have done the same thing.).

I don't see why this puzzles you.  A simple explanation is that Urkund
is incompetent.

Many companies that sell software to university administrations are
incompetent, because the buyers have been promoted so far beyond their
competence that they'll buy anything if it is expensive enough.

This isn't uncommon.

Duncan Murdoch

> 
> Similarly to John Kane, I think it is necessary to know exactly what sources 
> the text is claimed to be plagiarized from and/or what parts of the text that 
> are being matched by Urkund. If it turns out that Urkund is generating false 
> positives, then this needs to be pointed out to them and to the people basing 
> decisions on it.
> 
> -pd
> 
>> On 22 Sep 2015, at 18:24 , Marc Schwartz  wrote:
>>
>> Hi,
>>
>> With the usual caveat that I Am Not A Lawyer and that I am not speaking 
>> on behalf of any organization...
>>
>> My guess is that they are claiming that the output of R, simply being copied 
>> and pasted verbatim into your thesis constitutes the use of copyrighted 
>> output from the software.
>>
>> It is not clear to me that R's output is copyrighted by the R Foundation (or 
>> by other parties for CRAN packages), albeit, the source code underlying R 
>> is, along with other copyright owners' as apropos. There is some case law to 
>> support the notion that the output alone is not protected in a similar 
>> manner, but that may be country specific.
>>
>> Did you provide any credit to R (see the output of citation() ) in your 
>> thesis and indicate that your analyses were performed using R?
>>
>> If R is uncredited, I could see them raising the issue.
>>
>> You might check with your institution's legal/policy folks to see if there 
>> is any guidance provided for students regarding the crediting of software 
>> used in this manner, especially if that guidance is at no cost to you.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter  wrote:
>>>
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>>
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>>  -- Clifford Stoll
>>>
>>>
>>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>>  wrote:

>>>> Dear 'R' community support,
>>>>
>>>>
>>>> I am a student at Skema business school and I have recently submitted my 
>>>> MSc thesis/dissertation. This has been passed on to an external plagiarism 
>>>> service provider, Urkund, who have scanned my document and returned a 
>>>> plagiarism report to my professor having detected 32% plagiarism.
>>>>
>>>>
>>>> I have contacted Urkund regarding this issue having committed no such 
>>>> plagiarism and they have told me that all the plagiarism detected in my 
>>>> document comes from the last 25% which consists only of 'R' regressions 
>>>> like the one I have pasted below:
>>>>
>>>> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>>>>     Fed.t.4., data = OLS_CAR, x = TRUE)
>>>>
>>>> Residuals:
>>>>       Min        1Q    Median        3Q       Max
>>>> -0.154587 -0.015961  0.001429  0.017196  0.110907
>>>>
>>>> Coefficients:
>>>>              Estimate Std. Error t value Pr(>|t|)
>>>> (Intercept) -0.001630   0.001763  -0.925   0.3559
>>>> Fed         -0.121595   0.165359  -0.735   0.4627

[R] 'R' Software Output Plagiarism

2015-09-22 Thread BARRETT, Oliver

Dear 'R' community support,


I am a student at Skema business school and I have recently submitted my MSc 
thesis/dissertation. This has been passed on to an external plagiarism service 
provider, Urkund, who have scanned my document and returned a plagiarism report 
to my professor having detected 32% plagiarism.


I have contacted Urkund regarding this issue having committed no such 
plagiarism and they have told me that all the plagiarism detected in my 
document comes from the last 25% which consists only of 'R' regressions like 
the one I have pasted below:

lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
Fed.t.4., data = OLS_CAR, x = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max
-0.154587 -0.015961  0.001429  0.017196  0.110907

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.001630   0.001763  -0.925   0.3559
Fed         -0.121595   0.165359  -0.735   0.4627
Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
Fed.t.2.     0.026529   0.143648   0.185   0.8536
Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0293 on 304 degrees of freedom
  (20 observations deleted due to missingness)
Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05

I have produced all of these regressions myself and pasted them directly from 
the 'R' software package. My regression methodology is entirely my own along 
with the sourcing and preparation of the data used to produce these statistics.

I would be very grateful if you could provide me with some clarity as to why 
this output from 'R' is reading as plagiarism.

I would like to thank you in advance,

Kind regards,

Oliver Barrett
(+44) 7341 834 217



Re: [R] 'R' Software Output Plagiarism

2015-09-22 Thread Bert Gunter
1. It is highly unlikely that we could be of help (unless someone else
has experienced this and knows what happened). You will have to
contact the Urkund people and ask them why their algorithms raised the
flags.

2. But of course, the regression methodology is not "your own" -- it's
just a standard tool that you used in your work, which is entirely
legitimate of course.

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
 wrote:
>
> Dear 'R' community support,
>
>
> I am a student at Skema business school and I have recently submitted my MSc 
> thesis/dissertation. This has been passed on to an external plagiarism 
> service provider, Urkund, who have scanned my document and returned a 
> plagiarism report to my professor having detected 32% plagiarism.
>
>
> I have contacted Urkund regarding this issue having committed no such 
> plagiarism and they have told me that all the plagiarism detected in my 
> document comes from the last 25% which consists only of 'R' regressions like 
> the one I have pasted below:
>
> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
> Fed.t.4., data = OLS_CAR, x = TRUE)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.154587 -0.015961  0.001429  0.017196  0.110907
>
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.001630   0.001763  -0.925   0.3559
> Fed         -0.121595   0.165359  -0.735   0.4627
> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
> Fed.t.2.     0.026529   0.143648   0.185   0.8536
> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 0.0293 on 304 degrees of freedom
>   (20 observations deleted due to missingness)
> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>
> I have produced all of these regressions myself and pasted them directly from 
> the 'R' software package. My regression methodology is entirely my own along 
> with the sourcing and preparation of the data used to produce these 
> statistics.
>
> I would be very grateful if you could provide me with some clarity as to why 
> this output from 'R' is reading as plagiarism.
>
> I would like to thank you in advance,
>
> Kind regards,
>
> Oliver Barrett
> (+44) 7341 834 217
and provide commented, minimal, self-contained, reproducible code.