Re: [Rd] lm() gives different results to lm.ridge() and SPSS

2017-05-05 Thread Nick Brown
Hi John, 

Thanks for the comment... but that appears to mean that SPSS has a big problem. 
I have always been told that to include an interaction term in a regression, 
the only way is to do the multiplication by hand. But then it seems to be 
impossible to stop SPSS from re-standardizing the variable that corresponds to 
the interaction term. Am I missing something? Is there a way to perform the 
regression with the interaction in SPSS without computing the interaction as a 
separate variable? 

Best, 
Nick 

- Original Message -

From: "John Fox"  
To: "Nick Brown" , "peter dalgaard"  
Cc: r-devel@r-project.org 
Sent: Friday, 5 May, 2017 8:22:53 PM 
Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS 

Dear Nick, 


On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown" 
 wrote: 

>>I conjecture that something in the vicinity of 
>> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + 
>>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) 
>>summary(res) 
>> would reproduce the SPSS Beta values. 
> 
>Yes, that works. Thanks! 

That you have to work hard in R to match the SPSS results isn’t such a bad 
thing when you factor in the observation that standardizing the 
interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of 
its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense. 

Best, 
John 

- 
John Fox, Professor 
McMaster University 
Hamilton, Ontario, Canada 
Web: http://socserv.mcmaster.ca/jfox/ 


> 
> 
>- Original Message - 
> 
>From: "peter dalgaard"  
>To: "Viechtbauer Wolfgang (SP)" 
>, "Nick Brown" 
> 
>Cc: r-devel@r-project.org 
>Sent: Friday, 5 May, 2017 3:33:29 PM 
>Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS 
> 
>Thanks, I was getting to try this, but got side tracked by actual work...
> 
>Your analysis reproduces the SPSS unscaled estimates. It still remains to
>figure out how Nick got 
> 
>> 
>coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) 
> 
>(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA 
>0.07342198 -0.39650356 -0.36569488 -0.09435788 
> 
> 
>which does not match your output. I suspect that ZMEAN_PA and 
>ZDIVERSITY_PA were scaled for this analysis (but the interaction term 
>still obviously is not). I conjecture that something in the vicinity of 
> 
>res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + 
>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) 
>summary(res) 
> 
>would reproduce the SPSS Beta values. 
> 
> 
>> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP) 
>> wrote: 
>> 
>> I had no problems running regression models in SPSS and R that yielded 
>>the same results for these data. 
>> 
>> The difference you are observing is from fitting different models. In 
>>R, you fitted: 
>> 
>> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat) 
>> summary(res) 
>> 
>> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This
>>is not a standardized variable itself and not the same as "ZINTER_PA_C" 
>>in the png you showed, which is not a variable in the dataset, but can 
>>be created with: 
>> 
>> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA)) 
>> 
>> If you want the same results as in SPSS, then you need to fit: 
>> 
>> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, 
>>data=dat) 
>> summary(res) 
>> 
>> This yields: 
>> 
>> Coefficients: 
>> Estimate Std. Error t value Pr(>|t|) 
>> (Intercept) 6.41041 0.01722 372.21 <2e-16 *** 
>> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 *** 
>> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 *** 
>> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 *** 
>> --- 
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
>> 
>> Exactly the same as in the png. 
>> 
>> Peter already mentioned this as a possible reason for the discrepancy: 
>>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it 
>>perhaps the case that x1 and x2 have already been scaled to have 
>>standard deviation 1? In that case, x1*x2 won't be.") 
>> 
>> Best, 
>> Wolfgang 
>> 
>> -Original Message- 
>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick 
>>Brown 
>> Sent: Friday, May 05, 2017 10:40 
>> To: peter dalgaard 
>> Cc: r-devel@r-project.org 
>> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS 
>> 
>> Hi, 
>> 
>> Here is (I hope) all the relevant output from R. 
>> 
>>> mean(s1$ZDEPRESSION, na.rm=T)
>> [1] -1.041546e-16
>>> mean(s1$ZDIVERSITY_PA, na.rm=T)
>> [1] -9.660583e-16
>>> mean(s1$ZMEAN_PA, na.rm=T)
>> [1] -5.430282e-15
>>> lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef
>>       ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA
>>     -0.3962254     -0.3636026              -0.1425772
>> ## This is what I thought was the problem originally. :-)
>> 
>>> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
>>    (Intercept)     ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA
>>     0.07342198  -0.39650356    -0.36569488             -0.09435788

Re: [Rd] lm() gives different results to lm.ridge() and SPSS

2017-05-05 Thread Viechtbauer Wolfgang (SP)
Well, one correction -- the 'standardized coefficients' that SPSS shows are 
based on standardizing all variables separately (so x1, x2, and x1*x2 are all 
standardized). So with respect to that, the criticism certainly stands.
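
To see the distinction concretely, here is a minimal sketch with simulated data (x1, x2, y and the correlation value are illustrative, not the emodiversity dataset): the coefficient on the product of standardized variables differs from the coefficient on the separately re-standardized product.

  ## Minimal sketch (simulated data): standardizing the product is not the same
  ## as taking the product of standardized variables.
  set.seed(1)
  n  <- 1000
  x1 <- rnorm(n)
  x2 <- 0.75 * x1 + sqrt(1 - 0.75^2) * rnorm(n)   # correlated predictors
  y  <- x1 + x2 + 0.5 * x1 * x2 + rnorm(n)
  z1 <- as.numeric(scale(x1))
  z2 <- as.numeric(scale(x2))

  ## Interaction = product of the standardized variables (what lm() uses for z1:z2):
  round(coef(lm(y ~ z1 * z2)), 4)

  ## Interaction re-standardized as a separate variable, as in the
  ## scale(ZMEAN_PA * ZDIVERSITY_PA) term discussed in this thread:
  round(coef(lm(y ~ z1 + z2 + scale(z1 * z2))), 4)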

-Original Message-
From: Viechtbauer Wolfgang (SP) 
Sent: Friday, May 05, 2017 22:35
To: r-devel@r-project.org
Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS

Totally agree that standardizing the interaction term is nonsense. But in all 
fairness, SPSS doesn't do that. In fact, the 'REGRESSION' command in SPSS 
doesn't compute any interaction terms -- one has to first compute them 'by 
hand' and then add them to the model like any other variable. So somebody 
worked extra hard to standardize that interaction term in SPSS as well :/

Best,
Wolfgang

-Original Message-
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Fox, John
Sent: Friday, May 05, 2017 20:23
To: Nick Brown; peter dalgaard
Cc: r-devel@r-project.org
Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS

Dear Nick,

On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown"
 wrote:

>>I conjecture that something in the vicinity of
>> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) +
>>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat)
>>summary(res) 
>> would reproduce the SPSS Beta values.
>
>Yes, that works. Thanks!

That you have to work hard in R to match the SPSS results isn’t such a bad
thing when you factor in the observation that standardizing the
interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of
its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense.

Best,
 John

-
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox/

>- Original Message -
>
>From: "peter dalgaard" 
>To: "Viechtbauer Wolfgang (SP)"
>, "Nick Brown"
>
>Cc: r-devel@r-project.org
>Sent: Friday, 5 May, 2017 3:33:29 PM
>Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
>
>Thanks, I was getting to try this, but got side tracked by actual work...
>
>Your analysis reproduces the SPSS unscaled estimates. It still remains to
>figure out how Nick got
>
>> 
>coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
>
>(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
>0.07342198 -0.39650356 -0.36569488 -0.09435788
>
>
>which does not match your output. I suspect that ZMEAN_PA and
>ZDIVERSITY_PA were scaled for this analysis (but the interaction term
>still obviously is not). I conjecture that something in the vicinity of
>
>res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) +
>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat)
>summary(res) 
>
>would reproduce the SPSS Beta values.
>
>> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP)
>> wrote:
>> 
>> I had no problems running regression models in SPSS and R that yielded
>>the same results for these data.
>> 
>> The difference you are observing is from fitting different models. In
>>R, you fitted: 
>> 
>> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat)
>> summary(res) 
>> 
>> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This
>>is not a standardized variable itself and not the same as "ZINTER_PA_C"
>>in the png you showed, which is not a variable in the dataset, but can
>>be created with: 
>> 
>> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA))
>> 
>> If you want the same results as in SPSS, then you need to fit:
>> 
>> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C,
>>data=dat) 
>> summary(res) 
>> 
>> This yields: 
>> 
>> Coefficients: 
>> Estimate Std. Error t value Pr(>|t|)
>> (Intercept) 6.41041 0.01722 372.21 <2e-16 ***
>> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 ***
>> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 ***
>> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 ***
>> --- 
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> 
>> Exactly the same as in the png.
>> 
>> Peter already mentioned this as a possible reason for the discrepancy:
>>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it
>>perhaps the case that x1 and x2 have already been scaled to have
>>standard deviation 1? In that case, x1*x2 won't be.")
>> 
>> Best, 
>> Wolfgang 

Re: [Rd] lm() gives different results to lm.ridge() and SPSS

2017-05-05 Thread Viechtbauer Wolfgang (SP)
Totally agree that standardizing the interaction term is nonsense. But in all 
fairness, SPSS doesn't do that. In fact, the 'REGRESSION' command in SPSS 
doesn't compute any interaction terms -- one has to first compute them 'by 
hand' and then add them to the model like any other variable. So somebody 
worked extra hard to standardize that interaction term in SPSS as well :/

Best,
Wolfgang

-Original Message-
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Fox, John
Sent: Friday, May 05, 2017 20:23
To: Nick Brown; peter dalgaard
Cc: r-devel@r-project.org
Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS

Dear Nick,

On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown"
 wrote:

>>I conjecture that something in the vicinity of
>> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) +
>>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat)
>>summary(res) 
>> would reproduce the SPSS Beta values.
>
>Yes, that works. Thanks!

That you have to work hard in R to match the SPSS results isn’t such a bad
thing when you factor in the observation that standardizing the
interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of
its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense.

Best,
 John

-
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox/

>- Original Message -
>
>From: "peter dalgaard" 
>To: "Viechtbauer Wolfgang (SP)"
>, "Nick Brown"
>
>Cc: r-devel@r-project.org
>Sent: Friday, 5 May, 2017 3:33:29 PM
>Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
>
>Thanks, I was getting to try this, but got side tracked by actual work...
>
>Your analysis reproduces the SPSS unscaled estimates. It still remains to
>figure out how Nick got
>
>> 
>coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
>
>(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
>0.07342198 -0.39650356 -0.36569488 -0.09435788
>
>
>which does not match your output. I suspect that ZMEAN_PA and
>ZDIVERSITY_PA were scaled for this analysis (but the interaction term
>still obviously is not). I conjecture that something in the vicinity of
>
>res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) +
>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat)
>summary(res) 
>
>would reproduce the SPSS Beta values.
>
>> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP)
>> wrote:
>> 
>> I had no problems running regression models in SPSS and R that yielded
>>the same results for these data.
>> 
>> The difference you are observing is from fitting different models. In
>>R, you fitted: 
>> 
>> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat)
>> summary(res) 
>> 
>> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This
>>is not a standardized variable itself and not the same as "ZINTER_PA_C"
>>in the png you showed, which is not a variable in the dataset, but can
>>be created with: 
>> 
>> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA))
>> 
>> If you want the same results as in SPSS, then you need to fit:
>> 
>> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C,
>>data=dat) 
>> summary(res) 
>> 
>> This yields: 
>> 
>> Coefficients: 
>> Estimate Std. Error t value Pr(>|t|)
>> (Intercept) 6.41041 0.01722 372.21 <2e-16 ***
>> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 ***
>> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 ***
>> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 ***
>> --- 
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> 
>> Exactly the same as in the png.
>> 
>> Peter already mentioned this as a possible reason for the discrepancy:
>>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it
>>perhaps the case that x1 and x2 have already been scaled to have
>>standard deviation 1? In that case, x1*x2 won't be.")
>> 
>> Best, 
>> Wolfgang 

Re: [Rd] A few suggestions and perspectives from a PhD student

2017-05-05 Thread Gabor Grothendieck
Regarding the anonymous-function-in-a-pipeline point, one can already
do the following. It does use braces, but even so it involves fewer
characters than the example shown.  Here { . * 2 } is basically a
lambda whose argument is dot. Would this be sufficient?

  library(magrittr)

  1.5 %>% { . * 2 }
  ## [1] 3

Regarding currying note that with magrittr Ista's code could be written as:

  1:5 %>% lapply(foo, y = 3)

or at the expense of slightly more verbosity:

  1:5 %>% Map(f = . %>% foo(y = 3))
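
To make that concrete, a small runnable sketch using the foo from the original post (foo = function(x, y) x + y); the results shown in comments are what these calls return:

  library(magrittr)

  foo <- function(x, y) x + y        # the example function from the original post

  ## Fix y = 3 and map over x, without writing function(x) foo(x, 3):
  1:5 %>% lapply(foo, y = 3)         # list of 4, 5, 6, 7, 8

  ## The same idea with Map and a dot-pipe "lambda":
  1:5 %>% Map(f = . %>% foo(y = 3))  # list of 4, 5, 6, 7, 8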


On Fri, May 5, 2017 at 1:00 PM, Antonin Klima  wrote:
> Dear Sir or Madam,
>
> I am in 2nd year of my PhD in bioinformatics, after taking my Master’s in 
> computer science, and have been using R heavily during my PhD. As such, I 
> have put together a list of certain features in R that, in my opinion, would 
> be beneficial to add, or could be improved. The first two are already 
> implemented in packages, but given that it is implemented as user-defined 
> operators, it greatly restricts its usefulness. I hope you will find my 
> suggestions interesting. If you find time, I will welcome any feedback as to 
> whether you find the suggestions useful, or why you do not think they should 
> be implemented. I will also welcome if you enlighten me with any features I 
> might be unaware of, that might solve the issues I have pointed out below.
>
> 1) piping
> Currently available in package magrittr, piping makes the code better 
> readable by having the line start at its natural starting point, and 
> following with functions that are applied - in order. The readability of 
> several nested calls with a number of parameters each is almost zero, it’s 
> almost as if one would need to come up with the solution himself. Pipeline in 
> comparison is very straightforward, especially together with the point (2).
>
> The package here works rather good nevertheless, the shortcomings of piping 
> not being native are not quite as severe as in point (2). Nevertheless, an 
> intuitive symbol such as | would be helpful, and it sometimes bothers me that 
> I have to parenthesize anonymous function, which would probably not be 
> required in a native pipe-operator, much like it is not required in f.ex. 
> lapply. That is,
> 1:5 %>% function(x) x+2
> should be totally fine
>
> 2) currying
> Currently available in package Curry. The idea is that, having a function 
> such as foo = function(x, y) x+y, one would like to write for example 
> lapply(foo(3), 1:5), and have the interpreter figure out ok, foo(3) does not 
> make a value result, but it can still give a function result - a function of 
> y. This would be indeed most useful for various apply functions, rather than 
> writing function(x) foo(3,x).
>
> I suggest that currying would make the code easier to write, and more 
> readable, especially when using apply functions. One might imagine that there 
> could be some confusion with such a feature, especially from people 
> unfamiliar with functional programming, although R already does take function 
> as first-order arguments, so it could be just fine. But one could address it 
> with special syntax, such as $foo(3) [$foo(x=3)] for partial application.  
> The current currying package has very limited usefulness, as, being limited 
> by the user-defined operator framework, it only rarely can contribute to less 
> code/more readability. Compare yourself:
> $foo(x=3) vs foo %<% 3
> goo = function(a,b,c)
> $goo(b=3) vs goo %><% list(b=3)
>
> Moreover, one would often like currying to have highest priority. For 
> example, when piping:
> data %>% foo %>% foo1 %<% 3
> if one wants to do data %>% foo %>% $foo(x=3)
>
> 3) Code executable only when running the script itself
> Whereas the first two suggestions are somewhat stealing from Haskell and the 
> like, this suggestion would be stealing from Python. I’m building quite a 
> complicated pipeline, using S4 classes. After defining the class and its 
> methods, I also define how to build the class to my likings, based on my 
> input data, using various now-defined methods. So I end up having a list of 
> command line arguments to process, and the way to create the class instance 
> based on them. If I write it to the class file, however, I end up running the 
> code when it is sourced from the next step in the pipeline, that needs the 
> previous class definitions.
>
> A feature such as pythonic “if __name__ == __main__” would thus be useful. As 
> it is, I had to create run scripts as separate files. Which is actually not 
> so terrible, given the class and its methods often span a few hundred lines, 
> but still.
>
> 4) non-exported global variables
> I also find it lacking, that I seem to be unable to create constants that 
> would not get passed to files that source the class definition. That is, if 
> class1 features global constant CONSTANT=3, then if class2 sources class1, it 
> will also include the constant. This 1) clutters the namespace when running 
> the code interactively,

Re: [Rd] lm() gives different results to lm.ridge() and SPSS

2017-05-05 Thread Fox, John
Dear Nick,


On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown"
 wrote:

>>I conjecture that something in the vicinity of
>> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) +
>>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat)
>>summary(res) 
>> would reproduce the SPSS Beta values.
>
>Yes, that works. Thanks!

That you have to work hard in R to match the SPSS results isn’t such a bad
thing when you factor in the observation that standardizing the
interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of
its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense.

Best,
 John

-
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox/


> 
>
>- Original Message -
>
>From: "peter dalgaard" 
>To: "Viechtbauer Wolfgang (SP)"
>, "Nick Brown"
>
>Cc: r-devel@r-project.org
>Sent: Friday, 5 May, 2017 3:33:29 PM
>Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
>
>Thanks, I was getting to try this, but got side tracked by actual work...
>
>Your analysis reproduces the SPSS unscaled estimates. It still remains to
>figure out how Nick got
>
>> 
>coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
>
>(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA
>0.07342198 -0.39650356 -0.36569488 -0.09435788
>
>
>which does not match your output. I suspect that ZMEAN_PA and
>ZDIVERSITY_PA were scaled for this analysis (but the interaction term
>still obviously is not). I conjecture that something in the vicinity of
>
>res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) +
>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat)
>summary(res) 
>
>would reproduce the SPSS Beta values.
>
>
>> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP)
>> wrote:
>> 
>> I had no problems running regression models in SPSS and R that yielded
>>the same results for these data.
>> 
>> The difference you are observing is from fitting different models. In
>>R, you fitted: 
>> 
>> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat)
>> summary(res) 
>> 
>> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This
>>is not a standardized variable itself and not the same as "ZINTER_PA_C"
>>in the png you showed, which is not a variable in the dataset, but can
>>be created with: 
>> 
>> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA))
>> 
>> If you want the same results as in SPSS, then you need to fit:
>> 
>> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C,
>>data=dat) 
>> summary(res) 
>> 
>> This yields: 
>> 
>> Coefficients: 
>> Estimate Std. Error t value Pr(>|t|)
>> (Intercept) 6.41041 0.01722 372.21 <2e-16 ***
>> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 ***
>> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 ***
>> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 ***
>> --- 
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> 
>> Exactly the same as in the png.
>> 
>> Peter already mentioned this as a possible reason for the discrepancy:
>>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it
>>perhaps the case that x1 and x2 have already been scaled to have
>>standard deviation 1? In that case, x1*x2 won't be.")
>> 
>> Best, 
>> Wolfgang 
>> 
>> -Original Message-
>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick
>>Brown 
>> Sent: Friday, May 05, 2017 10:40
>> To: peter dalgaard
>> Cc: r-devel@r-project.org
>> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
>> 
>> Hi, 
>> 
>> Here is (I hope) all the relevant output from R.
>> 
>>> mean(s1$ZDEPRESSION, na.rm=T)
>> [1] -1.041546e-16
>>> mean(s1$ZDIVERSITY_PA, na.rm=T)
>> [1] -9.660583e-16
>>> mean(s1$ZMEAN_PA, na.rm=T)
>> [1] -5.430282e-15
>>> lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef
>>       ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA
>>     -0.3962254     -0.3636026              -0.1425772
>> ## This is what I thought was the problem originally. :-)
>> 
>>> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
>>    (Intercept)     ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA
>>     0.07342198  -0.39650356    -0.36569488             -0.09435788
>>> coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
>>       ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA
>>     0.07342198  -0.39650356    -0.36569488  -0.09435788
>> 
>> The equivalent from SPSS is attached. The unstandardized coefficients in SPSS
>> look nothing like those in R. The standardized coefficients in SPSS match the
>> lm.ridge()$coef numbers very closely indeed, suggesting that the same
>> algorithm may be in use.
>> 
>> I have put the dataset file, which is the untouched original I received
>> from the authors, in this Dropbox folder:
>> https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0
>> You can read it into R with this code (one variable needs to be
>> standardized and centered; everything else is already in the file):
>> 
>> s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE)

Re: [Rd] A few suggestions and perspectives from a PhD student

2017-05-05 Thread Ista Zahn
On Fri, May 5, 2017 at 1:00 PM, Antonin Klima  wrote:
> Dear Sir or Madam,
>
> I am in 2nd year of my PhD in bioinformatics, after taking my Master’s in 
> computer science, and have been using R heavily during my PhD. As such, I 
> have put together a list of certain features in R that, in my opinion, would 
> be beneficial to add, or could be improved. The first two are already 
> implemented in packages, but given that it is implemented as user-defined 
> operators, it greatly restricts its usefulness.

Why do you think being implemented in a contributed package restricts
the usefulness of a feature?

I hope you will find my suggestions interesting. If you find time, I
will welcome any feedback as to whether you find the suggestions
useful, or why you do not think they should be implemented. I will
also welcome if you enlighten me with any features I might be unaware
of, that might solve the issues I have pointed out below.
>
> 1) piping
> Currently available in package magrittr, piping makes the code better 
> readable by having the line start at its natural starting point, and 
> following with functions that are applied - in order. The readability of 
> several nested calls with a number of parameters each is almost zero, it’s 
> almost as if one would need to come up with the solution himself. Pipeline in 
> comparison is very straightforward, especially together with the point (2).

You may be surprised to learn that not everyone thinks pipes are a
good idea. Personally I see some advantages, but there is also a big
downside, which is that they mess up the call stack and make tracking
down errors via traceback() more difficult.

There is a simple alternative to pipes already built in to R that
gives you some of the advantages of %>% without messing up the call
stack.  Using Hadley's famous "little bunny foo foo" example:

foo_foo <- little_bunny()

## nesting (it is rough)
bop(
  scoop(
hop(foo_foo, through = forest),
up = field_mice
  ),
  on = head
)

## magrittr
foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mouse) %>%
  bop(on = head)

## regular R assignment
foo_foo -> .
  hop(., through = forest) -> .
  scoop(., up = field_mouse) -> .
  bop(., on = head)

This is more limited than magrittr's %>%, but it gives you a lot of
the advantages without the disadvantages.
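
A runnable version of that pattern with ordinary numeric functions (the functions and values here are just an illustration, not from the thread):

  ## "Assign to dot" chaining with real functions, so it can be run as-is:
  1:10          -> .
  sqrt(.)       -> .
  sum(.)        -> .
  round(., 2)
  ## [1] 22.47

Each line is an ordinary statement, which is the call-stack advantage mentioned above: traceback() points at the real call that failed rather than at a pipe wrapper.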

>
> The package here works rather good nevertheless, the shortcomings of piping 
> not being native are not quite as severe as in point (2). Nevertheless, an 
> intuitive symbol such as | would be helpful, and it sometimes bothers me that 
> I have to parenthesize anonymous function, which would probably not be 
> required in a native pipe-operator, much like it is not required in f.ex. 
> lapply. That is,
> 1:5 %>% function(x) x+2
> should be totally fine

That seems pretty small-potatoes to me.

>
> 2) currying
> Currently available in package Curry. The idea is that, having a function 
> such as foo = function(x, y) x+y, one would like to write for example 
> lapply(foo(3), 1:5), and have the interpreter figure out ok, foo(3) does not 
> make a value result, but it can still give a function result - a function of 
> y. This would be indeed most useful for various apply functions, rather than 
> writing function(x) foo(3,x).

You can already do

lapply(1:5, foo, y = 3)

(assuming that the first argument to foo is named "y")

I'm stopping here since I don't have anything useful to say about your
subsequent points.

Best,
Ista

>
> I suggest that currying would make the code easier to write, and more 
> readable, especially when using apply functions. One might imagine that there 
> could be some confusion with such a feature, especially from people 
> unfamiliar with functional programming, although R already does take function 
> as first-order arguments, so it could be just fine. But one could address it 
> with special syntax, such as $foo(3) [$foo(x=3)] for partial application.  
> The current currying package has very limited usefulness, as, being limited 
> by the user-defined operator framework, it only rarely can contribute to less 
> code/more readability. Compare yourself:
> $foo(x=3) vs foo %<% 3
> goo = function(a,b,c)
> $goo(b=3) vs goo %><% list(b=3)
>
> Moreover, one would often like currying to have highest priority. For 
> example, when piping:
> data %>% foo %>% foo1 %<% 3
> if one wants to do data %>% foo %>% $foo(x=3)
>
> 3) Code executable only when running the script itself
> Whereas the first two suggestions are somewhat stealing from Haskell and the 
> like, this suggestion would be stealing from Python. I’m building quite a 
> complicated pipeline, using S4 classes. After defining the class and its 
> methods, I also define how to build the class to my likings, based on my 
> input data, using various now-defined methods. So I end up having a list of 
> command line arguments to process, and the way to create the class instance 
> based on them. If I write it to the class file, however, I 

[Rd] A few suggestions and perspectives from a PhD student

2017-05-05 Thread Antonin Klima
Dear Sir or Madam,

I am in the 2nd year of my PhD in bioinformatics, after taking my Master’s in 
computer science, and have been using R heavily during my PhD. As such, I have 
put together a list of certain features in R that, in my opinion, would be 
beneficial to add, or could be improved. The first two are already implemented 
in packages, but given that they are implemented as user-defined operators, 
their usefulness is greatly restricted. I hope you will find my suggestions 
interesting. If you find time, I will welcome any feedback as to whether you 
find the suggestions useful, or why you do not think they should be 
implemented. I will also welcome it if you enlighten me about any features I 
might be unaware of that might solve the issues I have pointed out below.

1) piping
Currently available in the package magrittr, piping makes code more readable by 
having the line start at its natural starting point and then following with the 
functions that are applied, in order. The readability of several nested calls, 
each with a number of parameters, is almost zero; it is almost as if one would 
need to come up with the solution oneself. A pipeline, in comparison, is very 
straightforward, especially together with point (2).

The package works rather well nevertheless; the shortcomings of piping not 
being native are not quite as severe as in point (2). Still, an intuitive 
symbol such as | would be helpful, and it sometimes bothers me that I have to 
parenthesize an anonymous function, which would probably not be required with a 
native pipe operator, much as it is not required in e.g. lapply. That is,
1:5 %>% function(x) x+2
should be totally fine
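
For reference, magrittr does accept an anonymous function today if it is parenthesized (or written with the dot placeholder inside braces), which is the workaround the parenthesization complaint above refers to; a small sketch:

  library(magrittr)
  1:5 %>% (function(x) x + 2)   # works: 3 4 5 6 7
  1:5 %>% { . + 2 }             # dot-placeholder form, same result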

2) currying
Currently available in the package Curry. The idea is that, having a function 
such as foo = function(x, y) x+y, one would like to write for example 
lapply(foo(3), 1:5) and have the interpreter figure out that foo(3) does not 
produce a value result, but can still give a function result - a function of y. 
This would indeed be most useful for the various apply functions, rather than 
writing function(x) foo(3,x).

I suggest that currying would make the code easier to write, and more readable, 
especially when using apply functions. One might imagine that there could be 
some confusion with such a feature, especially from people unfamiliar with 
functional programming, although R already does take functions as first-class 
arguments, so it could be just fine. But one could address it with special 
syntax, such as $foo(3) [$foo(x=3)] for partial application.  The current 
currying package has very limited usefulness, as, being limited by the 
user-defined operator framework, it can only rarely contribute to less 
code/more readability. Compare for yourself:
$foo(x=3) vs foo %<% 3
goo = function(a,b,c)
$goo(b=3) vs goo %><% list(b=3)

Moreover, one would often like currying to have highest priority. For example, 
when piping:
data %>% foo %>% foo1 %<% 3
if one wants to do data %>% foo %>% $foo(x=3)

3) Code executable only when running the script itself
Whereas the first two suggestions are somewhat stealing from Haskell and the 
like, this suggestion would be stealing from Python. I’m building quite a 
complicated pipeline, using S4 classes. After defining the class and its 
methods, I also define how to build the class to my liking, based on my input 
data, using various now-defined methods. So I end up having a list of command 
line arguments to process, and the way to create the class instance based on 
them. If I write this to the class file, however, I end up running the code when 
it is sourced from the next step in the pipeline, which needs the previous class 
definitions.

A feature such as the pythonic “if __name__ == __main__” would thus be useful. 
As it is, I had to create run scripts as separate files. That is actually not so 
terrible, given that the class and its methods often span a few hundred lines, 
but still.
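
One workaround sketch in current R (an assumption on my side, not a built-in equivalent of Python's guard: sys.nframe() is 0 for top-level code run via Rscript, but positive when the file is source()d from another script):

  ## class1.R -- class/method definitions above; the block below only runs
  ## when this file is the top-level script, not when it is source()d.
  main <- function(args = commandArgs(trailingOnly = TRUE)) {
    ## build the class instance from the command-line arguments here
    print(args)
  }

  if (sys.nframe() == 0L) {
    main()
  }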

4) non-exported global variables
I also find it lacking that I seem to be unable to create constants that would 
not get passed to files that source the class definition. That is, if class1 
features a global constant CONSTANT=3, then if class2 sources class1, it will 
also include the constant. This 1) clutters the namespace when running the code 
interactively, and 2) potentially overwrites constants in case of a name clash. 
Some kind of export/non-export variable syntax, symbolic import, or namespace 
would be useful. I know that if I converted it to a package I would get at least 
something like a namespace, but still.

I understand that the variable cannot just not be imported, in general, as the 
functions will generally rely on it (otherwise it wouldn’t have to be there). 
But one could consider hiding it in an implicit namespace for the file, for 
example.
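
A partial workaround that works today (a sketch; class1 and scale_by_constant are made-up names, and it assumes callers only need the functions, not the constant itself) is to build the exported functions inside local(), so that sourcing the file defines a single object rather than the constant:

  ## class1.R -- CONSTANT stays private to this block; only the returned list
  ## is visible to files that source() this one.
  class1 <- local({
    CONSTANT <- 3
    scale_by_constant <- function(x) x * CONSTANT   # closure captures CONSTANT
    list(scale_by_constant = scale_by_constant)
  })

  class1$scale_by_constant(5)   # 15; CONSTANT is not defined in the caller's workspace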

5) S4 methods with same name, for different classes
Say I have an S4 class called datasetSingle, and another S4 class called 
datasetMulti, which gathers up a number of datasetSingle classes, 

Re: [Rd] lm() gives different results to lm.ridge() and SPSS

2017-05-05 Thread Nick Brown
>I conjecture that something in the vicinity of 
> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + 
> scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) 
>summary(res) 
> would reproduce the SPSS Beta values. 

Yes, that works. Thanks! 

- Original Message -

From: "peter dalgaard"  
To: "Viechtbauer Wolfgang (SP)" , 
"Nick Brown"  
Cc: r-devel@r-project.org 
Sent: Friday, 5 May, 2017 3:33:29 PM 
Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS 

Thanks, I was getting to try this, but got side tracked by actual work... 

Your analysis reproduces the SPSS unscaled estimates. It still remains to 
figure out how Nick got 

> 
coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) 

(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA 
0.07342198 -0.39650356 -0.36569488 -0.09435788 


which does not match your output. I suspect that ZMEAN_PA and ZDIVERSITY_PA 
were scaled for this analysis (but the interaction term still obviously is 
not). I conjecture that something in the vicinity of 

res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + scale(ZMEAN_PA 
* ZDIVERSITY_PA), data=dat) 
summary(res) 

would reproduce the SPSS Beta values. 


> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP) 
>  wrote: 
> 
> I had no problems running regression models in SPSS and R that yielded the 
> same results for these data. 
> 
> The difference you are observing is from fitting different models. In R, you 
> fitted: 
> 
> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat) 
> summary(res) 
> 
> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This is 
> not a standardized variable itself and not the same as "ZINTER_PA_C" in the 
> png you showed, which is not a variable in the dataset, but can be created 
> with: 
> 
> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA)) 
> 
> If you want the same results as in SPSS, then you need to fit: 
> 
> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, data=dat) 
> summary(res) 
> 
> This yields: 
> 
> Coefficients: 
> Estimate Std. Error t value Pr(>|t|) 
> (Intercept) 6.41041 0.01722 372.21 <2e-16 *** 
> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 *** 
> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 *** 
> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 *** 
> --- 
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> 
> Exactly the same as in the png. 
> 
> Peter already mentioned this as a possible reason for the discrepancy: 
> https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it perhaps 
> the case that x1 and x2 have already been scaled to have standard deviation 
> 1? In that case, x1*x2 won't be.") 
> 
> Best, 
> Wolfgang 
> 
> -Original Message- 
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick Brown 
> Sent: Friday, May 05, 2017 10:40 
> To: peter dalgaard 
> Cc: r-devel@r-project.org 
> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS 
> 
> Hi, 
> 
> Here is (I hope) all the relevant output from R. 
> 
>> mean(s1$ZDEPRESSION, na.rm=T)
> [1] -1.041546e-16
>> mean(s1$ZDIVERSITY_PA, na.rm=T)
> [1] -9.660583e-16
>> mean(s1$ZMEAN_PA, na.rm=T)
> [1] -5.430282e-15
>> lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef
>       ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
>     -0.3962254     -0.3636026              -0.1425772 
> ## This is what I thought was the problem originally. :-) 
> 
>> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) 
>    (Intercept)     ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
>     0.07342198  -0.39650356    -0.36569488             -0.09435788 
>> coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) 
>       ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
>     0.07342198  -0.39650356    -0.36569488  -0.09435788 
> 
> The equivalent from SPSS is attached. The unstandardized coefficients in SPSS 
> look nothing like those in R. The standardized coefficients in SPSS match the 
> lm.ridge()$coef numbers very closely indeed, suggesting that the same 
> algorithm may be in use. 
> 
> I have put the dataset file, which is the untouched original I received from 
> the authors, in this Dropbox folder: 
> https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0. 
> You can read it into R with this code (one variable needs to be standardized 
> and centered; everything else is already in the file): 
> 
> s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) 
> s1$ZDEPRESSION <- scale(s1$DEPRESSION) 
> Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure IBM 
> will want to fix it quickly (ha ha ha). 
> 
> Nick 
> 
> - Original Message - 
> 
> From: "peter dalgaard"  
> To: "Nick Brown"  
> Cc: "Simon Bonner" , r-devel@r-project.org 
> Sent: Friday, 5 May, 2017 10:02:10 AM 
> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS 
> 
> I asked you before, but in case you missed it: Are you looking at the right 
> place in SPSS output?

Re: [Rd] lm() gives different results to lm.ridge() and SPSS

2017-05-05 Thread peter dalgaard
Thanks, I was getting to try this, but got side tracked by actual work...

Your analysis reproduces the SPSS unscaled estimates. It still remains to 
figure out how Nick got

> 
coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))

           (Intercept)               ZMEAN_PA          ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA 
            0.07342198            -0.39650356            -0.36569488            -0.09435788 


which does not match your output. I suspect that ZMEAN_PA and ZDIVERSITY_PA 
were scaled for this analysis (but the interaction term still obviously is 
not). I conjecture that something in the vicinity of

res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + scale(ZMEAN_PA 
* ZDIVERSITY_PA), data=dat)
summary(res)

would reproduce the SPSS Beta values.


> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP) 
>  wrote:
> 
> I had no problems running regression models in SPSS and R that yielded the 
> same results for these data.
> 
> The difference you are observing is from fitting different models. In R, you 
> fitted:
> 
> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat)
> summary(res)
> 
> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This is 
> not a standardized variable itself and not the same as "ZINTER_PA_C" in the 
> png you showed, which is not a variable in the dataset, but can be created 
> with:
> 
> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA))
> 
> If you want the same results as in SPSS, then you need to fit:
> 
> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, data=dat)
> summary(res)
> 
> This yields:
> 
> Coefficients:
>               Estimate Std. Error t value Pr(>|t|)    
> (Intercept)    6.41041    0.01722  372.21   <2e-16 ***
> ZMEAN_PA      -1.62726    0.04200  -38.74   <2e-16 ***
> ZDIVERSITY_PA -1.50082    0.07447  -20.15   <2e-16 ***
> ZINTER_PA_C   -0.58955    0.05288  -11.15   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Exactly the same as in the png.
> 
> Peter already mentioned this as a possible reason for the discrepancy: 
> https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it perhaps 
> the case that x1 and x2 have already been scaled to have standard deviation 
> 1? In that case, x1*x2 won't be.")
> 
> Best,
> Wolfgang
> 
> -Original Message-
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick Brown
> Sent: Friday, May 05, 2017 10:40
> To: peter dalgaard
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS
> 
> Hi, 
> 
> Here is (I hope) all the relevant output from R. 
> 
>> mean(s1$ZDEPRESSION, na.rm=T)
> [1] -1.041546e-16
>> mean(s1$ZDIVERSITY_PA, na.rm=T)
> [1] -9.660583e-16
>> mean(s1$ZMEAN_PA, na.rm=T)
> [1] -5.430282e-15
>> lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef
>       ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
>     -0.3962254     -0.3636026              -0.1425772 
> ## This is what I thought was the problem originally. :-) 
> 
>> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) 
>    (Intercept)     ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
>     0.07342198  -0.39650356    -0.36569488             -0.09435788 
>> coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) 
>       ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
>     0.07342198  -0.39650356    -0.36569488  -0.09435788 
> 
> The equivalent from SPSS is attached. The unstandardized coefficients in SPSS 
> look nothing like those in R. The standardized coefficients in SPSS match the 
> lm.ridge()$coef numbers very closely indeed, suggesting that the same 
> algorithm may be in use. 
> 
> I have put the dataset file, which is the untouched original I received from 
> the authors, in this Dropbox folder: 
> https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0. 
> You can read it into R with this code (one variable needs to be standardized 
> and centered; everything else is already in the file): 
> 
> s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) 
> s1$ZDEPRESSION <- scale(s1$DEPRESSION) 
> Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure IBM 
> will want to fix it quickly (ha ha ha). 
> 
> Nick 
> 
> - Original Message -
> 
> From: "peter dalgaard"  
> To: "Nick Brown"  
> Cc: "Simon Bonner" , r-devel@r-project.org 
> Sent: Friday, 5 May, 2017 10:02:10 AM 
> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS 
> 
> I asked you before, but in case you missed it: Are you looking at the right 
> place in SPSS output? 
> 
> The UNstandardized coefficients should be comparable to R, i.e. the "B" 
> column, not "Beta". 
> 
> -pd 
> 
>> On 5 May 2017, at 01:58 , Nick Brown  wrote: 
>> 
>> Hi Simon, 
>> 
>> Yes, if I uses coefficients() I get the same results for lm() and lm.ridge().

Re: [Rd] complex tests failure

2017-05-05 Thread Tomas Kalibera
Thanks for the report, handled in configure in 72661 (R-devel).
I'll also port to R-patched.

Best
Tomas

On 05/04/2017 03:49 PM, Tomas Kalibera wrote:
>
> There is no way to control this at runtime.
> We will probably have to add a configure test.
>
> Best,
> Tomas
>
> On 05/04/2017 03:23 PM, Kasper Daniel Hansen wrote:
>> Thanks.
>>
>> I assume there is no way to control this via. environment variables 
>> or configure settings?  Obviously that would be great for something 
>> like this which affects tests and seems to be a known problem for 
>> older C standard libraries.
>>
>> Best,
>> Kasper
>>
>> On Thu, May 4, 2017 at 9:12 AM, Tomas Kalibera wrote:
>>
>>
>> As a quick fix, you can undefine HAVE_CTANH in complex.c,
>> somewhere after including config.h
>> An internal substitute, which is implemented inside complex.c,
>> will be used.
>>
>> Best
>> Tomas
>>
>>
>>
>>
>> On 05/04/2017 02:57 PM, Kasper Daniel Hansen wrote:
>>
>> For a while I have been getting that the complex tests fails
>> on RHEL 6.
>> The specific issue has to do with tanh (see below for full
>> output from
>> complex.Rout.fail).
>>
>> This is both with the stock compiler (GCC 4.4.7) and a
>> compiler supplied
>> through the conda project (GCC 4.8.5).  The compiler supplied
>> through conda
>> ends up linking R to certain system files, so the binary is
>> not completely
>> independent (although most dynamically linked libraries are
>> coming from the
>> conda installation).
>>
>> A search on R-devel reveals a discussion in April on an issue
>> reported on
>> Windows with a bug in tanh in old versions of the GNU C
>> standard library;
>> this seems relevant.  The discussion by Martin Maechler
>> suggest "using R's
>> internal substitute".  So how do I enable this?  Or does this
>> requires
>> updating the C standard library?
>>
>> ** From complex.Rout.fail
>>
>> stopifnot(identical(tanh(356+0i), 1+0i))
>>
>> Error: identical(tanh(356 + (0+0i)), 1 + (0+0i)) is not TRUE
>> In addition: Warning message:
>> In tanh(356 + (0+0i)) : NaNs produced in function "tanh"
>> Execution halted
>>
>> Best,
>> Kasper
>>
>




Re: [Rd] lm() gives different results to lm.ridge() and SPSS

2017-05-05 Thread Viechtbauer Wolfgang (SP)
I had no problems running regression models in SPSS and R that yielded the same 
results for these data.

The difference you are observing is from fitting different models. In R, you 
fitted:

res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat)
summary(res)

The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This is not 
a standardized variable itself and not the same as "ZINTER_PA_C" in the png you 
showed, which is not a variable in the dataset, but can be created with:

dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA))

If you want the same results as in SPSS, then you need to fit:

res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, data=dat)
summary(res)

This yields:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    6.41041    0.01722  372.21   <2e-16 ***
ZMEAN_PA      -1.62726    0.04200  -38.74   <2e-16 ***
ZDIVERSITY_PA -1.50082    0.07447  -20.15   <2e-16 ***
ZINTER_PA_C   -0.58955    0.05288  -11.15   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Exactly the same as in the png.

Peter already mentioned this as a possible reason for the discrepancy: 
https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it perhaps the 
case that x1 and x2 have already been scaled to have standard deviation 1? In 
that case, x1*x2 won't be.")

Best,
Wolfgang

-Original Message-
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick Brown
Sent: Friday, May 05, 2017 10:40
To: peter dalgaard
Cc: r-devel@r-project.org
Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS

Hi, 

Here is (I hope) all the relevant output from R. 

> mean(s1$ZDEPRESSION, na.rm=T)
[1] -1.041546e-16
> mean(s1$ZDIVERSITY_PA, na.rm=T)
[1] -9.660583e-16
> mean(s1$ZMEAN_PA, na.rm=T)
[1] -5.430282e-15
> lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef
      ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
    -0.3962254     -0.3636026              -0.1425772 
## This is what I thought was the problem originally. :-) 

> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
   (Intercept)     ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
    0.07342198  -0.39650356    -0.36569488             -0.09435788 
> coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
      ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
    0.07342198  -0.39650356    -0.36569488  -0.09435788 

The equivalent from SPSS is attached. The unstandardized coefficients in SPSS 
look nothing like those in R. The standardized coefficients in SPSS match the 
lm.ridge()$coef numbers very closely indeed, suggesting that the same 
algorithm may be in use. 

I have put the dataset file, which is the untouched original I received from 
the authors, in this Dropbox folder: 
https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0. You 
can read it into R with this code (one variable needs to be standardized and 
centered; everything else is already in the file): 

s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) 
s1$ZDEPRESSION <- scale(s1$DEPRESSION) 
Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure IBM 
will want to fix it quickly (ha ha ha). 

Nick 

- Original Message -

From: "peter dalgaard"  
To: "Nick Brown"  
Cc: "Simon Bonner" , r-devel@r-project.org 
Sent: Friday, 5 May, 2017 10:02:10 AM 
Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS 

I asked you before, but in case you missed it: Are you looking at the right 
place in SPSS output? 

The UNstandardized coefficients should be comparable to R, i.e. the "B" column, 
not "Beta". 

-pd 

> On 5 May 2017, at 01:58 , Nick Brown  wrote: 
> 
> Hi Simon, 
> 
> Yes, if I uses coefficients() I get the same results for lm() and lm.ridge(). 
> So that's consistent, at least. 
> 
> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the 
> value from SPSS to 5dp, which is an interesting coincidence if these numbers 
> have no particular external meaning in lm.ridge(). 
> 
> Kind regards, 
> Nick 
> 
> - Original Message - 
> 
> From: "Simon Bonner"  
> To: "Nick Brown" , r-devel@r-project.org 
> Sent: Thursday, 4 May, 2017 7:07:33 PM 
> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS 
> 
> Hi Nick, 
> 
> I think that the problem here is your use of $coef to extract the 
> coefficients of the ridge regression. The help for lm.ridge states that coef 
> is a "matrix of coefficients, one row for each value of lambda. Note that 
> these are not on the original scale and are for use by the coef method." 
> 
> I ran a small test with simulated data, code is copied below, and indeed the 
> output from lm.ridge differs depending on whether the coefficients are 
> accessed via $coef or via the 

[Rd] Possible bug in how POSIXct is printed

2017-05-05 Thread Deonarine, Ajay
Greetings R-devel group.  When dealing with Inf dates, as.POSIXct seems to 
return Inf, but is printed as NA:

> x1 <- as.POSIXct(Inf, origin = '1970-01-01')
> print(x1)
[1] NA
> is.na(x1)
[1] FALSE
> is.infinite(x1)
[1] TRUE
>

POSIXlt at least evaluates and prints the result consistently:

> x1 <- as.POSIXlt(Inf, origin = '1970-01-01')
> print(x1)
[1] NA
> is.na(x1)
[1] TRUE
>

I think the cause is that format.POSIXct calls format.POSIXlt, hence the value 
prints as NA even though an Inf is actually stored.  I don't have an opinion on 
what the actual result of as.POSIXct(Inf) should be, as long as what's printed 
matches that value.
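
A quick way to see the mismatch described above (just restating the report; unclass() shows the stored numeric value):

  x1 <- as.POSIXct(Inf, origin = '1970-01-01')
  unclass(x1)   # Inf  -- the value actually stored
  print(x1)     # NA   -- what format.POSIXct / format.POSIXlt renders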

Thank you



Re: [Rd] lm() gives different results to lm.ridge() and SPSS

2017-05-05 Thread Nick Brown
Hi, 

Here is (I hope) all the relevant output from R. 

> mean(s1$ZDEPRESSION, na.rm=T)
[1] -1.041546e-16
> mean(s1$ZDIVERSITY_PA, na.rm=T)
[1] -9.660583e-16
> mean(s1$ZMEAN_PA, na.rm=T)
[1] -5.430282e-15
> lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef
      ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
    -0.3962254     -0.3636026              -0.1425772 
## This is what I thought was the problem originally. :-) 

> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
   (Intercept)     ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
    0.07342198  -0.39650356    -0.36569488             -0.09435788 
> coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
      ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA 
    0.07342198  -0.39650356    -0.36569488  -0.09435788 

The equivalent from SPSS is attached. The unstandardized coefficients in SPSS 
look nothing like those in R. The standardized coefficients in SPSS match the 
lm.ridge()$coef numbers very closely indeed, suggesting that the same 
algorithm may be in use. 

I have put the dataset file, which is the untouched original I received from 
the authors, in this Dropbox folder: 
https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0. You 
can read it into R with this code (one variable needs to be standardized and 
centered; everything else is already in the file): 

s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) 
s1$ZDEPRESSION <- scale(s1$DEPRESSION) 
Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure IBM 
will want to fix it quickly (ha ha ha). 

Nick 



- Original Message -

From: "peter dalgaard"  
To: "Nick Brown"  
Cc: "Simon Bonner" , r-devel@r-project.org 
Sent: Friday, 5 May, 2017 10:02:10 AM 
Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS 

I asked you before, but in case you missed it: Are you looking at the right 
place in SPSS output? 

The UNstandardized coefficients should be comparable to R, i.e. the "B" column, 
not "Beta". 

-pd 

> On 5 May 2017, at 01:58 , Nick Brown  wrote: 
> 
> Hi Simon, 
> 
> Yes, if I uses coefficients() I get the same results for lm() and lm.ridge(). 
> So that's consistent, at least. 
> 
> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the 
> value from SPSS to 5dp, which is an interesting coincidence if these numbers 
> have no particular external meaning in lm.ridge(). 
> 
> Kind regards, 
> Nick 
> 
> - Original Message - 
> 
> From: "Simon Bonner"  
> To: "Nick Brown" , r-devel@r-project.org 
> Sent: Thursday, 4 May, 2017 7:07:33 PM 
> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS 
> 
> Hi Nick, 
> 
> I think that the problem here is your use of $coef to extract the 
> coefficients of the ridge regression. The help for lm.ridge states that coef 
> is a "matrix of coefficients, one row for each value of lambda. Note that 
> these are not on the original scale and are for use by the coef method." 
> 
> I ran a small test with simulated data, code is copied below, and indeed the 
> output from lm.ridge differs depending on whether the coefficients are 
> accessed via $coef or via the coefficients() function. The latter does 
> produce results that match the output from lm. 
> 
> I hope that helps. 
> 
> Cheers, 
> 
> Simon 
> 
> ## Load packages 
> library(MASS) 
> 
> ## Set seed 
> set.seed() 
> 
> ## Set parameters 
> n <- 100 
> beta <- c(1,0,1) 
> sigma <- .5 
> rho <- .75 
> 
> ## Simulate correlated covariates 
> Sigma <- matrix(c(1,rho,rho,1),ncol=2) 
> X <- mvrnorm(n,c(0,0),Sigma=Sigma) 
> 
> ## Simulate data 
> mu <- beta[1] + X %*% beta[-1] 
> y <- rnorm(n,mu,sigma) 
> 
> ## Fit model with lm() 
> fit1 <- lm(y ~ X) 
> 
> ## Fit model with lm.ridge() 
> fit2 <- lm.ridge(y ~ X) 
> 
> ## Compare coefficients 
> cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) 
> 
> [,1] [,2] [,3] 
> (Intercept) 0.99276001 NA 0.99276001 
> X1 -0.03980772 -0.04282391 -0.03980772 
> X2 1.11167179 1.06200476 1.11167179 
> 
> -- 
> 
> Simon Bonner 
> Assistant Professor of Environmetrics/ Director MMASc 
> Department of Statistical and Actuarial Sciences/Department of Biology 
> University of Western Ontario 
> 
> Office: Western Science Centre rm 276 
> 
> Email: sbonn...@uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813 
> Twitter: @bonnerstatslab | Website: 
> http://simon.bonners.ca/bonner-lab/wpblog/ 
> 
>> -Original Message- 
>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick 
>> Brown 
>> Sent: May 4, 2017 10:29 AM 
>> To: r-devel@r-project.org 
>> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS 
>> 
>> Hallo, 
>> 
>> I hope I am posting to the right place. I was advised to try this list by 
>> Ben Bolker 
>> (https://twitter.com/bolkerb/status/859909918446497795).

Re: [Rd] lm() gives different results to lm.ridge() and SPSS

2017-05-05 Thread peter dalgaard
I asked you before, but in case you missed it: Are you looking at the right 
place in SPSS output?

The UNstandardized coefficients should be comparable to R, i.e. the "B" column, 
not "Beta".

-pd

> On 5 May 2017, at 01:58 , Nick Brown  wrote:
> 
> Hi Simon, 
> 
> Yes, if I uses coefficients() I get the same results for lm() and lm.ridge(). 
> So that's consistent, at least. 
> 
> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the 
> value from SPSS to 5dp, which is an interesting coincidence if these numbers 
> have no particular external meaning in lm.ridge(). 
> 
> Kind regards, 
> Nick 
> 
> - Original Message -
> 
> From: "Simon Bonner"  
> To: "Nick Brown" , r-devel@r-project.org 
> Sent: Thursday, 4 May, 2017 7:07:33 PM 
> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS 
> 
> Hi Nick, 
> 
> I think that the problem here is your use of $coef to extract the 
> coefficients of the ridge regression. The help for lm.ridge states that coef 
> is a "matrix of coefficients, one row for each value of lambda. Note that 
> these are not on the original scale and are for use by the coef method." 
> 
> I ran a small test with simulated data, code is copied below, and indeed the 
> output from lm.ridge differs depending on whether the coefficients are 
> accessed via $coef or via the coefficients() function. The latter does 
> produce results that match the output from lm. 
> 
> I hope that helps. 
> 
> Cheers, 
> 
> Simon 
> 
> ## Load packages 
> library(MASS) 
> 
> ## Set seed 
> set.seed() 
> 
> ## Set parameters 
> n <- 100 
> beta <- c(1,0,1) 
> sigma <- .5 
> rho <- .75 
> 
> ## Simulate correlated covariates 
> Sigma <- matrix(c(1,rho,rho,1),ncol=2) 
> X <- mvrnorm(n,c(0,0),Sigma=Sigma) 
> 
> ## Simulate data 
> mu <- beta[1] + X %*% beta[-1] 
> y <- rnorm(n,mu,sigma) 
> 
> ## Fit model with lm() 
> fit1 <- lm(y ~ X) 
> 
> ## Fit model with lm.ridge() 
> fit2 <- lm.ridge(y ~ X) 
> 
> ## Compare coefficients 
> cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) 
> 
> [,1] [,2] [,3] 
> (Intercept) 0.99276001 NA 0.99276001 
> X1 -0.03980772 -0.04282391 -0.03980772 
> X2 1.11167179 1.06200476 1.11167179 
> 
> -- 
> 
> Simon Bonner 
> Assistant Professor of Environmetrics/ Director MMASc 
> Department of Statistical and Actuarial Sciences/Department of Biology 
> University of Western Ontario 
> 
> Office: Western Science Centre rm 276 
> 
> Email: sbonn...@uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813 
> Twitter: @bonnerstatslab | Website: 
> http://simon.bonners.ca/bonner-lab/wpblog/ 
> 
>> -Original Message- 
>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick 
>> Brown 
>> Sent: May 4, 2017 10:29 AM 
>> To: r-devel@r-project.org 
>> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS 
>> 
>> Hallo, 
>> 
>> I hope I am posting to the right place. I was advised to try this list by 
>> Ben Bolker 
>> (https://twitter.com/bolkerb/status/859909918446497795). I also posted this 
>> question to StackOverflow 
>> (http://stackoverflow.com/questions/43771269/lm-gives-different-results- 
>> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my first 
>> program in 1975 and have been paid to program in about 15 different 
>> languages, so I have some general background knowledge. 
>> 
>> 
>> I have a regression from which I extract the coefficients like this: 
>> lm(y ~ x1 * x2, data=ds)$coef 
>> That gives: x1=0.40, x2=0.37, x1*x2=0.09 
>> 
>> 
>> 
>> When I do the same regression in SPSS, I get: 
>> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. 
>> So the main effects are in agreement, but there is quite a difference in the 
>> coefficient for the interaction. 
>> 
>> 
>> X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my 
>> idea, but it got published), so there is quite possibly something going on 
>> with 
>> collinearity. So I thought I'd try lm.ridge() to see if I can get an idea of 
>> where 
>> the problems are occurring. 
>> 
>> 
>> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge 
>> penalty) and 
>> check we get the same results as with lm(): 
>> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef 
>> x1=0.40, x2=0.37, x1*x2=0.14 
>> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is 
>> the 
>> default, so it can be omitted; I can alternate between including or deleting 
>> ".ridge" in the function call, and watch the coefficient for the interaction 
>> change.) 
>> 
>> 
>> 
>> What seems slightly strange to me here is that I assumed that lm.ridge() 
>> just 
>> piggybacks on lm() anyway, so in the specific case where lambda=0 and there 
>> is no "ridging" to do, I'd expect exactly the same results. 
>> 
>> 
>> Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex 
>> will 
>> not be easy to make, but I can share the data via Dropbox or something if 
>> that 
>> would help. 
>