>I conjecture that something in the vicinity of > res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + > scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >summary(res) > would reproduce the SPSS Beta values.
Yes, that works. Thanks! ----- Original Message ----- From: "peter dalgaard" <pda...@gmail.com> To: "Viechtbauer Wolfgang (SP)" <wolfgang.viechtba...@maastrichtuniversity.nl>, "Nick Brown" <nick.br...@free.fr> Cc: r-devel@r-project.org Sent: Friday, 5 May, 2017 3:33:29 PM Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS Thanks, I was getting to try this, but got side tracked by actual work... Your analysis reproduces the SPSS unscaled estimates. It still remains to figure out how Nick got > coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) (Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA 0.07342198 -0.39650356 -0.36569488 -0.09435788 which does not match your output. I suspect that ZMEAN_PA and ZDIVERSITY_PA were scaled for this analysis (but the interaction term still obviously is not). I conjecture that something in the vicinity of res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) summary(res) would reproduce the SPSS Beta values. > On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP) > <wolfgang.viechtba...@maastrichtuniversity.nl> wrote: > > I had no problems running regression models in SPSS and R that yielded the > same results for these data. > > The difference you are observing is from fitting different models. In R, you > fitted: > > res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat) > summary(res) > > The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This is > not a standardized variable itself and not the same as "ZINTER_PA_C" in the > png you showed, which is not a variable in the dataset, but can be created > with: > > dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA)) > > If you want the same results as in SPSS, then you need to fit: > > res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, data=dat) > summary(res) > > This yields: > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 6.41041 0.01722 372.21 <2e-16 *** > ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 *** > ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 *** > ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 *** > --- > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > Exactly the same as in the png. > > Peter already mentioned this as a possible reason for the discrepancy: > https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it perhaps > the case that x1 and x2 have already been scaled to have standard deviation > 1? In that case, x1*x2 won't be.") > > Best, > Wolfgang > > -----Original Message----- > From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick Brown > Sent: Friday, May 05, 2017 10:40 > To: peter dalgaard > Cc: r-devel@r-project.org > Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS > > Hi, > > Here is (I hope) all the relevant output from R. > >> mean(s1$ZDEPRESSION, na.rm=T) [1] -1.041546e-16 > mean(s1$ZDIVERSITY_PA, >> na.rm=T) [1] -9.660583e-16 > mean(s1$ZMEAN_PA, na.rm=T) [1] -5.430282e-15 > >> lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef ZMEAN_PA >> ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA > -0.3962254 -0.3636026 -0.1425772 ## This is what I thought was the problem > originally. :-) > > >> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) >> (Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA > 0.07342198 -0.39650356 -0.36569488 -0.09435788 > > coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) > ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA > 0.07342198 -0.39650356 -0.36569488 -0.09435788 The equivalent from SPSS is > attached. The unstandardized coefficients in SPSS look nothing like those in > R. The standardized coefficients in SPSS match the lm.ridge()$coef numbers > very closely indeed, suggesting that the same algorithm may be in use. > > I have put the dataset file, which is the untouched original I received from > the authors, in this Dropbox folder: > https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0. > You can read it into R with this code (one variable needs to be standardized > and centered; everything else is already in the file): > > s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) > s1$ZDEPRESSION <- scale(s1$DEPRESSION) > Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure IBM > will want to fix it quickly (ha ha ha). > > Nick > > ----- Original Message ----- > > From: "peter dalgaard" <pda...@gmail.com> > To: "Nick Brown" <nick.br...@free.fr> > Cc: "Simon Bonner" <sbonn...@uwo.ca>, r-devel@r-project.org > Sent: Friday, 5 May, 2017 10:02:10 AM > Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS > > I asked you before, but in case you missed it: Are you looking at the right > place in SPSS output? > > The UNstandardized coefficients should be comparable to R, i.e. the "B" > column, not "Beta". > > -pd > >> On 5 May 2017, at 01:58 , Nick Brown <nick.br...@free.fr> wrote: >> >> Hi Simon, >> >> Yes, if I uses coefficients() I get the same results for lm() and >> lm.ridge(). So that's consistent, at least. >> >> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the >> value from SPSS to 5dp, which is an interesting coincidence if these numbers >> have no particular external meaning in lm.ridge(). >> >> Kind regards, >> Nick >> >> ----- Original Message ----- >> >> From: "Simon Bonner" <sbonn...@uwo.ca> >> To: "Nick Brown" <nick.br...@free.fr>, r-devel@r-project.org >> Sent: Thursday, 4 May, 2017 7:07:33 PM >> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> Hi Nick, >> >> I think that the problem here is your use of $coef to extract the >> coefficients of the ridge regression. The help for lm.ridge states that coef >> is a "matrix of coefficients, one row for each value of lambda. Note that >> these are not on the original scale and are for use by the coef method." >> >> I ran a small test with simulated data, code is copied below, and indeed the >> output from lm.ridge differs depending on whether the coefficients are >> accessed via $coef or via the coefficients() function. The latter does >> produce results that match the output from lm. >> >> I hope that helps. >> >> Cheers, >> >> Simon >> >> ## Load packages >> library(MASS) >> >> ## Set seed >> set.seed(8888) >> >> ## Set parameters >> n <- 100 >> beta <- c(1,0,1) >> sigma <- .5 >> rho <- .75 >> >> ## Simulate correlated covariates >> Sigma <- matrix(c(1,rho,rho,1),ncol=2) >> X <- mvrnorm(n,c(0,0),Sigma=Sigma) >> >> ## Simulate data >> mu <- beta[1] + X %*% beta[-1] >> y <- rnorm(n,mu,sigma) >> >> ## Fit model with lm() >> fit1 <- lm(y ~ X) >> >> ## Fit model with lm.ridge() >> fit2 <- lm.ridge(y ~ X) >> >> ## Compare coefficients >> cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) >> >> [,1] [,2] [,3] >> (Intercept) 0.99276001 NA 0.99276001 >> X1 -0.03980772 -0.04282391 -0.03980772 >> X2 1.11167179 1.06200476 1.11167179 >> >> -- >> >> Simon Bonner >> Assistant Professor of Environmetrics/ Director MMASc >> Department of Statistical and Actuarial Sciences/Department of Biology >> University of Western Ontario >> >> Office: Western Science Centre rm 276 >> >> Email: sbonn...@uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813 >> Twitter: @bonnerstatslab | Website: >> http://simon.bonners.ca/bonner-lab/wpblog/ >> >>> -----Original Message----- >>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick >>> Brown >>> Sent: May 4, 2017 10:29 AM >>> To: r-devel@r-project.org >>> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS >>> >>> Hallo, >>> >>> I hope I am posting to the right place. I was advised to try this list by >>> Ben Bolker >>> (https://twitter.com/bolkerb/status/859909918446497795). I also posted this >>> question to StackOverflow >>> (http://stackoverflow.com/questions/43771269/lm-gives-different-results- >>> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my first >>> program in 1975 and have been paid to program in about 15 different >>> languages, so I have some general background knowledge. >>> >>> I have a regression from which I extract the coefficients like this: >>> lm(y ~ x1 * x2, data=ds)$coef >>> That gives: x1=0.40, x2=0.37, x1*x2=0.09 >>> >>> When I do the same regression in SPSS, I get: >>> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. >>> So the main effects are in agreement, but there is quite a difference in >>> the >>> coefficient for the interaction. >>> >>> X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my >>> idea, but it got published), so there is quite possibly something going on >>> with >>> collinearity. So I thought I'd try lm.ridge() to see if I can get an idea >>> of where >>> the problems are occurring. >>> >>> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge >>> penalty) and >>> check we get the same results as with lm(): >>> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef >>> x1=0.40, x2=0.37, x1*x2=0.14 >>> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is >>> the >>> default, so it can be omitted; I can alternate between including or >>> deleting >>> ".ridge" in the function call, and watch the coefficient for the >>> interaction >>> change.) >>> >>> What seems slightly strange to me here is that I assumed that lm.ridge() >>> just >>> piggybacks on lm() anyway, so in the specific case where lambda=0 and there >>> is no "ridging" to do, I'd expect exactly the same results. >>> >>> Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex >>> will >>> not be easy to make, but I can share the data via Dropbox or something if >>> that >>> would help. >>> >>> I appreciate that when there is strong collinearity then all bets are off >>> in terms >>> of what the betas mean, but I would really expect lm() and lm.ridge() to >>> give >>> the same results. (I would be happy to ignore SPSS, but for the moment it's >>> part of the majority!) >>> >>> Thanks for reading, >>> Nick > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd....@cbs.dk Priv: pda...@gmail.com [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel