Hi John, Thanks for the comment... but that appears to mean that SPSS has a big problem. I have always been told that to include an interaction term in a regression, the only way is to do the multiplication by hand. But then it seems to be impossible to stop SPSS from re-standardizing the variable that corresponds to the interaction term. Am I missing something? Is there a way to perform the regression with the interaction in SPSS without computing the interaction as a separate variable?
Best, Nick ----- Original Message ----- From: "John Fox" <j...@mcmaster.ca> To: "Nick Brown" <nick.br...@free.fr>, "peter dalgaard" <pda...@gmail.com> Cc: r-devel@r-project.org Sent: Friday, 5 May, 2017 8:22:53 PM Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS Dear Nick, On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown" <r-devel-boun...@r-project.org on behalf of nick.br...@free.fr> wrote: >>I conjecture that something in the vicinity of >> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + >>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >>summary(res) >> would reproduce the SPSS Beta values. > >Yes, that works. Thanks! That you have to work hard in R to match the SPSS results isn’t such a bad thing when you factor in the observation that standardizing the interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense. Best, John ------------------------------------- John Fox, Professor McMaster University Hamilton, Ontario, Canada Web: http://socserv.mcmaster.ca/jfox/ > > >----- Original Message ----- > >From: "peter dalgaard" <pda...@gmail.com> >To: "Viechtbauer Wolfgang (SP)" ><wolfgang.viechtba...@maastrichtuniversity.nl>, "Nick Brown" ><nick.br...@free.fr> >Cc: r-devel@r-project.org >Sent: Friday, 5 May, 2017 3:33:29 PM >Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS > >Thanks, I was getting to try this, but got side tracked by actual work... > >Your analysis reproduces the SPSS unscaled estimates. It still remains to >figure out how Nick got > >> >coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) > >(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >0.07342198 -0.39650356 -0.36569488 -0.09435788 > > >which does not match your output. I suspect that ZMEAN_PA and >ZDIVERSITY_PA were scaled for this analysis (but the interaction term >still obviously is not). I conjecture that something in the vicinity of > >res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + >scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >summary(res) > >would reproduce the SPSS Beta values. > > >> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP) >><wolfgang.viechtba...@maastrichtuniversity.nl> wrote: >> >> I had no problems running regression models in SPSS and R that yielded >>the same results for these data. >> >> The difference you are observing is from fitting different models. In >>R, you fitted: >> >> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat) >> summary(res) >> >> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This >>is not a standardized variable itself and not the same as "ZINTER_PA_C" >>in the png you showed, which is not a variable in the dataset, but can >>be created with: >> >> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA)) >> >> If you want the same results as in SPSS, then you need to fit: >> >> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, >>data=dat) >> summary(res) >> >> This yields: >> >> Coefficients: >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) 6.41041 0.01722 372.21 <2e-16 *** >> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 *** >> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 *** >> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 *** >> --- >> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 >> >> Exactly the same as in the png. >> >> Peter already mentioned this as a possible reason for the discrepancy: >>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it >>perhaps the case that x1 and x2 have already been scaled to have >>standard deviation 1? In that case, x1*x2 won't be.") >> >> Best, >> Wolfgang >> >> -----Original Message----- >> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Nick >>Brown >> Sent: Friday, May 05, 2017 10:40 >> To: peter dalgaard >> Cc: r-devel@r-project.org >> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> Hi, >> >> Here is (I hope) all the relevant output from R. >> >>> mean(s1$ZDEPRESSION, na.rm=T) [1] -1.041546e-16 > >>>mean(s1$ZDIVERSITY_PA, na.rm=T) [1] -9.660583e-16 > mean(s1$ZMEAN_PA, >>>na.rm=T) [1] -5.430282e-15 > lm.ridge(ZDEPRESSION ~ ZMEAN_PA * >>>ZDIVERSITY_PA, data=s1)$coef ZMEAN_PA ZDIVERSITY_PA >>>ZMEAN_PA:ZDIVERSITY_PA >> -0.3962254 -0.3636026 -0.1425772 ## This is what I thought was the >>problem originally. :-) >> >> >>> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) >>>(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >> 0.07342198 -0.39650356 -0.36569488 -0.09435788 > >>coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) >>ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >> 0.07342198 -0.39650356 -0.36569488 -0.09435788 The equivalent from SPSS >>is attached. The unstandardized coefficients in SPSS look nothing like >>those in R. The standardized coefficients in SPSS match the >>lm.ridge()$coef numbers very closely indeed, suggesting that the same >>algorithm may be in use. >> >> I have put the dataset file, which is the untouched original I received >>from the authors, in this Dropbox folder: >>https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0 >>. You can read it into R with this code (one variable needs to be >>standardized and centered; everything else is already in the file): >> >> s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) >>s1$ZDEPRESSION <- scale(s1$DEPRESSION) >> Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm >>sure IBM will want to fix it quickly (ha ha ha). >> >> Nick >> >> ----- Original Message ----- >> >> From: "peter dalgaard" <pda...@gmail.com> >> To: "Nick Brown" <nick.br...@free.fr> >> Cc: "Simon Bonner" <sbonn...@uwo.ca>, r-devel@r-project.org >> Sent: Friday, 5 May, 2017 10:02:10 AM >> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> I asked you before, but in case you missed it: Are you looking at the >>right place in SPSS output? >> >> The UNstandardized coefficients should be comparable to R, i.e. the "B" >>column, not "Beta". >> >> -pd >> >>> On 5 May 2017, at 01:58 , Nick Brown <nick.br...@free.fr> wrote: >>> >>> Hi Simon, >>> >>> Yes, if I uses coefficients() I get the same results for lm() and >>>lm.ridge(). So that's consistent, at least. >>> >>> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees >>>with the value from SPSS to 5dp, which is an interesting coincidence if >>>these numbers have no particular external meaning in lm.ridge(). >>> >>> Kind regards, >>> Nick >>> >>> ----- Original Message ----- >>> >>> From: "Simon Bonner" <sbonn...@uwo.ca> >>> To: "Nick Brown" <nick.br...@free.fr>, r-devel@r-project.org >>> Sent: Thursday, 4 May, 2017 7:07:33 PM >>> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS >>> >>> Hi Nick, >>> >>> I think that the problem here is your use of $coef to extract the >>>coefficients of the ridge regression. The help for lm.ridge states that >>>coef is a "matrix of coefficients, one row for each value of lambda. >>>Note that these are not on the original scale and are for use by the >>>coef method." >>> >>> I ran a small test with simulated data, code is copied below, and >>>indeed the output from lm.ridge differs depending on whether the >>>coefficients are accessed via $coef or via the coefficients() function. >>>The latter does produce results that match the output from lm. >>> >>> I hope that helps. >>> >>> Cheers, >>> >>> Simon >>> >>> ## Load packages >>> library(MASS) >>> >>> ## Set seed >>> set.seed(8888) >>> >>> ## Set parameters >>> n <- 100 >>> beta <- c(1,0,1) >>> sigma <- .5 >>> rho <- .75 >>> >>> ## Simulate correlated covariates >>> Sigma <- matrix(c(1,rho,rho,1),ncol=2) >>> X <- mvrnorm(n,c(0,0),Sigma=Sigma) >>> >>> ## Simulate data >>> mu <- beta[1] + X %*% beta[-1] >>> y <- rnorm(n,mu,sigma) >>> >>> ## Fit model with lm() >>> fit1 <- lm(y ~ X) >>> >>> ## Fit model with lm.ridge() >>> fit2 <- lm.ridge(y ~ X) >>> >>> ## Compare coefficients >>> cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) >>> >>> [,1] [,2] [,3] >>> (Intercept) 0.99276001 NA 0.99276001 >>> X1 -0.03980772 -0.04282391 -0.03980772 >>> X2 1.11167179 1.06200476 1.11167179 >>> >>> -- >>> >>> Simon Bonner >>> Assistant Professor of Environmetrics/ Director MMASc >>> Department of Statistical and Actuarial Sciences/Department of Biology >>> University of Western Ontario >>> >>> Office: Western Science Centre rm 276 >>> >>> Email: sbonn...@uwo.ca | Telephone: 519-661-2111 x88205 | Fax: >>>519-661-3813 >>> Twitter: @bonnerstatslab | Website: >>>http://simon.bonners.ca/bonner-lab/wpblog/ >>> >>>> -----Original Message----- >>>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of >>>>Nick >>>> Brown >>>> Sent: May 4, 2017 10:29 AM >>>> To: r-devel@r-project.org >>>> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS >>>> >>>> Hallo, >>>> >>>> I hope I am posting to the right place. I was advised to try this >>>>list by Ben Bolker >>>> (https://twitter.com/bolkerb/status/859909918446497795). I also >>>>posted this >>>> question to StackOverflow >>>> >>>>(http://stackoverflow.com/questions/43771269/lm-gives-different-results >>>>- >>>> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my >>>>first >>>> program in 1975 and have been paid to program in about 15 different >>>> languages, so I have some general background knowledge. >>>> >>>> I have a regression from which I extract the coefficients like this: >>>> lm(y ~ x1 * x2, data=ds)$coef >>>> That gives: x1=0.40, x2=0.37, x1*x2=0.09 >>>> >>>> When I do the same regression in SPSS, I get: >>>> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. >>>> So the main effects are in agreement, but there is quite a difference >>>>in the >>>> coefficient for the interaction. >>>> >>>> X1 and X2 are correlated about .75 (yes, yes, I know - this model >>>>wasn't my >>>> idea, but it got published), so there is quite possibly something >>>>going on with >>>> collinearity. So I thought I'd try lm.ridge() to see if I can get an >>>>idea of where >>>> the problems are occurring. >>>> >>>> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge >>>>penalty) and >>>> check we get the same results as with lm(): >>>> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef >>>> x1=0.40, x2=0.37, x1*x2=0.14 >>>> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, >>>>lambda=0 is the >>>> default, so it can be omitted; I can alternate between including or >>>>deleting >>>> ".ridge" in the function call, and watch the coefficient for the >>>>interaction >>>> change.) >>>> >>>> What seems slightly strange to me here is that I assumed that >>>>lm.ridge() just >>>> piggybacks on lm() anyway, so in the specific case where lambda=0 and >>>>there >>>> is no "ridging" to do, I'd expect exactly the same results. >>>> >>>> Unfortunately there are 34,000 cases in the dataset, so a "minimal" >>>>reprex will >>>> not be easy to make, but I can share the data via Dropbox or >>>>something if that >>>> would help. >>>> >>>> I appreciate that when there is strong collinearity then all bets are >>>>off in terms >>>> of what the betas mean, but I would really expect lm() and lm.ridge() >>>>to give >>>> the same results. (I would be happy to ignore SPSS, but for the >>>>moment it's >>>> part of the majority!) >>>> >>>> Thanks for reading, >>>> Nick >> ______________________________________________ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > >-- >Peter Dalgaard, Professor, >Center for Statistics, Copenhagen Business School >Solbjerg Plads 3, 2000 Frederiksberg, Denmark >Phone: (+45)38153501 >Office: A 4.23 >Email: pd....@cbs.dk Priv: pda...@gmail.com > > > > > > > > > > > > [[alternative HTML version deleted]] > >______________________________________________ >R-devel@r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-devel [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel