Hi Mark, as a last comment you may also take a look at ?summary.lm, where you
will notice that R reports two different R-squared values, depending on the
presence or absence of an intercept term. For comparison purposes you should
ensure that you use the same mathematical object; a short sketch below
illustrates the difference. There was a thread about this (from which I
essentially took Prof. Ripley's reply for this answer) in Jan 2006, see
http://tolstoy.newcastle.edu.au/R/help/06/01/18923.html
hth.
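For instance (just a sketch, with made-up data), the two definitions can
diverge drastically:

set.seed(42)
x <- 1:20
y <- 2 * x + 10 + rnorm(20)
# with an intercept, R^2 = 1 - RSS / sum((y - mean(y))^2)
summary(lm(y ~ x))$r.squared
# without one, the baseline is sum(y^2) instead, so the value is
# usually much larger and not comparable to the first
summary(lm(y ~ 0 + x))$r.squared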
Leeds, Mark (IED) wrote:
> Eik: Today I've been reading Myers' text, "Classical and Modern
> Regression with Applications", to refresh my memory about regression,
> because it's been a while since I looked at that material. The
> subtraction of the means from both sides of the equation causing the
> intercept to be zero now makes more sense because, in the simple
> regression case,
>
> b0 = ybar - b1 * xbar
>
> and, by subtracting the means, ybar and xbar both become zero, so
> b0 = 0.
>
> If you have any other comments, they are very much appreciated and
> always invited, but I think between what you showed and the above it's
> clearer now. I think I will go with centering both the left and right
> hand sides to force the zero intercepts, estimate each model with the
> intercept (which will hopefully be estimated numerically as very close
> to zero) and then compare the R-squareds of the two models. If you
> still see this as a problem, let me know, because I am totally open to
> listening to other people's brains, especially good ones like yours.
>
> -----Original Message-----
> From: Eik Vettorazzi [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, August 28, 2007 8:33 AM
> To: Leeds, Mark (IED)
> Cc: R-help
> Subject: Re: FW: [R] How to fit a linear model without intercept
>
> Hi Mark,
> I don't know whether you received a sufficient reply or not, so here
> are my comments on your problem. Suppressing the constant term in a
> regression model will probably lead to a violation of the classical
> assumptions for this model. From the OLS normal equations (in matrix
> notation)
>
> (1) (X'X)b = X'y
>
> and the definition of the OLS residuals
>
> (2) e = y - Xb,
>
> you get, by substituting y from (2) into (1),
>
> (X'X)b = (X'X)b + X'e,
>
> and hence X'e = 0. Without a constant term you cannot ensure that the
> OLS residuals e = y - Xb have zero mean. With a constant term this
> does hold, since the first equation of X'e = 0 (the row belonging to
> the column of ones) then reads sum(e) = 0.
>
> For decomposing the TSS (y'y) into the ESS (b'X'Xb) and the RSS
> (e'e), which is needed to compute R^2, you need X'e = 0, because then
> the cross-product term b'X'e vanishes.
> Correct me if I'm wrong.
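>
> A quick numerical check of this (just a sketch, with made-up data):
>
> x <- 1:10
> y <- x + 4 + rnorm(10)
> fitWith <- lm(y ~ x)         # with constant term
> fitWithout <- lm(y ~ 0 + x)  # constant term suppressed
> sum(residuals(fitWith))      # essentially zero, since sum(e) = 0
> sum(residuals(fitWithout))   # in general not zero
> # X'e for the model with constant term: all entries are ~0
> crossprod(model.matrix(fitWith), residuals(fitWith))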
>
> Leeds, Mark (IED) wrote:
>
>> Park, Eik: Could you start from the bottom and read this when you
>> have time? I really appreciate it.
>>
>> Basically, in a nutshell, my question is the "Hi John" part, and I
>> want to do my study correctly. Thanks a lot.
>>
>> -----Original Message-----
>> From: Leeds, Mark (IED)
>> Sent: Thursday, August 23, 2007 1:05 PM
>> To: 'John Sorkin'
>> Cc: '[EMAIL PROTECTED]'
>> Subject: RE: [R] How to fit a linear model without intercept
>>
>> Hi John: I'm from the R-list, obviously, and that was a nice example
>> that I cut and pasted and learned from. I'm sorry to bother you, but
>> I had a non-R question that I didn't want to pose to the R-list,
>> because I think it's been discussed a lot in the past and I never
>> focused on the discussion.
>>
>> I need to do a study where I decide between two different univariate
>> regression models. The LHS is the same in both cases, and the goal
>> of the study is not to build a prediction model but rather to see
>> which (univariate) RHS explains the LHS better. It's actually in a
>> time-series framework also, but that's not relevant to my question.
>> My question has two parts:
>>
>> 1) I was leaning towards using the R-squared as the decision
>> criterion. (I will be regressing monthly over a couple of years, so
>> I will have about 24 R-squareds; I have tons of data for each
>> monthly regression, so I don't have to do just one big regression
>> over the whole time period.) But I noticed in your previous example
>> that the model with intercept (compared to the model forced to have
>> zero intercept) had a lower R^2 and a lower standard error at the
>> same time! So this asymmetry leads me to think that maybe I should
>> be using the standard error rather than the R-squared as my
>> criterion?
>>
>> 2) This is possibly related to 1): isn't there a problem with using
>> the R-squared for anything when you force no intercept? I think I
>> remember seeing discussions about this on the list. That's why I was
>> thinking of including the intercept. (The intercept in my problem
>> really has no meaning, but I wanted to retain the validity of the
>> R-squared.) But now that I see your email, maybe I should still
>> include an intercept and use the standard error as the criterion. Or
>> maybe when you include an intercept in both cases you don't get this
>> asymmetry between R-squared and standard error at all. I was
>> surprised to see the asymmetry, but maybe it happens because one is
>> comparing a model with an intercept to a model without one, and the
>> missing intercept probably renders the R-squared criterion
>> meaningless in the latter.
>>
>> Thanks for any insight you can provide. I can also center and go
>> without the intercept, because it sounded like you DEFINITELY
>> preferred that method over just not including an intercept at all. I
>> was thinking of sending this question to the R-list, but I didn't
>> want to get hammered, because I know that this is not a new
>> discussion. Thanks so much.
>>
>> Mark
>>
>> P.S.: How the heck did you get an M.D. and a Ph.D.? Unbelievable.
>> Did you do them at the same time?
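>>
>> P.P.S.: To make part 1) concrete, here is a sketch of the check I
>> have in mind (with invented series). If both models keep the
>> intercept, the R-squareds share the same baseline, and R-squared and
>> standard error should rank the models the same way, since both are
>> monotone in the residual sum of squares when the LHS is common:
>>
>> set.seed(1)
>> n <- 100
>> x1 <- rnorm(n)
>> x2 <- rnorm(n)
>> y <- 1 + 2 * x1 + rnorm(n)
>> fitA <- lm(y ~ x1)
>> fitB <- lm(y ~ x2)
>> # same LHS and same number of parameters in both models, so the two
>> # criteria must agree on the ordering
>> c(summary(fitA)$r.squared, summary(fitB)$r.squared)
>> c(summary(fitA)$sigma, summary(fitB)$sigma)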
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
>> John Sorkin
>> Sent: Thursday, August 23, 2007 9:29 AM
>> To: David Barron; Michal Kneifl; r-help
>> Subject: Re: [R] How to fit a linear model without intercept
>>
>> Michael,
>> Assuming you want a model with an intercept of zero, I think we need
>> to ask you why you want an intercept of zero. When a "normal"
>> regression indicates a non-zero intercept, forcing the regression
>> line to have a zero intercept changes the meaning of the regression
>> coefficients. If for some reason you want to have a zero intercept,
>> but do not want to change the meaning of the regression
>> coefficients, i.e. you still want to minimize the sum of squared
>> deviations and obtain the BLUE (Best Linear Unbiased Estimator) of
>> the regression, you can center your dependent and independent
>> variables and re-run the regression. Centering means subtracting the
>> mean of each variable from the variable before performing the
>> regression. When you do this, the intercept term will be zero (or,
>> more likely, a very, very small number that is not statistically
>> different from zero - it will not be exactly zero due to limits on
>> the precision of computer calculations) and the slope term will be
>> the same as the one you obtained from the "normal" BLUE regression.
>> What you are actually doing is transforming your data so that they
>> are centered around x = 0, y = 0, i.e. the means of the x and y
>> values will be zero. I am not sure this is what you want to do, but
>> I am pasting below some R code that will let you see the effect
>> forcing the intercept to be zero has on the slope, and how centering
>> the data yields a zero intercept without changing the slope.
>> John
>>
>> oldpar <- par(ask = TRUE)
>>
>> # Set up x and y values. Note that, as defined, the slope of the
>> # regression should be close to one (save for the "noise" added to
>> # the y values) and the intercept should be close to four.
>> x <- 0:10
>> y <- x + 4 + rnorm(11, 0, 1)
>> plot(x, y)
>> title("Original data")
>>
>> # Fit a "normal" regression line to the data and display the
>> # regression line on the scatter plot.
>> fitNormalReg <- lm(y ~ x)
>> abline(fitNormalReg)
>>
>> # Fit a regression line in which the intercept has been forced to
>> # be zero and display the line on the scatter plot.
>> fitZeroInt <- lm(y ~ -1 + x)
>> abline(fitZeroInt, lty = 2)
>>
>> # Compare fits. There is a statistically significant difference
>> # between the models - the model with an intercept, the "normal"
>> # regression, is the better fit.
>> summary(fitNormalReg)
>> summary(fitZeroInt)
>> anova(fitNormalReg, fitZeroInt)
>>
>> # Center y and x by subtracting their means.
>> yCentered <- y - mean(y)
>> xCentered <- x - mean(x)
>> # Regress the centered y values on the centered x values. This will
>> # give us a model with an intercept that is very, very small; it
>> # would be zero save for the precision limits inherent in using a
>> # computer. Plot the line. Notice that the slope of the centered fit
>> # is the same as that obtained from the normal regression.
>> fitCentered <- lm(yCentered ~ xCentered)
>> abline(fitCentered, lty = 3)
>>
>> # Compare the three regressions. Note that the slopes from the
>> # "normal" regression and the centered regression are the same.
>> # The intercept from the centered regression is very, very small
>> # and would be zero save for the limits of computer arithmetic.
>> summary(fitNormalReg)
>> summary(fitZeroInt)
>> summary(fitCentered)
>>
>> # Plot the centered data and show that the line goes through zero.
>> plot(xCentered, yCentered)
>> abline(fitCentered)
>> title("Centered data")
>>
>> par(oldpar)
>>
>> John Sorkin M.D., Ph.D.
>> Chief, Biostatistics and Informatics
>> Baltimore VA Medical Center GRECC,
>> University of Maryland School of Medicine Claude D. Pepper OAIC,
>> University of Maryland Clinical Nutrition Research Unit, and
>> Baltimore VA Center Stroke of Excellence
>>
>> University of Maryland School of Medicine
>> Division of Gerontology
>> Baltimore VA Medical Center
>> 10 North Greene Street
>> GRECC (BT/18/GR)
>> Baltimore, MD 21201-1524
>>
>> (Phone) 410-605-7119
>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>> [EMAIL PROTECTED]
>>
>>>>> "David Barron" <[EMAIL PROTECTED]> 08/23/07 5:38 AM >>>
>>
>> A number of alternatives, such as:
>>
>> lm(y ~ 0 + x)
>> lm(y ~ x - 1)
>>
>> See ?formula
>>
>> On 8/23/07, Michal Kneifl <[EMAIL PROTECTED]> wrote:
>>
>>> Please, could anyone help me?
>>> How can I fit a linear model where an intercept makes no sense?
>>> Thanks in advance.
>>>
>>> Michael
>>
>> --
>> =================================
>> David Barron
>> Said Business School
>> University of Oxford
>> Park End Street
>> Oxford OX1 1HP

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
T ++49/40/42803-8243
F ++49/40/42803-7790

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.