Hi Mark, I don't know whether you received a sufficient reply or not, so here are my comments on your problem. Suppressing the constant term in a regression model will probably lead to a violation of the classical assumptions for this model. From the OLS normal equations (in matrix notation) (1) (X'X)b = X'y and the definition of the OLS residuals (2) e = y - Xb you get - by substituting y from (2) into (1), i.e. X'(Xb + e) = X'y - (X'X)b = (X'X)b + X'e and hence X'e = 0. Without a constant term you cannot ensure that the OLS residuals e = (y - Xb) have zero mean, which does hold when a constant term is included, since the first equation of X'e = 0 then gives sum(e) = 0.
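This is easy to check numerically. A quick sketch in R (data and names invented for illustration, not from the original exchange):

```r
# With a constant term, the first row of X'e = 0 forces sum(e) = 0;
# without it, nothing constrains the residual sum.
set.seed(1)
x <- 1:20
y <- 5 + 2 * x + rnorm(20)

fitWith    <- lm(y ~ x)       # model matrix contains a constant column
fitWithout <- lm(y ~ x - 1)   # constant term suppressed

sum(residuals(fitWith))       # essentially zero (rounding error only)
sum(residuals(fitWithout))    # generally far from zero
```

Note that X'e = 0 still holds for the no-intercept fit with respect to its own regressors (sum(x * residuals(fitWithout)) is numerically zero), but with no constant column there is no equation forcing sum(e) = 0.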
For decomposing the TSS (y'y) into ESS (b'X'Xb) and RSS (e'e), which is needed to compute R², you need X'e = 0, because then the cross-product term b'X'e vanishes. Correct me if I'm wrong.

Leeds, Mark (IED) wrote:
> Park, Eik: Could you start from the bottom and read this when you have
> time. I really appreciate it.
>
> Basically, in a nutshell, my question is the "Hi John" part and I want
> to do my study correctly. Thanks a lot.
>
> -----Original Message-----
> From: Leeds, Mark (IED)
> Sent: Thursday, August 23, 2007 1:05 PM
> To: 'John Sorkin'
> Cc: '[EMAIL PROTECTED]'
> Subject: RE: [R] How to fit a linear model without intercept
>
> Hi John: I'm from the R-list obviously, and that was a nice example
> that I cut and pasted and learned from. I'm sorry to bother you, but I
> had a non-R question that I didn't want to pose to the R-list, because I
> think it's been discussed a lot in the past and I never focused on the
> discussion.
>
> I need to do a study where I decide between two different univariate
> regression models. The LHS is the same in both cases, and it's not the
> goal of the study to build a prediction model but rather to see which
> RHS (univariate) explains the LHS better.
> It's actually in a time-series framework also, but that's not relevant
> to my question. My question has 2 parts:
>
> 1) I was leaning towards using the R-squared as the decision criterion
> (I will be regressing monthly and over a couple of years, so I will have
> about 24 R-squareds. I have tons of data for one monthly regression, so I
> don't have to just do one big regression over the whole time period),
> but I noticed in your previous example that the model with intercept
> (compared to the model forced to have a zero intercept) had a lower R^2
> and a lower standard error at the same time! So this asymmetry
> leads me to think that maybe I should be using standard error rather
> than R-squared as my criterion?
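(The TSS decomposition above can be verified the same way; a small sketch with invented data, not from the original messages:)

```r
# Check that y'y = b'X'Xb + e'e when X'e = 0 (model with constant term).
set.seed(2)
x <- 1:15
y <- 3 + 1.5 * x + rnorm(15)

fit <- lm(y ~ x)
X <- model.matrix(fit)     # includes the constant column
b <- coef(fit)
e <- residuals(fit)

TSS <- sum(y^2)            # y'y
ESS <- sum((X %*% b)^2)    # b'X'Xb
RSS <- sum(e^2)            # e'e

all.equal(TSS, ESS + RSS)  # TRUE: the cross term b'X'e vanishes
max(abs(crossprod(X, e)))  # essentially zero
```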
>
> 2) This is possibly related to 1: Isn't there a problem with using the
> R-squared for anything when you force no intercept?
> I think I remember seeing discussions about this on the list. That's why
> I was thinking of including the intercept.
> (The intercept in my problem really has no meaning, but I wanted to retain
> the validity of the R-squared.) But, now that I see your email, maybe I
> should still be including an intercept and using standard error as the
> criterion.
> Or maybe when you include an intercept (in both cases) you don't get
> this asymmetry between R-squared and standard error.
> I was surprised to see the asymmetry, but maybe it happens because one
> is comparing a model with intercept to a model without intercept, and no
> intercept probably renders the R-squared criterion meaningless in the
> latter.
>
> Thanks for any insight you can provide. I can also center and go without
> an intercept, because it sounded like you DEFINITELY preferred that method
> over just not including an intercept at all. I was thinking of sending
> this question to the R-list, but I didn't want to get hammered, because I
> know that this is not a new discussion. Thanks so much.
>
> Mark
>
> P.S.: How the heck did you get an MD and a Ph.D.? Unbelievable. Did you
> do them at the same time?
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of John Sorkin
> Sent: Thursday, August 23, 2007 9:29 AM
> To: David Barron; Michal Kneifl; r-help
> Subject: Re: [R] How to fit a linear model without intercept
>
> Michael,
> Assuming you want a model with an intercept of zero, I think we need to
> ask why you want an intercept of zero. When a "normal" regression
> indicates a non-zero intercept, forcing the regression line to have a
> zero intercept changes the meaning of the regression coefficients. If
> for some reason you want to have a zero intercept, but do not want to
> change the meaning of the regression coefficients, i.e.
> you still want
> to minimize the sum of the squared deviations from the BLUE (Best
> Linear Unbiased Estimator) of the regression, you can center your
> dependent and independent variables and re-run the regression. Centering
> means subtracting the mean of each variable from the variable before
> performing the regression. When you do this, the intercept term will be
> zero (or more likely a very, very, very small number that is not
> statistically different from zero - it will not be exactly zero due to
> limits on the precision of computer calculations) and the slope term
> will be the same as that you obtained from the "normal" BLUE regression.
> What you are actually doing is transforming your data so it is centered
> around x=0, y=0, i.e. the mean of the x and y terms will be zero. I am
> not sure this is what you want to do, but I am pasting below some R code
> that will allow you to see the effect forcing the intercept to be zero
> has on the slope, and how centering the data yields a zero intercept
> without changing the slope.
> John
>
> oldpar <- par(ask=TRUE)
>
> # Set up x and y values. Note that as defined the slope of the
> # regression should be close to one (save for the "noise" added to
> # the y values) and the intercept should be close to four.
> x <- 0:10
> y <- x + 4 + rnorm(11, 0, 1)
> plot(x, y)
> title("Original data")
>
> # Fit a "normal" regression line to the data and display
> # the regression line on the scatter plot.
> fitNormalReg <- lm(y ~ x)
> abline(fitNormalReg)
>
> # Fit a regression line in which the intercept has been
> # forced to be zero and display the line on the scatter plot.
> fitZeroInt <- lm(y ~ -1 + x)
> abline(fitZeroInt, lty=2)
>
> # Compare fits.
> summary(fitNormalReg)
> summary(fitZeroInt)
> # Test whether there is a statistically significant difference
> # between the models; if so, the model with an intercept,
> # the "normal" regression, is the better fit.
> anova(fitZeroInt, fitNormalReg)
>
> # Center y and x by subtracting their means.
> yCentered <- y - mean(y)
> xCentered <- x - mean(x)
> # Regress the centered y values on the centered x values. This
> # will give us a model with an intercept that is very, very
> # small. It would be zero save for the precision limits
> # inherent in using a computer. Plot the line. Notice the
> # slope of the centered regression is the same as that
> # obtained from the normal regression.
> fitCentered <- lm(yCentered ~ xCentered)
> abline(fitCentered, lty=3)
>
> # Compare the three regressions. Note the slopes from the
> # "normal" regression and centered regression are the same.
> # The intercept from the centered regression is very, very small
> # and would be zero save for the limits of computer mathematics.
> summary(fitNormalReg)
> summary(fitZeroInt)
> summary(fitCentered)
>
> # Plot the centered data and show that the line goes through zero.
> plot(xCentered, yCentered)
> abline(fitCentered)
> title("Centered data")
>
> # Restore the graphics parameters.
> par(oldpar)
>
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> Baltimore VA Medical Center GRECC,
> University of Maryland School of Medicine Claude D. Pepper OAIC,
> University of Maryland Clinical Nutrition Research Unit, and Baltimore
> VA Center Stroke of Excellence
>
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
>
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> [EMAIL PROTECTED]
>
>>>> "David Barron" <[EMAIL PROTECTED]> 08/23/07 5:38 AM >>>
>
> A number of alternatives, such as:
>
> lm(y ~ 0 + x)
> lm(y ~ x - 1)
>
> See ?formula
>
> On 8/23/07, Michal Kneifl <[EMAIL PROTECTED]> wrote:
>> Please could anyone help me?
>> How can I fit a linear model where an intercept makes no sense?
>> Thanks in advance.
>> Michael
>>
>> ______________________________________________
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> =================================
> David Barron
> Said Business School
> University of Oxford
> Park End Street
> Oxford OX1 1HP