Dear David,

Thanks for your answer. Yes, now that you mention it, these points are at the beginning of a variable's range. The residual plot suggests non-constant variance, which a transformation fixes. I also checked for interactions by using the : operator between two variables, and the results did not change much. I work in computer science, but I wanted to do the analysis from scratch because some previous results I have seen do not handle such cases well. Moreover, the data are of course not the same.
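
To illustrate why points at the start of a range come out with large leverage, here is a minimal Python sketch. It is not the analysis from this thread: the x values are hypothetical, and the 2p/n cutoff is only a common rule of thumb that I assume here. For a simple regression with an intercept, the leverage of observation i is h_i = 1/n + (x_i - mean(x))^2 / Sxx, so it depends only on how far x_i lies from the mean of the predictor, not on the response.

```python
def leverages(x):
    """Hat-matrix diagonal for simple linear regression with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # spread of the predictor
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


# Hypothetical predictor values: one point sits far out at the low end of the range.
x = [1.0, 20.0, 21.0, 22.0, 23.0, 24.0]
h = leverages(x)

# Assumed rule of thumb: flag h_i > 2p/n, with p = 2 coefficients (intercept, slope).
cutoff = 2 * 2 / len(x)
flagged = [xi for xi, hi in zip(x, h) if hi > cutoff]
print(flagged)
```

In this toy case only the isolated point at the low end is flagged. Whether such points should stay in the model is a separate question that the leverage values alone do not answer.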
Thanks,
George

On 06/21/2011 01:08 PM, David Winsemius wrote:
>
> On Jun 21, 2011, at 3:49 AM, George Markomanolis wrote:
>
>> Dear all,
>>
>> I am new to this field and I have a question about a linear regression.
>> I have a dataset of around to 31000 points and I want to apply a linear
>> regression. The R-squared is 0.9 however when I check the diagnostic
>> plots I can see that there are around to 250 points with big leverage
>> value. As I know the points with big leverage influence a lot the fit.
>> If I remove these points in order to check their influence, the
>> R-squared of the rest points is 0.71. So I removed less than 1% of my
>> data and the fit is not so good. Could you please give me any advice
>> about this? Is it right to let these 250 points in my dataset or not?
>> Could I do something else? The data are measured through an experiment
>> so even these 250 points are real values.
>
> You could be looking at the descriptive statistics on the points.
> Perhaps they are at one end of a variable range, or you perhaps have
> some other feature that is scientifically interesting. So far you have
> only been examining one set of simple linear hypotheses and have not
> (presumably) been looking at any non-linear possibilities or the
> potential that interactions are affecting the outcome. The prior
> science of your (so far undescribed) domain should be carefully
> considered, but in your message we see no evidence of such.

