On Mon, 2007-02-19 at 09:58 -0500, Pierre Lapointe wrote:
> Hello,
>
> I have a particular situation where a single "wrong" observation is
> impacting the results of a traditional regression to the point that
> betas become unreliable. I need a way to calculate the most likely
> betas. Here's an example:
>
> set.seed(1)
> unknownbeta <- matrix(seq(100,500,100),25,5,byrow=TRUE)
> x <-matrix(runif(25*5),25)
> y <- rowSums(unknownbeta*x)
> summary(lm(y~0+x)) #gets back the unknown betas.
>
> #Now, let's introduce a single wrong data.
>
> unknownbeta[25,5] <-100
> y <- rowSums(unknownbeta*x)
> summary(lm(y~0+x)) #every beta changes.
>
> I need to find out what are the most likely betas in the second
> example. There is no obvious way to know that row 25 has wrong input.
> I would even be happy if the conclusion was that x1:x4 are 100, 200,
> 300 and 400 and that x5 is zero.
>
> Thanks

It is not clear what you mean by a "wrong" observation.

Is the data completely bad because it was improperly collected?

Is this an observation that has correct data, but is an "outlier"
relative to the other observations?

Is the observation missing data, where values can be reasonably imputed?

Are you in a setting where the observation MUST be included in the
regression rather than be deleted? For example, an "Intent to Treat"
analysis in a clinical trial?

Depending upon the context, your options may range from simply removing
the single observation from the regression, to some form of weighting
of the observations, to a robust regression methodology, among others.

This is not strictly an R question, but one of methodology. Which
approach is appropriate may also depend on "community" standards and
prior work within your particular discipline.

HTH,

Marc Schwartz

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
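
As a rough illustration of the simplest option mentioned in the reply (removing a single observation), here is a minimal base-R sketch applied to the poster's own example. It assumes that at most one row is bad and that deleting it is acceptable in context; neither assumption comes from the thread. The names loo_rss, suspect and fit_clean are made up for this sketch. The idea is a leave-one-out scan: refit the model without each row in turn and flag the row whose removal leaves the smallest residual sum of squares.

```r
# Reproduce the poster's contaminated data set.
set.seed(1)
unknownbeta <- matrix(seq(100, 500, 100), 25, 5, byrow = TRUE)
x <- matrix(runif(25 * 5), 25)
unknownbeta[25, 5] <- 100   # the single "wrong" entry from the original post
y <- rowSums(unknownbeta * x)

# Leave-one-out scan (assumption: at most one bad row).
# For each row i, refit without it and record the residual sum of squares.
loo_rss <- sapply(seq_len(nrow(x)), function(i) {
  fit_i <- lm(y[-i] ~ 0 + x[-i, ])
  sum(residuals(fit_i)^2)
})

# The row whose removal improves the fit the most is the suspect.
suspect <- which.min(loo_rss)

# Refit without the suspect row and inspect the coefficients.
fit_clean <- lm(y[-suspect] ~ 0 + x[-suspect, ])
print(suspect)
print(round(unname(coef(fit_clean)), 6))
```

Because the example has no noise, the scan should single out row 25, and the refit should recover the betas used to generate the remaining 24 rows. With real, noisy data the signal is much less clear-cut, and deleting a row is only defensible if the context allows it. In that case the weighting or robust-regression routes are probably the better starting point, for example rlm() or lqs() in the MASS package.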