Peter Dalgaard wrote:
> William Dunlap wrote:
>> In modelling functions some people like to use
>> a weight of 0 to drop an observation instead of
>> using a subset value of FALSE. E.g.,
>>   weights=c(0,1,1,...)
>> instead of
>>   subset=c(FALSE, TRUE, TRUE, ...)
>> to drop the first observation.
>>
>> lm() and summary.lm() appear to treat these in the
>> same way, decrementing the number of degrees of
>> freedom for each dropped observation. However,
>> predict.lm() does not treat them the same. It
>> doesn't seem to diminish the df to account for the
>> 0-weighted observations. E.g., the last printout
>> from the following script is as follows, where
>> predw is the prediction from the fit that used
>> 0-weights and preds is from using FALSE's in the
>> subset argument. Is this difference proper?
>
> Nice catch.
>
> The issue is that the subset fit and the zero-weighted fit are not
> completely the same. Notice that the residuals vector has different
> length in the two analyses. With a simplified setup:
>
> > length(lm(y~1,weights=w)$residuals)
> [1] 10
> > length(lm(y~1,subset=-1)$residuals)
> [1] 9
> > w
> [1] 0 1 1 1 1 1 1 1 1 1
>
> This in turn is what confuses predict.lm because it gets n and residual
> df from length(object$residuals). summary.lm() uses NROW(Qr$qr), and I
> suppose that predict.lm should follow suit.
>
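For anyone who wants to see the counts side by side, here is a minimal sketch. The data (y drawn with rnorm under an arbitrary seed) and the object names fitw and fits are made up for illustration; only w matches the setup quoted above.

```r
# Hypothetical data: 10 observations, the first one to be dropped.
set.seed(1)                      # arbitrary seed, only for reproducibility
y <- rnorm(10)
w <- c(0, rep(1, 9))             # zero weight on the first observation

fitw <- lm(y ~ 1, weights = w)   # drop via zero weight
fits <- lm(y ~ 1, subset = -1)   # drop via subset

# The residual vector keeps the zero-weighted observation ...
length(fitw$residuals)           # 10
length(fits$residuals)           # 9

# ... but the QR decomposition only has rows for the observations used.
NROW(fitw$qr$qr)
NROW(fits$qr$qr)

# Residual df derived from each count, for the zero-weighted fit.
df_from_resid <- length(fitw$residuals) - fitw$rank
df_from_qr    <- NROW(fitw$qr$qr) - fitw$rank
c(df_from_resid = df_from_resid, df_from_qr = df_from_qr)
```

The coefficient estimates agree between the two fits; it is only the count taken from the residual vector that differs.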
...and then when I went to fix it, I found that the actual line in the
sources (stats/R/lm.R) reads

 27442     ripley     n <- length(object$residuals) # NROW(object$qr$qr)

so it's been like that since December 2003. I wonder if Brian remembers
what the point was? (27442 was the restructuring into the stats package,
so it might not actually be Brian's code).

        -pd

-- 
   Peter Dalgaard
   Center for Statistics, Copenhagen Business School
   Phone: (+45)38153501
   Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
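To make the effect of the choice of n concrete, here is a rough sketch of the residual variance computed by hand with each candidate. This is not the predict.lm source: the data, the seed, and the names fitw, n_resid, n_qr, s2_resid and s2_qr are all hypothetical, and the weighted residual sum of squares is computed directly from the fit.

```r
# Hypothetical data with one zero-weighted observation.
set.seed(1)                        # arbitrary seed
y <- rnorm(10)
w <- c(0, rep(1, 9))
fitw <- lm(y ~ 1, weights = w)

p   <- fitw$rank
rss <- sum(w * residuals(fitw)^2)  # weighted RSS; the zero-weight term vanishes

n_resid <- length(fitw$residuals)  # counts the zero-weighted observation
n_qr    <- NROW(fitw$qr$qr)        # counts only the observations actually used

s2_resid <- rss / (n_resid - p)    # variance estimate with n from the residuals
s2_qr    <- rss / (n_qr - p)       # variance estimate with n from the QR rows

# Compare both against what summary.lm() reports for the same fit.
c(s2_resid = s2_resid, s2_qr = s2_qr, summary_lm = summary(fitw)$sigma^2)
```

With n taken from the residual vector the denominator is one too large, so the variance estimate comes out smaller than the one summary.lm() reports, and the t quantile would be taken on one df too many.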