On Feb 27, 2013, at 21:55, Milan Bouchet-Valat wrote:

> Thanks for the (critical, indeed) answer!
>
> On Wednesday, 27 February 2013 at 20:48 +0100, peter dalgaard wrote:
>> On Feb 27, 2013, at 19:46, Milan Bouchet-Valat wrote:
>>
>>> I cannot believe nobody cares about this -- or I'm completely wrong and
>>> in that case everybody should rush to put the shame on me... :-p
>>
>> Well, nobs() is the number of observations. If you have 5 Poisson
>> distributed counts, you have 5 observations.
> Well, say that to the statistical offices that spend millions to survey
> thousands of people with correct (but complex) sampling designs, they'll
> be happy to know that the collected data only provides an information
> equivalent to 5 independent outcomes. ;-)

My objection is mainly technical/conceptual: Suppose 5 Poisson counts,
say of the number of defaults in 5 counties, are not 5 observations.
Then how many observations are 5 negative binomial counts, say of white
blood cell counts in 5 patients? A generic function called nobs() should
work similarly across a range of fitted models, and it would be
inconsistent if it suddenly did something different for a single
distribution.

>
>> If the number of observations is not the right thing to use in some
>> context, use the right thing instead. Changing the definition of
>> nobs() surely leads to madness.
> It is common usage in the literature using log-linear models to report
> the sum of counts as the number of observations. I think this indeed
> makes sense, but I'm not particularly attached to the choice of words --
> let's call it as you please.

It makes OK sense in isolation, I suppose. Especially if you interpret
the table as multinomial counts rather than Poisson ones. If you
interpret the total count as a Poisson variable, all cell counts become
independent Poisson variables. However, the issue here is about coherent
and consistent software design, and that goes beyond dealing with
contingency tables.

>
> The root issue is that nobs() was precisely introduced to be the basis
> for the BIC() function, as ?nobs states explicitly:
>> Extract the number of ‘observations’ from a model fit. This is
>> principally intended to be used in computing BIC (see ‘AIC’)
>

I think it is unfortunate to specify a function in terms of what it is
used for. It should be specified in terms of what it does.

> So it's OK to say that the number of observations is the number of cells
> (even if I think this is not very user-friendly), but then the
> documentation is misleading, and the BIC() function returns incorrect
> values for the very first example provided in ?glm.
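
For concreteness, here is a minimal sketch of the two candidate penalties
on the first example in ?glm. The names bic_cells and bic_total are made
up for this illustration. It only shows how far apart the two choices of
n land numerically, and does not settle which one is appropriate.

```r
## Data and model from the first example in ?glm (Dobson's table).
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
glm.D93   <- glm(counts ~ outcome + treatment, family = poisson())

## nobs() counts cells (rows of the data), not individuals.
n_cells <- nobs(glm.D93)   # number of Poisson counts
n_total <- sum(counts)     # number of individuals in the table

## BIC = -2 * logLik + k * log(n), computed by hand for both choices of n.
ll <- logLik(glm.D93)
k  <- attr(ll, "df")       # number of estimated parameters
bic_cells <- -2 * as.numeric(ll) + k * log(n_cells)
bic_total <- -2 * as.numeric(ll) + k * log(n_total)

## BIC() follows nobs(), so it matches the cell-based version.
isTRUE(all.equal(BIC(glm.D93), bic_cells))

c(n_cells = n_cells, n_total = n_total,
  bic_cells = bic_cells, bic_total = bic_total)
```

The two values differ by k * (log(n_total) - log(n_cells)), so which n
is used matters most when comparing models that differ in k.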
>
>> (I suppose that the fact that n is so obviously the wrong thing for
>> one particularly well-digested family of distribution functions could
>> be taken to indicate a generic weakness with the BIC.)
> I'm sure we can agree on the fact that BIC has its weaknesses (and I'm
> not the best person able to judge), but the point at stake is IMHO not
> one of them. After all, usual statistics for the Poisson family, such as
> deviance or residuals, are based on the sum of counts, not on the number
> of cells, and nobody objects.

At least for the deviance, that's just untrue. The deviance is zero for
a saturated table. If some cells are split, the deviance becomes nonzero.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel