For both glm() and mgcv::gam() quasibinomial error models, the summary object has a dispersion value that has the same role as sigma^2 in the summary object for lm() model fits.
Where some fitted probabilities are small, the `gam()` default scale parameter estimates, returned as `scale` (and `sig2`) in the gam object and as “dispersion" in the summary object, can differ wildly from the Pearson values that `glm()` works with, and that can be obtained by calling `gam()` with a suitable control setting (see the code that now follows.) The following demonstrates the issue: ## ‘mgcv’ version 1.8-34 Cholera <- HistData:: Cholera library(mgcv) form <- cbind(cholera_deaths, popn-cholera_deaths) ~ water + elevation + poor_rate default.gam <- gam(form, data=Cholera, family=quasibinomial) pearson.gam <- update(quasibin.gam, control=gam.control(scale.est=“pearson")) c(Pearson=pearson.gam$scale, Default=default.gam$scale) ## Pearson Default ## 33.545829 2.919535 My own calculation (from either fitted model), returns 30.07 for the (Fletcher 2012) version of the dispersion that was, according to Wood’s “Generalized Additive Models” (2nd edn, 2017, p.111), returned as the GAM scale estimate at the time when the book was written. The default scale estimates returned by `gam()` vary wildly, relative to the relatively stable “pearson" estimates, in data that are simulated to have comparable dispersion estimates. For the Cholera data, it would make good sense to fit a model with quasipoisson errors to the death counts, using log(popn) as an offset. The GAM model then uses, with default setting for `scale.est`, a scale parameter that is close to that returned by "pearson.gam”. SEs (as well as coefficients) are similar to those returned by "pearson.gam”. The detailed calculations that are documented in the following documents may be of some general interest. https://www.dropbox.com/s/vl9usat07urbgel/quasibin-gam.pdf?dl=0 or https://www.dropbox.com/s/s83mh1mut5xc3gk/quasibin-gam.html?dl=0 I am posting this here now before posting a bug report in case I have missed something important. It would be useful to be directed to the mgcv code used for calculation of what is returned as the Fletcher statistic. Rmd file: https://www.dropbox.com/s/ghmsdcvgxp068bs/quasibin-gam.Rmd?dl=0 .bib file: https://www.dropbox.com/s/r1yjqx0sni2pzjy/quasi.bib?dl=0 John Maindonald email: john.maindon...@anu.edu.au ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.