Re: [R] Under-dispersion - a stats question?

Martin Henry H. Stevens Wed, 12 Oct 2005 03:42:55 -0700

Hello all:
Thank you for your interest.

The text of this email is in the attached "R-help.r" file.
The R script is in "R-helpscript.r".
The data set is "wk6trial.csv".

One of my students has performed a laboratory experiment with petri dishes containing hundreds of species of bacteria, and six species each of algae and ciliated protozoans. Our goal was to examine the effects of nutrient concentration and dish size on the number of species of each group remaining after six weeks.


I attached the data set and some code for the algae analysis.

We had four dish sizes (factor), seven nutrient concentrations (continuous), and three replicates of each unique treatment combination, for a total n = 84.
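For anyone who wants to reproduce the layout without the attached file, here is a minimal sketch of the design. The level names and nutrient values are placeholders of mine, not the values in "wk6trial.csv".

```r
# Hypothetical reconstruction of the design; level names and nutrient
# values are placeholders, not the values in wk6trial.csv.
design <- expand.grid(
  Size      = factor(c("S1", "S2", "S3", "S4")),  # four dish sizes (factor)
  Nutrients = 1:7,                                # seven concentrations (continuous)
  Rep       = 1:3                                 # three replicates per combination
)

nrow(design)                          # 4 * 7 * 3 rows
table(design$Size, design$Nutrients)  # replicates per treatment combination
```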

Our response variables were (i) the number of bacterial species (0-400 species, modeled with quasipoisson), (ii) the proportion of algae species (out of six initial species - modeled with binomial) and (iii) the proportion of protozoan species (out of six initial species - modeled with binomial). For algae and protozoans, we modeled the proportion of species rather than the raw number because in each case we were constrained by the design to have between 0 and 6 species. I discussed this with a local statistician, and he thought it made sense.

Each of these response variables is the combined result of both unknown species' responses to treatments as well as the unknown interactions among species. Further, these three responses are themselves interdependent to some degree. For instance, the number and identity of protozoan species may influence the number of bacterial species. Nonetheless, it is common practice in ecology to model the number of species of a group (or its logarithm) with a univariate model assuming either a normal or Poisson error distribution. I would HAPPILY learn better.

While modeling these groups, I consulted a few texts (Neter et al. 1996, Venables and Ripley 2002, Dalgaard 2002, Crawley 2002, Fox 2002) and attempted to follow standard procedures laid out in these books.


For the algae and the protozoans, I began with a binomial model,

glm(cbind(AS, 6-AS) ~ Nutrients + I(Nutrients^2) + Size +
    Nutrients:Size + I(Nutrients^2):Size, data=dat, family=binomial)

where AS is the number of algae species in a dish. I retained this family upon observation that the residual dev. / residual DF was (for algae) = 0.19. I minimized the model by hand based on the F tests (not the treatment contrast coefficients, after V&R p. 197 - Hauck and Donner 1977) and using step() and found that the only significant treatment was a linear effect of nutrient concentration. I examined the qq plot, the resid ~ fitted plot, and Cook's distances and everything looked fine.
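To make the dispersion check concrete, here is a minimal sketch of how the ratio can be computed. The data are simulated stand-ins (my assumption, not the attached data set), so the ratio here will not match the 0.19 reported for the algae.

```r
set.seed(1)
# Simulated stand-in for the design and the algae counts (0-6 species per dish);
# these are NOT the values in wk6trial.csv.
dat <- expand.grid(Size = factor(1:4), Nutrients = 1:7, Rep = 1:3)
dat$AS <- rbinom(nrow(dat), size = 6, prob = plogis(-1 + 0.3 * dat$Nutrients))

fit.bin <- glm(cbind(AS, 6 - AS) ~ Nutrients + I(Nutrients^2) + Size +
                 Nutrients:Size + I(Nutrients^2):Size,
               data = dat, family = binomial)

# Two common dispersion estimates: deviance-based and Pearson-based
dev.ratio     <- deviance(fit.bin) / df.residual(fit.bin)
pearson.ratio <- sum(residuals(fit.bin, type = "pearson")^2) / df.residual(fit.bin)
c(deviance = dev.ratio, pearson = pearson.ratio)
```

Note that the quasi- families in glm() report the Pearson-based estimate, which need not equal the deviance-based ratio.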

When I repeated this with quasibinomial, which estimated the dispersion parameter (0.19), I found that both Size and Nutrients were significant (no interaction).
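The reason the conclusions change can be sketched as follows: the quasi- family leaves the coefficients untouched and multiplies every standard error by the square root of the estimated dispersion. The counts below are hypothetical under-dispersed values that I constructed for illustration, not the real data.

```r
set.seed(2)
dat <- expand.grid(Size = factor(1:4), Nutrients = 1:7, Rep = 1:3)
p <- plogis(-1 + 0.3 * dat$Nutrients)
# Hypothetical under-dispersed counts: values stay closer to 6 * p than
# binomial draws would (an assumption made purely for illustration).
dat$AS <- round(6 * p + runif(nrow(dat), -0.5, 0.5))

fit.bin   <- glm(cbind(AS, 6 - AS) ~ Nutrients + Size, data = dat, family = binomial)
fit.quasi <- update(fit.bin, family = quasibinomial)

phi      <- summary(fit.quasi)$dispersion          # Pearson-based estimate
se.bin   <- coef(summary(fit.bin))[, "Std. Error"]
se.quasi <- coef(summary(fit.quasi))[, "Std. Error"]

all.equal(coef(fit.bin), coef(fit.quasi))   # point estimates are identical
all.equal(se.quasi, se.bin * sqrt(phi))     # SEs are rescaled by sqrt(phi)

drop1(fit.quasi, test = "F")                # F tests use the estimated dispersion
```

With a dispersion of 0.19, the standard errors shrink by a factor of sqrt(0.19), roughly 0.44, which is why terms that were not significant under binomial can become significant under quasibinomial.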

So... my original question to the list was: is it appropriate to model and fit the error distribution with quasi- functions if dispersion seems much less than 1.0?

Now I am unclear how to evaluate under-dispersion (even after consulting V&R 2002, p. 208-209).

Upon reading through this, if you made it this far, you may have lots of other comments as well, and I truly hope to become better educated as a result!

BTW, I modeled the bacteria with a quasipoisson (dispersion = 91!). Perhaps a negative binomial would have been better?
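For comparison, a minimal sketch of the two alternatives, using glm.nb() from the MASS package. The response name BS and the simulated counts are my assumptions for illustration, not the real bacterial data.

```r
library(MASS)  # recommended package shipped with R; provides glm.nb()

set.seed(3)
dat <- expand.grid(Size = factor(1:4), Nutrients = 1:7, Rep = 1:3)
# Hypothetical over-dispersed stand-in for bacterial richness
# (BS is an assumed column name, not taken from wk6trial.csv).
dat$BS <- rnbinom(nrow(dat), mu = exp(4 + 0.1 * dat$Nutrients), size = 2)

fit.qp <- glm(BS ~ Nutrients + Size, data = dat, family = quasipoisson)
fit.nb <- glm.nb(BS ~ Nutrients + Size, data = dat)

summary(fit.qp)$dispersion  # quasipoisson: variance = phi * mu
fit.nb$theta                # negative binomial: variance = mu + mu^2 / theta
```

The two models differ in how the variance grows with the mean (linearly for quasipoisson, quadratically for negative binomial), so a plot of squared residuals against fitted values is one way to see which assumption is closer to the data.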


Many thanks for your inputs,
Hank Stevens




On Oct 12, 2005, at 1:10 AM, Jari Oksanen wrote:

On Tue, 2005-10-11 at 17:16 -0400, Kjetil Holuerson wrote:

Martin Henry H. Stevens wrote:

Hello all:
I frequently have glm models in which the residual variance is much
lower than the residual degrees of freedom (e.g. Res.Dev=30.5, Res.DF
= 82). Is it appropriate for me to use a quasipoisson error
distribution and test it with an F distribution? It seems to me that
I could stand to gain a much-reduced standard error if I let the
procedure estimate my dispersion factor (which is what I assume the
quasi- distributions do).


I didn't see an answer to this. Maybe you could treat it as a
quasimodel, but first you should ask why there is underdispersion.

Underdispersion could arise if you have dependent responses; for
instance, competition (say, between plants) could produce
underdispersion. Then you would be better off changing to an appropriate
model. Maybe you could post more about your experimental setup?

Some ecologists from Bergen, Norway, suggest using quasipoisson with its
underdispersed residual error (while I wouldn't do that). However, it
indeed would be useful to know a bit more about the setup, like the type
of dependent variable. If the dependent variable happens to be the
number of species (like it's been in some papers by MHHS), this
certainly is *not* Poisson nor quasi-Poisson nor in the exponential
family, although it so often is modelled. I've often seen that species
richness (number of species -- or in R-speak 'tokens' -- in a
collection) is underdispersed to Poisson, and for a good reason. Even
there I'd play safe and use poisson() instead of underdispersed
quasipoisson().

cheers, jari oksanen
--
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email [EMAIL PROTECTED], homepage http://cc.oulu.fi/~jarioksa/


Dr. Martin Henry H. Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056

Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
"E Pluribus Unum"

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
