Dear All (apologies if some of you have received this twice)
Thanks very much for the rapid reply, Prof Ripley. I had been looking at this 
analysis for my colleague (Prof Behnke) and suggested that he contact the R 
mailing list because I couldn't answer his question. I think some of the detail 
got lost in translation (he grew up with the GLIM package!). So here are some 
more details:
We are indeed following your guidelines in the MASS book, and using glm.nb to 
analyse some data on the abundance of several parasite species in mice. We 
proceeded with model selection as suggested, and we are reasonably happy that 
we end up with decent models for our several parasite species. 
The question that Prof Behnke asked is: if we fit similar models (initial full 
models have the same factors and covariates) with different response variables 
(abundances of different species of parasite), is there a way of comparing the 
relative effect sizes of the key explanatory variables across different models? 
For example, if we find that the best models for two different species include 
the term "sex", is there a way of determining if sex explains more of the 
variance in parasite abundance in species A than in species B? 
In a simple ANOVA with Gaussian errors, we might compare the percentage of 
variance explained. We could also look at the overall R^2 for the models and 
determine how well (relatively) our different models perform. We might end up 
concluding that for species A we have found the most important biological 
factors explaining parasite abundance, but that for species B we have yet to 
explain a large proportion of the variance.
Is there something similar we can do with our glm.nb models? Clearly the 
coefficients will tell us about relative effect sizes WITHIN a given model, but 
what can we do when comparing completely different response variables?!
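To make the question concrete, here is a small sketch. Everything in it is hypothetical (simulated abundances with invented effect sizes, not our mouse data); it just illustrates one crude candidate answer, comparing the proportion of null deviance explained across glm.nb models fitted to two different responses:

```r
## Hypothetical sketch: two "species" with the same design matrix,
## compared via a deviance-based pseudo-R^2 (with the usual caveats).
library(MASS)

set.seed(1)
n   <- 200
sex <- factor(rep(c("F", "M"), each = n / 2))
age <- runif(n, 1, 10)

## Invented truth: sex has a stronger effect on species A than on B
abundA <- rnegbin(n, mu = exp(0.5 + 1.0 * (sex == "M") + 0.05 * age), theta = 2)
abundB <- rnegbin(n, mu = exp(0.5 + 0.2 * (sex == "M") + 0.05 * age), theta = 2)

fitA <- glm.nb(abundA ~ sex + age)
fitB <- glm.nb(abundB ~ sex + age)

## Proportion of null deviance explained -- one of several pseudo-R^2
## definitions, none of which is a true "variance explained"
pseudoR2 <- function(fit) 1 - fit$deviance / fit$null.deviance
c(A = pseudoR2(fitA), B = pseudoR2(fitB))
```

Is a comparison along these lines defensible, or does it share the problems of the percentage-of-deviance approach Prof Behnke described?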
Regards 
Tom Reader 



-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: Wed 17/01/2007 14:01
To: Behnke Jerzy
Cc: r-help@stat.math.ethz.ch; Reader Tom
Subject: Re: [R] Effect size in GLIM models
 
On Wed, 17 Jan 2007, Behnke Jerzy wrote:

> Dear All,
> I wonder if anyone can advise me as to whether there is a consensus as
> to how the effect size should be calculated from GLIM models in R for
> any specified significant main effect or interaction.

I think there is consensus that effect sizes are not measured by 
significance tests.  If you have a log link (you did not say), the model 
coefficients have a direct interpretation via multiplicative increases in 
rates.
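The multiplicative reading can be seen directly by exponentiating the coefficients. A sketch on the quine data shipped with MASS (standing in for the unposted mouse data):

```r
## Sketch: with a log link, exp(coef) gives multiplicative changes in the
## fitted mean, i.e. rate ratios relative to the baseline level.
library(MASS)

fit <- glm.nb(Days ~ Sex + Age, data = quine)  # days absent from school
exp(coef(fit))  # e.g. exp of the SexM coefficient = male/female ratio of means
```

Rate ratios of this kind are directly comparable across models fitted with the same link, which is one reasonable notion of "effect size" here.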

> In investigating the causes of variation in infection in wild animals,
> we have fitted 4-way GLIM models in R with negative binomial errors.

What exactly do you mean by 'GLIM models in R with negative binomial 
errors'?  Negative binomial regression is within the GLM framework only 
for fixed shape theta. Package MASS has glm.nb(), which extends the 
framework and which you may be using without telling us.  (AFAIK GLIM is a 
software package, not a class of models.)

I suspect you are using the code from MASS without reference to the book
it supports, which has a worked example of model selection.

> These are then simplified using the STEP procedure, and finally each of
> the remaining terms is deleted in turn, and the model without that term
> compared to a model with that term to estimate probability

'probability' of what?

> An ANOVA of each model gives the deviance explained by each interaction
> and main effect, and the percentage deviance attributable to each factor
> can be calculated from NULL deviance.

If theta is not held fixed, anova() is probably not appropriate: see the 
help for anova.negbin.
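What the anova.negbin help does support is comparing nested glm.nb fits by a likelihood-ratio test, rather than reading the single-model sequential deviance table. A sketch, again on the quine data rather than the poster's:

```r
## Sketch: likelihood-ratio comparison of nested glm.nb fits,
## with theta re-estimated in each fit (handled by anova.negbin).
library(MASS)

full    <- glm.nb(Days ~ Sex + Age, data = quine)
reduced <- glm.nb(Days ~ Age, data = quine)

anova(reduced, full)  # dispatches to anova.negbin: LR test between the fits
```

Deleting one term at a time and testing this way matches the model-selection procedure described in the MASS book.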

> However, we estimate probabilities by subsequent deletion of terms, and
> this gives the LR statistic. Expressing the value of the LR statistic as
> a percentage of 2xlog-like in a model without any factors, gives lower
> values than the former procedure.

I don't know anything to suggest percentages of LR statistics are 
reasonable summary measures.  There are extensions of R^2 to these models, 
but AFAIK they share the well-attested drawbacks of R^2.

> Are either of these appropriate? If so which is best, or alternatively
> how can % deviance be calculated. We require % deviance explained by
> each factor or interaction,  because we need to compare individual
> factors (say host age) across a range of infections.
>
> Any advice will be most gratefully appreciated. I can send you a worked
> example if you require more information.

We do ask for more information in the posting guide and the footer of 
every message.  I have had to guess uncomfortably much in formulating my 
answers.

> Jerzy. M. Behnke,
> The School of Biology,
> The University of Nottingham,
> University Park,
> NOTTINGHAM, NG7 2RD
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

