[R] Anova function and glm.nb

2008-04-07 Thread Nelson, Gary (FWE)
Hi All,

I am using the glm.nb function from the MASS package (current version)
to fit and compare GLMs with negative binomial error distributions.  My
question is: what is the appropriate method to use in the anova function
to compare models? If only one fitted object like

m <- glm.nb(number ~ p + sal + temp + ph, data = data)

is specified in the anova function (anova(m)), a fixed theta is used to
generate the analysis of deviance:

Analysis of Deviance Table
Model: Negative Binomial(0.2345), link: log
Response: number
Terms added sequentially (first to last)

        Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                      117    122.707
p        1   11.327       116    111.380     0.001
sal      1    2.286       115    109.094     0.131
tem      1    1.979       114    107.115     0.159
ph       1    2.567       113    104.549     0.109
Warning message:
In anova.negbin(m) : tests made without re-estimating 'theta'
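
For readers without the original data, a minimal reproducible sketch of this
fixed-theta behaviour (using the quine data set shipped with MASS rather than
the poster's data, so the variables here are illustrative assumptions) would be:

library(MASS)

## Single negative binomial fit; anova() on one fitted object gives a
## sequential analysis of deviance with theta held at the estimate from
## the full model, and emits the same warning as above.
fit <- glm.nb(Days ~ Eth + Sex + Age, data = quine)
anova(fit)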


If multiple fitted objects like

m1 <- glm.nb(number ~ 1, data = data)
m2 <- glm.nb(number ~ p, data = data)
m3 <- glm.nb(number ~ p + sal, data = data)
m4 <- glm.nb(number ~ p + sal + temp, data = data)
m5 <- glm.nb(number ~ p + sal + temp + ph, data = data)


are specified (anova(m1, m2, m3, m4, m5)), theta is apparently re-estimated for
each model when calculating the likelihood ratio tests:

Likelihood ratio tests of Negative Binomial Models
Response: number

               Model     theta Resid. df 2 x log-lik.   Test df  LR stat.     Pr(Chi)
1                  1 0.1892056       117    -527.7463
2                  p 0.2153105       116    -517.9349 1 vs 2  1  9.811382 0.001734351
3            p + sal 0.2214626       115    -515.7942 2 vs 3  1  2.140706 0.143435894
4      p + sal + tem 0.2261900       114    -513.8846 3 vs 4  1  1.909643 0.167002884
5 p + sal + tem + ph 0.2344827       113    -511.3633 4 vs 5  1  2.521237 0.112322429
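
A comparable sketch of the multi-model form, again on the quine data rather
than the poster's (the nested models are chosen only for illustration):

library(MASS)

## Each call to glm.nb() estimates its own theta; passing the nested fits
## to anova() then reports likelihood ratio tests between successive models.
q1 <- glm.nb(Days ~ 1, data = quine)
q2 <- glm.nb(Days ~ Eth, data = quine)
q3 <- glm.nb(Days ~ Eth + Sex, data = quine)
q4 <- glm.nb(Days ~ Eth + Sex + Age, data = quine)
anova(q1, q2, q3, q4)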


The conclusions are the same, but I'd like to know if one method is
favored over the other. 

Thanks,

Gary Nelson.



Re: [R] Anova function and glm.nb

2008-04-07 Thread Prof Brian Ripley
The re-estimated version is the principled one -- the other is a 
computational shortcut.

However, what model you might choose depends on why you are doing model 
selection, and also on whether there is a subject-matter reason to consider 
the terms in this order.

And if you look in MASS (the book) you will see a similar example and 
discussion of how to treat it.  It uses dropterm(), which is likely to be 
preferable to either of these approaches.
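
A minimal sketch of the dropterm() approach on the quine data (illustrative
only; the worked example is in Venables & Ripley's MASS book):

library(MASS)

## dropterm() drops each term from the full model in turn and reports AIC;
## test = "Chisq" adds a chi-squared test on the corresponding change in
## deviance, so each term is judged against the full model rather than
## sequentially.
fit <- glm.nb(Days ~ Eth + Sex + Age, data = quine)
dropterm(fit, test = "Chisq")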

On Mon, 7 Apr 2008, Nelson, Gary (FWE) wrote:

 [quoted text of the original message omitted; see above]


-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.