[R] choosing between Poisson regression models: no interactions vs. interactions

James Milks Tue, 31 Jul 2007 11:27:52 -0700

R gurus,

I'm working on data analysis for a small project.  My response  
variable is total vines per tree (median = 0, mean = 1.65, min = 0,  
max = 24).  My predictors are two categorical variables (site and
species, four levels each) and one continuous variable (tree diameter
at breast height, DBH).  The main question I'm trying to answer is
whether the species identity of a tree affects the number of vines
clinging to the trunk.  Because the response variable is count data,
I decided to use Poisson regression, even though I'm not as familiar
with it as with linear or logistic regression.


My problem is deciding which model to use.  I have fit three: one
without interaction terms (Total.vines~Site+Species+DBH), one with a
Site-by-Species interaction (Total.vines~Site*Species+DBH), and one
with all interactions among the three predictors
(Total.vines~Site*Species*DBH).  Here is my output from R for the
first two models (the third model has the same number and identity of
significant terms as the second, even though it has more interaction
terms overall):

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Call:
glm(formula = Total.vines ~ Site + Species + DBH, family = poisson)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-5.2067  -1.2915  -0.7095  -0.3525   6.3756

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.987695   0.231428 -12.910  < 2e-16 ***
SiteHuffman Dam  2.725193   0.249423  10.926  < 2e-16 ***
SiteNarrows      1.902987   0.227599   8.361  < 2e-16 ***
SiteSugar Creek  1.752754   0.242186   7.237 4.58e-13 ***
SpeciesFRAM      0.955468   0.157423   6.069 1.28e-09 ***
SpeciesPLOC      1.187903   0.141707   8.383  < 2e-16 ***
SpeciesULAM      0.340792   0.184615   1.846   0.0649 .
DBH              0.020708   0.001292  16.026  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

     Null deviance: 1972.3  on 544  degrees of freedom
Residual deviance: 1290.0  on 537  degrees of freedom
AIC: 1796.0

Number of Fisher Scoring iterations: 6

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Call:
glm(formula = Total.vines ~ Site * Species + DBH, family = poisson,
     data = sycamores.1)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-4.9815  -1.2370  -0.6339  -0.3403   6.5664

Coefficients: (3 not defined because of singularities)
                               Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -2.788243   0.303064  -9.200  < 2e-16 ***
SiteHuffman Dam               1.838952   0.354127   5.193 2.07e-07 ***
SiteNarrows                   2.252716   0.323184   6.970 3.16e-12 ***
SiteSugar Creek             -12.961519 519.152077  -0.025 0.980082
SpeciesFRAM                  13.938716 519.152230   0.027 0.978580
SpeciesPLOC                   0.240223   0.540676   0.444 0.656824
SpeciesULAM                   1.919586   0.540246   3.553 0.000381 ***
DBH                           0.019984   0.001337  14.946  < 2e-16 ***
SiteHuffman Dam:SpeciesFRAM -11.513823 519.152294  -0.022 0.982306
SiteNarrows:SpeciesFRAM     -13.593127 519.152268  -0.026 0.979111
SiteSugar Creek:SpeciesFRAM         NA         NA      NA       NA
SiteHuffman Dam:SpeciesPLOC         NA         NA      NA       NA
SiteNarrows:SpeciesPLOC       0.397503   0.555218   0.716 0.474028
SiteSugar Creek:SpeciesPLOC  15.640450 519.152277   0.030 0.975966
SiteHuffman Dam:SpeciesULAM  -0.102841   0.610027  -0.169 0.866124
SiteNarrows:SpeciesULAM      -2.809092   0.606804  -4.629 3.67e-06 ***
SiteSugar Creek:SpeciesULAM         NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

     Null deviance: 1972.3  on 544  degrees of freedom
Residual deviance: 1178.7  on 531  degrees of freedom
AIC: 1696.6

Number of Fisher Scoring iterations: 13
%%%%%%%%%%%%%%%%%%%%
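
For reference, here is a minimal sketch of how the three models could be
fit and how the Site-by-Species cell counts could be inspected.  The real
data frame (sycamores.1) is not included in this post, so the sketch uses
a simulated stand-in called `sim`; the level names `RefSite` and
`RefSpecies` are placeholders, because the reference levels do not appear
in the output above.  Coefficients reported as NA "because of
singularities" often point to Site-by-Species combinations with no trees,
which is why the cross-tabulation is worth a look.

```r
# Hypothetical stand-in for sycamores.1: simulated values, NOT the real data.
set.seed(1)
n <- 545  # 544 null degrees of freedom + 1, as in the output above

site_levels    <- c("RefSite", "Huffman Dam", "Narrows", "Sugar Creek")
species_levels <- c("RefSpecies", "FRAM", "PLOC", "ULAM")

sim <- data.frame(
  Site    = factor(sample(site_levels, n, replace = TRUE),
                   levels = site_levels),
  Species = factor(sample(species_levels, n, replace = TRUE),
                   levels = species_levels),
  DBH     = runif(n, min = 5, max = 100)  # assumed range, for illustration
)
# Simulated counts that depend on DBH only (an arbitrary choice).
sim$Total.vines <- rpois(n, lambda = exp(-1 + 0.02 * sim$DBH))

# The three candidate models described in the post.
m1 <- glm(Total.vines ~ Site + Species + DBH, family = poisson, data = sim)
m2 <- glm(Total.vines ~ Site * Species + DBH, family = poisson, data = sim)
m3 <- glm(Total.vines ~ Site * Species * DBH, family = poisson, data = sim)

# Number of trees in each Site-by-Species cell; empty cells make the
# corresponding interaction coefficients inestimable (reported as NA).
cell_counts <- table(sim$Site, sim$Species)
print(cell_counts)
```

With the real data, replacing `sim` by `sycamores.1` in the calls above
would reproduce the fits; the simulated cells are all filled, so no NA
coefficients are expected here.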

As you can see, the two models give very different output, especially
with regard to whether the individual species are significant.  In
the no-interaction model, the only species that was not significant
was ULAM.  In the Site-by-Species interaction model, ULAM was the
only significant species.  My question is this: which model should I
use when I present this analysis?  I know that the interaction model
has the lower AIC (1696.6 vs. 1796.0).  Should I base my choice
solely on AIC?  The reason I'm asking is that the second model has
only one significant interaction term, fewer significant terms
overall, and three undefined terms.
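
Because the first model is nested in the second, one alternative to
comparing AIC values is a likelihood-ratio (deviance) test.  The sketch
below uses only the residual deviances and degrees of freedom printed in
the two summaries above; the variable names are illustrative.  It also
computes the ratio of residual deviance to residual degrees of freedom,
a rough check on whether the Poisson assumption (dispersion = 1) looks
plausible.

```r
# Residual deviances and degrees of freedom copied from the summaries above.
dev_main <- 1290.0; df_main <- 537   # Total.vines ~ Site + Species + DBH
dev_int  <- 1178.7; df_int  <- 531   # Total.vines ~ Site * Species + DBH

# Likelihood-ratio test: the drop in deviance is compared to a
# chi-squared distribution with df equal to the number of added terms.
lr_stat <- dev_main - dev_int
lr_df   <- df_main - df_int
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Rough dispersion check: values well above 1 suggest overdispersion.
ratio_main <- dev_main / df_main
ratio_int  <- dev_int / df_int

print(c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value))
print(c(ratio_main = ratio_main, ratio_int = ratio_int))
```

With the fitted model objects in hand, anova(model1, model2, test =
"Chisq") performs the same test directly (model1 and model2 standing in
for the two fits).  Both ratios come out above 2, so the standard errors
in the summaries may be too small; refitting with family = quasipoisson
is one way to check how sensitive the conclusions are.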

Thanks for any guidance you can give to someone running his first  
Poisson regression.

Jim Milks

Graduate Student
Environmental Sciences Ph.D. Program
136 Biological Sciences
Wright State University
3640 Colonel Glenn Hwy
Dayton, OH 45435



