Re: [R] Binomial GLM, chisq.test, or?

Tal Galili Fri, 04 May 2012 11:15:59 -0700

Without going deeply into your analysis, 2 comments:

1) Use the anova command to test two nested models using:
anova(model1, model2, test="Chisq")


2) glm's are non-trivial models (at least to me), be sure to google for
some tutorials in order to understand what you are looking at...

Cheers,
Tal





----------------Contact
Details:-------------------------------------------------------
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------




On Fri, May 4, 2012 at 6:38 PM, lincoln <misen...@hotmail.com> wrote:

> Hi,
>
> I have a data set with 999 observations, for each of them I have data on
> four variables:
> site, colony, gender (quite a few NA values), and cohort.
>
> This is how the data set looks like:
> > str(dispersal)
> 'data.frame':   999 obs. of  4 variables:
>  $ site  : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 2 2 ...
>  $ gender: Factor w/ 2 levels "0","1": NA NA 2 1 2 NA 1 2 2 NA ...
>  $ colony: Factor w/ 2 levels "main","other": 2 2 2 2 2 2 2 2 2 2 ...
>  $ cohort: Factor w/ 11 levels "1996","2000",..: 8 8 8 8 8 8 8 8 8 6 ...
>
> Now, I want to estimate if sites 1 and site 2 differ on some of the other
> variables. For instance there are relatively more males in site 1 with
> respect to site 2, more individuals of the main colony in site 2 with
> respect to site 1 and this sort of things.
>
> I thought I might do a binomial GLM considering as response variable the
> site, I tried to run the more general model to have a look to
> overdispersion
> but I believe there is something wrong even before worrying about
> overdispersion. I know (I did a chisq.test) that cohort2004 is very
> diversly
> represented between the two sites but it is not reflected in the results of
> GLM. Here there are the results of chisq.test and the GLM:
>
> 1)
> />
>
> age_cohort<-as.table(rbind(c(142,95,46,33,14,59,18,12,7,1,0),c(258,144,54,70,20,11,6,8,2,3,1)))
> > dimnames(age_cohort)<-list(site=c("M","D"),
> +
> cohorts=c(2010,2009,2008,2007,2006,2004,2003,2002,2001,2000,1996))
> > age_cohort
>    cohorts
> site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996
>   M  142   95   46   33   14   59   18   12    7    1    0
>   D  258  144   54   70   20   11    6    8    2    3    1
> > (Xsqagec <- chisq.test(age_cohort))  # Prints test summary
>
>        Pearson's Chi-squared test
>
> data:  age_cohort
> X-squared = 82.6016, df = 10, p-value = 1.549e-13
>
> Mensajes de aviso perdidos
> In chisq.test(age_cohort) : Chi-squared approximation may be incorrect
> > Xsqagec$observed   # observed counts
>    cohorts
> site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996
>   M  142   95   46   33   14   59   18   12    7    1    0
>   D  258  144   54   70   20   11    6    8    2    3    1
> > Xsqagec$expected   # expected counts under the null
>    cohorts
> site     2010     2009     2008     2007     2006     2004     2003
>   M 170.1195 101.6464 42.52988 43.80578 14.46016 29.77092 10.20717
>   D 229.8805 137.3536 57.47012 59.19422 19.53984 40.22908 13.79283
>    cohorts
> site      2002     2001     2000      1996
>   M  8.505976 3.827689 1.701195 0.4252988
>   D 11.494024 5.172311 2.298805 0.5747012
> > Xsqagec$residuals  # Pearson residuals
>    cohorts
> site       2010       2009       2008       2007       2006
>   M -2.1559111 -0.6592367  0.5321050 -1.6326395 -0.1210101
>   D  1.8546283  0.5671101 -0.4577448  1.4044825  0.1040993
>    cohorts
> site       2004       2003       2002       2001       2000
>   M  5.3569686  2.4391720  1.1980192  1.6214643 -0.5376032
>   D -4.6083465 -2.0983042 -1.0305993 -1.3948690  0.4624746
>    cohorts
> site       1996
>   M -0.6521494
>   D  0.5610132
> > Xsqagec$stdres     # standardized residuals
>    cohorts
> site       2010       2009       2008       2007       2006
>   M -3.6665549 -0.9962228  0.7397057 -2.2733888 -0.1623984
>   D  3.6665549  0.9962228 -0.7397057  2.2733888  0.1623984
>    cohorts
> site       2004       2003       2002       2001       2000
>   M  7.3264142  3.2566808  1.5962909  2.1485311 -0.7105713
>   D -7.3264142 -3.2566808 -1.5962909 -2.1485311  0.7105713
>    cohorts
> site       1996
>   M -0.8606814
>   D  0.8606814
> /
>
>
> 2)
> /> model1<-glm(site~gender*colony*cohort,binomial)
> > summary(model1)
>
> Call:
> glm(formula = site ~ gender * colony * cohort, family = binomial)
>
> Deviance Residuals:
>     Min        1Q    Median        3Q       Max
> -1.84648  -0.96954  -0.00036   1.11269   2.03933
>
> Coefficients: (12 not defined because of singularities)
>                                 Estimate Std. Error z value
> (Intercept)                    -1.657e+01  2.400e+03  -0.007
> gender1                        -2.231e-01  9.220e-01  -0.242
> colonyother                     9.531e-02  8.006e-01   0.119
> cohort2002                      1.717e-08  3.393e+03   0.000
> cohort2003                      1.766e+01  2.400e+03   0.007
> cohort2004                      1.807e+01  2.400e+03   0.008
> cohort2006                      1.697e+01  2.400e+03   0.007
> cohort2007                      1.726e+01  2.400e+03   0.007
> cohort2008                      1.606e+01  2.400e+03   0.007
> cohort2009                      1.657e+01  2.400e+03   0.007
> cohort2010                      1.587e+01  2.400e+03   0.007
> gender1:colonyother             9.163e-01  1.087e+00   0.843
> gender1:cohort2002              1.719e+01  2.400e+03   0.007
> gender1:cohort2003             -1.823e-01  1.713e+00  -0.106
> gender1:cohort2004              2.231e-01  1.329e+00   0.168
> gender1:cohort2006             -5.878e-01  1.586e+00  -0.371
> gender1:cohort2007             -6.454e-02  1.784e+00  -0.036
> gender1:cohort2008              8.881e-01  1.156e+00   0.768
> gender1:cohort2009             -2.817e-02  1.199e+00  -0.023
> gender1:cohort2010                     NA         NA      NA
> colonyother:cohort2002                 NA         NA      NA
> colonyother:cohort2003                 NA         NA      NA
> colonyother:cohort2004          1.497e+01  1.697e+03   0.009
> colonyother:cohort2006         -1.707e+01  2.400e+03  -0.007
> colonyother:cohort2007         -7.885e-01  1.772e+00  -0.445
> colonyother:cohort2008                 NA         NA      NA
> colonyother:cohort2009                 NA         NA      NA
> colonyother:cohort2010                 NA         NA      NA
> gender1:colonyother:cohort2002         NA         NA      NA
> gender1:colonyother:cohort2003         NA         NA      NA
> gender1:colonyother:cohort2004 -9.163e-01  2.400e+03   0.000
> gender1:colonyother:cohort2006         NA         NA      NA
> gender1:colonyother:cohort2007 -2.575e+00  2.379e+00  -1.082
> gender1:colonyother:cohort2008         NA         NA      NA
> gender1:colonyother:cohort2009         NA         NA      NA
> gender1:colonyother:cohort2010         NA         NA      NA
>                               Pr(>|z|)
> (Intercept)                       0.994
> gender1                           0.809
> colonyother                       0.905
> cohort2002                        1.000
> cohort2003                        0.994
> cohort2004                        0.994
> cohort2006                        0.994
> cohort2007                        0.994
> cohort2008                        0.995
> cohort2009                        0.994
> cohort2010                        0.995
> gender1:colonyother               0.399
> gender1:cohort2002                0.994
> gender1:cohort2003                0.915
> gender1:cohort2004                0.867
> gender1:cohort2006                0.711
> gender1:cohort2007                0.971
> gender1:cohort2008                0.442
> gender1:cohort2009                0.981
> gender1:cohort2010                   NA
> colonyother:cohort2002               NA
> colonyother:cohort2003               NA
> colonyother:cohort2004            0.993
> colonyother:cohort2006            0.994
> colonyother:cohort2007            0.656
> colonyother:cohort2008               NA
> colonyother:cohort2009               NA
> colonyother:cohort2010               NA
> gender1:colonyother:cohort2002       NA
> gender1:colonyother:cohort2003       NA
> gender1:colonyother:cohort2004    1.000
> gender1:colonyother:cohort2006       NA
> gender1:colonyother:cohort2007    0.279
> gender1:colonyother:cohort2008       NA
> gender1:colonyother:cohort2009       NA
> gender1:colonyother:cohort2010       NA
>
> (Dispersion parameter for binomial family taken to be 1)
>
>    Null deviance: 311.91  on 224  degrees of freedom
> Residual deviance: 271.61  on 201  degrees of freedom
>  (774 observations deleted due to missingness)
> AIC: 319.61
>
> Number of Fisher Scoring iterations: 15
> /
>
> I thought that perhaps keeping the gender as explanatory variable was
> reducing a lot the sample size and it was the matter (I removed it):
>
>
>
> /> model1<-glm(site~colony*cohort,binomial)
> > summary(model1)
>
> Call:
> glm(formula = site ~ colony * cohort, family = binomial)
>
> Deviance Residuals:
>    Min       1Q   Median       3Q      Max
> -1.8683  -0.9712  -0.8757   1.3468   1.7470
>
> Coefficients: (6 not defined because of singularities)
>                        Estimate Std. Error z value Pr(>|z|)
> (Intercept)             -15.5661  1455.3976  -0.011    0.991
> colonyother               0.2544     0.2167   1.174    0.240
> cohort2000               14.4675  1455.3981   0.010    0.992
> cohort2001               16.8188  1455.3978   0.012    0.991
> cohort2002               15.9715  1455.3977   0.011    0.991
> cohort2003               16.6647  1455.3977   0.011    0.991
> cohort2004               17.1194  1455.3976   0.012    0.991
> cohort2006               15.2607  1455.3976   0.010    0.992
> cohort2007               14.9470  1455.3976   0.010    0.992
> cohort2008               15.3837  1455.3976   0.011    0.992
> cohort2009               15.1762  1455.3976   0.010    0.992
> cohort2010               14.8053  1455.3976   0.010    0.992
> colonyother:cohort2000        NA         NA      NA       NA
> colonyother:cohort2001        NA         NA      NA       NA
> colonyother:cohort2002        NA         NA      NA       NA
> colonyother:cohort2003        NA         NA      NA       NA
> colonyother:cohort2004   13.7583   550.0887   0.025    0.980
> colonyother:cohort2006  -15.5151  1455.3976  -0.011    0.991
> colonyother:cohort2007   -0.9163     0.5979  -1.533    0.125
> colonyother:cohort2008        NA         NA      NA       NA
> colonyother:cohort2009   -0.6912     0.5214  -1.326    0.185
> colonyother:cohort2010        NA         NA      NA       NA
>
> (Dispersion parameter for binomial family taken to be 1)
>
>    Null deviance: 1361.4  on 998  degrees of freedom
> Residual deviance: 1267.9  on 983  degrees of freedom
> AIC: 1299.9
>
> Number of Fisher Scoring iterations: 14
>  /
>
> Any comment/suggestion on this?
> Thanks for any help
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Binomial-GLM-chisq-test-or-tp4608941.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Binomial GLM, chisq.test, or?

Reply via email to