Without going deeply into your analysis, 2 comments: 1) Use the anova command to test two nested models using: anova(model1, model2, test="Chisq")
2) glm's are non-trivial models (at least to me), be sure to google for some tutorials in order to understand what you are looking at... Cheers, Tal ----------------Contact Details:------------------------------------------------------- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) ---------------------------------------------------------------------------------------------- On Fri, May 4, 2012 at 6:38 PM, lincoln <misen...@hotmail.com> wrote: > Hi, > > I have a data set with 999 observations, for each of them I have data on > four variables: > site, colony, gender (quite a few NA values), and cohort. > > This is how the data set looks like: > > str(dispersal) > 'data.frame': 999 obs. of 4 variables: > $ site : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 2 2 ... > $ gender: Factor w/ 2 levels "0","1": NA NA 2 1 2 NA 1 2 2 NA ... > $ colony: Factor w/ 2 levels "main","other": 2 2 2 2 2 2 2 2 2 2 ... > $ cohort: Factor w/ 11 levels "1996","2000",..: 8 8 8 8 8 8 8 8 8 6 ... > > Now, I want to estimate if sites 1 and site 2 differ on some of the other > variables. For instance there are relatively more males in site 1 with > respect to site 2, more individuals of the main colony in site 2 with > respect to site 1 and this sort of things. > > I thought I might do a binomial GLM considering as response variable the > site, I tried to run the more general model to have a look to > overdispersion > but I believe there is something wrong even before worrying about > overdispersion. I know (I did a chisq.test) that cohort2004 is very > diversly > represented between the two sites but it is not reflected in the results of > GLM. Here there are the results of chisq.test and the GLM: > > 1) > /> > > age_cohort<-as.table(rbind(c(142,95,46,33,14,59,18,12,7,1,0),c(258,144,54,70,20,11,6,8,2,3,1))) > > dimnames(age_cohort)<-list(site=c("M","D"), > + > cohorts=c(2010,2009,2008,2007,2006,2004,2003,2002,2001,2000,1996)) > > age_cohort > cohorts > site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996 > M 142 95 46 33 14 59 18 12 7 1 0 > D 258 144 54 70 20 11 6 8 2 3 1 > > (Xsqagec <- chisq.test(age_cohort)) # Prints test summary > > Pearson's Chi-squared test > > data: age_cohort > X-squared = 82.6016, df = 10, p-value = 1.549e-13 > > Mensajes de aviso perdidos > In chisq.test(age_cohort) : Chi-squared approximation may be incorrect > > Xsqagec$observed # observed counts > cohorts > site 2010 2009 2008 2007 2006 2004 2003 2002 2001 2000 1996 > M 142 95 46 33 14 59 18 12 7 1 0 > D 258 144 54 70 20 11 6 8 2 3 1 > > Xsqagec$expected # expected counts under the null > cohorts > site 2010 2009 2008 2007 2006 2004 2003 > M 170.1195 101.6464 42.52988 43.80578 14.46016 29.77092 10.20717 > D 229.8805 137.3536 57.47012 59.19422 19.53984 40.22908 13.79283 > cohorts > site 2002 2001 2000 1996 > M 8.505976 3.827689 1.701195 0.4252988 > D 11.494024 5.172311 2.298805 0.5747012 > > Xsqagec$residuals # Pearson residuals > cohorts > site 2010 2009 2008 2007 2006 > M -2.1559111 -0.6592367 0.5321050 -1.6326395 -0.1210101 > D 1.8546283 0.5671101 -0.4577448 1.4044825 0.1040993 > cohorts > site 2004 2003 2002 2001 2000 > M 5.3569686 2.4391720 1.1980192 1.6214643 -0.5376032 > D -4.6083465 -2.0983042 -1.0305993 -1.3948690 0.4624746 > cohorts > site 1996 > M -0.6521494 > D 0.5610132 > > Xsqagec$stdres # standardized residuals > cohorts > site 2010 2009 2008 2007 2006 > M -3.6665549 -0.9962228 0.7397057 -2.2733888 -0.1623984 > D 3.6665549 0.9962228 -0.7397057 2.2733888 0.1623984 > cohorts > site 2004 2003 2002 2001 2000 > M 7.3264142 3.2566808 1.5962909 2.1485311 -0.7105713 > D -7.3264142 -3.2566808 -1.5962909 -2.1485311 0.7105713 > cohorts > site 1996 > M -0.8606814 > D 0.8606814 > / > > > 2) > /> model1<-glm(site~gender*colony*cohort,binomial) > > summary(model1) > > Call: > glm(formula = site ~ gender * colony * cohort, family = binomial) > > Deviance Residuals: > Min 1Q Median 3Q Max > -1.84648 -0.96954 -0.00036 1.11269 2.03933 > > Coefficients: (12 not defined because of singularities) > Estimate Std. Error z value > (Intercept) -1.657e+01 2.400e+03 -0.007 > gender1 -2.231e-01 9.220e-01 -0.242 > colonyother 9.531e-02 8.006e-01 0.119 > cohort2002 1.717e-08 3.393e+03 0.000 > cohort2003 1.766e+01 2.400e+03 0.007 > cohort2004 1.807e+01 2.400e+03 0.008 > cohort2006 1.697e+01 2.400e+03 0.007 > cohort2007 1.726e+01 2.400e+03 0.007 > cohort2008 1.606e+01 2.400e+03 0.007 > cohort2009 1.657e+01 2.400e+03 0.007 > cohort2010 1.587e+01 2.400e+03 0.007 > gender1:colonyother 9.163e-01 1.087e+00 0.843 > gender1:cohort2002 1.719e+01 2.400e+03 0.007 > gender1:cohort2003 -1.823e-01 1.713e+00 -0.106 > gender1:cohort2004 2.231e-01 1.329e+00 0.168 > gender1:cohort2006 -5.878e-01 1.586e+00 -0.371 > gender1:cohort2007 -6.454e-02 1.784e+00 -0.036 > gender1:cohort2008 8.881e-01 1.156e+00 0.768 > gender1:cohort2009 -2.817e-02 1.199e+00 -0.023 > gender1:cohort2010 NA NA NA > colonyother:cohort2002 NA NA NA > colonyother:cohort2003 NA NA NA > colonyother:cohort2004 1.497e+01 1.697e+03 0.009 > colonyother:cohort2006 -1.707e+01 2.400e+03 -0.007 > colonyother:cohort2007 -7.885e-01 1.772e+00 -0.445 > colonyother:cohort2008 NA NA NA > colonyother:cohort2009 NA NA NA > colonyother:cohort2010 NA NA NA > gender1:colonyother:cohort2002 NA NA NA > gender1:colonyother:cohort2003 NA NA NA > gender1:colonyother:cohort2004 -9.163e-01 2.400e+03 0.000 > gender1:colonyother:cohort2006 NA NA NA > gender1:colonyother:cohort2007 -2.575e+00 2.379e+00 -1.082 > gender1:colonyother:cohort2008 NA NA NA > gender1:colonyother:cohort2009 NA NA NA > gender1:colonyother:cohort2010 NA NA NA > Pr(>|z|) > (Intercept) 0.994 > gender1 0.809 > colonyother 0.905 > cohort2002 1.000 > cohort2003 0.994 > cohort2004 0.994 > cohort2006 0.994 > cohort2007 0.994 > cohort2008 0.995 > cohort2009 0.994 > cohort2010 0.995 > gender1:colonyother 0.399 > gender1:cohort2002 0.994 > gender1:cohort2003 0.915 > gender1:cohort2004 0.867 > gender1:cohort2006 0.711 > gender1:cohort2007 0.971 > gender1:cohort2008 0.442 > gender1:cohort2009 0.981 > gender1:cohort2010 NA > colonyother:cohort2002 NA > colonyother:cohort2003 NA > colonyother:cohort2004 0.993 > colonyother:cohort2006 0.994 > colonyother:cohort2007 0.656 > colonyother:cohort2008 NA > colonyother:cohort2009 NA > colonyother:cohort2010 NA > gender1:colonyother:cohort2002 NA > gender1:colonyother:cohort2003 NA > gender1:colonyother:cohort2004 1.000 > gender1:colonyother:cohort2006 NA > gender1:colonyother:cohort2007 0.279 > gender1:colonyother:cohort2008 NA > gender1:colonyother:cohort2009 NA > gender1:colonyother:cohort2010 NA > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 311.91 on 224 degrees of freedom > Residual deviance: 271.61 on 201 degrees of freedom > (774 observations deleted due to missingness) > AIC: 319.61 > > Number of Fisher Scoring iterations: 15 > / > > I thought that perhaps keeping the gender as explanatory variable was > reducing a lot the sample size and it was the matter (I removed it): > > > > /> model1<-glm(site~colony*cohort,binomial) > > summary(model1) > > Call: > glm(formula = site ~ colony * cohort, family = binomial) > > Deviance Residuals: > Min 1Q Median 3Q Max > -1.8683 -0.9712 -0.8757 1.3468 1.7470 > > Coefficients: (6 not defined because of singularities) > Estimate Std. Error z value Pr(>|z|) > (Intercept) -15.5661 1455.3976 -0.011 0.991 > colonyother 0.2544 0.2167 1.174 0.240 > cohort2000 14.4675 1455.3981 0.010 0.992 > cohort2001 16.8188 1455.3978 0.012 0.991 > cohort2002 15.9715 1455.3977 0.011 0.991 > cohort2003 16.6647 1455.3977 0.011 0.991 > cohort2004 17.1194 1455.3976 0.012 0.991 > cohort2006 15.2607 1455.3976 0.010 0.992 > cohort2007 14.9470 1455.3976 0.010 0.992 > cohort2008 15.3837 1455.3976 0.011 0.992 > cohort2009 15.1762 1455.3976 0.010 0.992 > cohort2010 14.8053 1455.3976 0.010 0.992 > colonyother:cohort2000 NA NA NA NA > colonyother:cohort2001 NA NA NA NA > colonyother:cohort2002 NA NA NA NA > colonyother:cohort2003 NA NA NA NA > colonyother:cohort2004 13.7583 550.0887 0.025 0.980 > colonyother:cohort2006 -15.5151 1455.3976 -0.011 0.991 > colonyother:cohort2007 -0.9163 0.5979 -1.533 0.125 > colonyother:cohort2008 NA NA NA NA > colonyother:cohort2009 -0.6912 0.5214 -1.326 0.185 > colonyother:cohort2010 NA NA NA NA > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 1361.4 on 998 degrees of freedom > Residual deviance: 1267.9 on 983 degrees of freedom > AIC: 1299.9 > > Number of Fisher Scoring iterations: 14 > / > > Any comment/suggestion on this? > Thanks for any help > > -- > View this message in context: > http://r.789695.n4.nabble.com/Binomial-GLM-chisq-test-or-tp4608941.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.