[R] clusters in zero-inflated negative binomial models

2012-05-16 Thread Lies Durnez
Dear all,

I want to build a model in R based on animal collection data that look like
the following:

Nr  Village  District  Site  Survey  Species  Count
1   AX       A         F     Dry     B        0
2   AY       A         V     Wet     A        5
3   BX       B         F     Wet     B        1
4   BY       B         V     Dry     B        0

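A minimal sketch of the table above as an R data frame, typing in only the four example rows shown (the real data set is larger). It uses the name `data` to match the `subset(data, ...)` call in the next paragraph, and stores the grouping columns as factors so the models treat them as categorical.

```r
# The four example rows from the table, as a data frame
data <- data.frame(
  Nr       = 1:4,
  Village  = c("AX", "AY", "BX", "BY"),
  District = c("A", "A", "B", "B"),
  Site     = c("F", "V", "F", "V"),
  Survey   = c("Dry", "Wet", "Wet", "Dry"),
  Species  = c("B", "A", "B", "B"),
  Count    = c(0L, 5L, 1L, 0L),
  stringsAsFactors = TRUE   # grouping variables become factors
)
str(data)

# Models are fitted per species, so keep one species at a time
B <- subset(data, Species == "B")
nrow(B)   # 3
```

Note that the village codes (AX, AY, BX, BY) appear to combine the district letter with a village letter, which would make each village label unique across districts.
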
Each data point shows one collection unit in a certain Village, District, Site, 
and Survey for a certain Species. 'Count' is the number of animals collected in 
that collection unit. It is possible that zero animals are collected in that 
unit because of very low densities, but also because of climatic conditions 
(wind, rain, etc.), so we would expect an excess of zeros. I have tested that
the data are overdispersed (variance much bigger than mean), so a zero-inflated
negative binomial model seems the most suitable model in this case. To be sure,
I will compare the zero-inflated model to the standard negative binomial model
using the Vuong test. The models will be made for each species separately. For
these models I can use glm.nb() from the package MASS and zeroinfl() from the
package pscl, looking something like this (here for species B):

library(MASS)  # glm.nb()
library(pscl)  # zeroinfl(), vuong()
B <- subset(data, Species == "B")  # one species at a time
NB <- glm.nb(Count ~ District + Site + Survey, data = B)
ZINB <- zeroinfl(Count ~ District + Site + Survey, dist = "negbin", data = B)
vuong(NB, ZINB)  # Vuong test: NB vs. zero-inflated NB
I have tried this and it works very elegantly.
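To make the reasoning about overdispersion and excess zeros concrete, here is a small base-R sketch. The helper `dzinb()` is hypothetical (written here for illustration; pscl does not need it), and the parameter values and simulated counts are assumptions, not the poster's data. It writes out the ZINB probability mass (a structural zero with probability `p_zero`, otherwise a negative binomial count) and then runs the two checks described above on simulated counts.

```r
# ZINB probability mass: a structural zero with probability p_zero
# (e.g. wind or rain prevented any catch), otherwise an NB count with
# mean mu and dispersion ('size') parameter size.
dzinb <- function(y, mu, size, p_zero) {
  nb <- dnbinom(y, size = size, mu = mu)
  ifelse(y == 0, p_zero + (1 - p_zero) * nb, (1 - p_zero) * nb)
}

# Illustrative parameter values (assumptions, not estimates)
mu <- 4; size <- 1.5; p_zero <- 0.3

# Simulate hypothetical counts for 1000 collection units of one species
set.seed(42)
n <- 1000
structural <- rbinom(n, size = 1, prob = p_zero) == 1
count <- ifelse(structural, 0, rnbinom(n, size = size, mu = mu))

# Overdispersion: under a Poisson model variance/mean is about 1
disp_ratio <- var(count) / mean(count)

# Zeros: observed share vs. the ZINB value and a Poisson with the same mean
obs_zero  <- mean(count == 0)
zinb_zero <- dzinb(0, mu, size, p_zero)   # about 0.40
pois_zero <- dpois(0, lambda = mean(count))

round(c(disp_ratio = disp_ratio, obs_zero = obs_zero,
        zinb_zero = zinb_zero, pois_zero = pois_zero), 3)
```

A plain negative binomial also produces many more zeros than a Poisson with the same mean, so these checks alone do not separate NB from ZINB. That is the job of the Vuong comparison of the two fitted models.
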

However, the animal collections were only done in 4 districts, and in each 
district 3 villages were chosen (a total of 12 villages). This should be 
included in the design. The package survey allows this for the standard 
negative binomial model, but it seems to me that it is not possible for the 
zero-inflated NB. So, my question is two-fold: 
1. Is a zero-inflated NB possible in the survey package? If yes, how? 
2. If no, how can I build a zero-inflated NB model that takes into account the 
clustering of the observations (animal counts) in villages and the clustering 
of the villages in districts. 

Thank you very much for the help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] clusters in zero-inflated negative binomial models

2012-05-16 Thread Ben Bolker
Lies Durnez ldurnez at itg.be writes:

> I want to build a model in R based on animal collection data, that look like
> the following
>
> Nr  Village  District  Site  Survey  Species  Count
> 1   AX       A         F     Dry     B        0
> 2   AY       A         V     Wet     A        5
> 3   BX       B         F     Wet     B        1
> 4   BY       B         V     Dry     B        0
>
> Each data point shows one collection unit in a certain Village,
> District, Site, and Survey for a certain Species. 'Count' is the
> number of animals collected in that collection unit. It is possible
> that zero animals are collected in that unit because of very low
> densities, but also because of climatic conditions (wind, rain,
> etc), so we would expect an excess in zeroes. I have tested that the
> data are overdispersed (variance much bigger than mean), so a
> zero-inflated negative binomial model seems the most suitable model
> in this case.
>
> [snip snip snip]
>
> However, the animal collections were only done in 4 districts, and
> in each district 3 villages were chosen (a total of 12
> villages). This should be included in the design. The package survey
> allows this for the standard negative binomial model, but it seems
> to me that it is not possible for the zero-inflated NB. So, my
> question is two-fold:
> 1. Is a zero-inflated NB possible in the survey package. If yes, how?
> 2. If no, how can I build a zero-inflated NB model that takes into
> account the clustering of the observations (animal counts) in
> villages and the clustering of the villages in districts.

  Treating villages and districts as random effects (clusters)
basically puts you in the domain of generalized linear mixed models.
You can use the glmmADMB package to fit zero-inflated, mixed negative
binomial models.  You can also use the MCMCglmm package to fit
lognormal-Poisson models, which are another model for overdispersed
count data (it depends how strongly you require that the actual model
be NB as opposed to just a reasonable model for overdispersed count
data).
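To show what a "zero-inflated, mixed negative binomial model" assumes about the data, here is a base-R simulation sketch from such a model. Everything in it is hypothetical: the 4 districts x 3 villages layout follows the question, but the 20 units per village, the parameter values, and the X/Y/Z village letters are illustrative assumptions. Each village gets its own random intercept on the log mean, so counts from the same village are correlated, and structural zeros are added on top.

```r
# Hypothetical design: 4 districts x 3 villages x 20 collection units,
# villages labelled like AX, AY, ... so labels are unique across districts
set.seed(2012)
sim <- expand.grid(Unit = 1:20, Village = c("X", "Y", "Z"),
                   District = c("A", "B", "C", "D"))
sim$Village <- factor(paste0(sim$District, sim$Village))   # 12 villages

# Assumed parameters on the log scale (illustration only)
beta_district <- c(A = 1.0, B = 1.5, C = 0.5, D = 1.2)   # district fixed effects
sigma_village <- 0.6                                     # SD of village intercepts
b_village <- setNames(rnorm(nlevels(sim$Village), 0, sigma_village),
                      levels(sim$Village))

# Count part: log(mu) = district effect + village random intercept
mu <- exp(beta_district[as.character(sim$District)] +
          b_village[as.character(sim$Village)])

# Zero-inflation part: structural zeros with probability p_zero
p_zero <- 0.25; size <- 1.5
structural <- rbinom(nrow(sim), 1, p_zero) == 1
sim$Count <- ifelse(structural, 0, rnbinom(nrow(sim), size = size, mu = mu))

# Units in the same village share b_village, so their counts are correlated
round(tapply(sim$Count, sim$Village, mean), 1)
```

Fitting this structure is what glmmADMB is meant for. With that package the call would look roughly like `glmmadmb(Count ~ District + (1 | Village), data = sim, family = "nbinom", zeroInflation = TRUE)`, but that call is untested here, so check the glmmADMB documentation for the version you install.
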

4 districts is not very many for estimating an among-district variance 
(which is basically what you are doing when you fit a clustered/
mixed model), so I might suggest using district as a fixed effect,
but then using district:village as the random effect (i.e. the interaction
between district and village, or village alone if villages are uniquely labeled).
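The point about labels can be checked directly in base R. This sketch assumes the hypothetical case where villages are numbered 1 to 3 *within* each district, so the same label recurs across districts; with that coding, village alone would merge different villages, while the district:village interaction recovers all 12.

```r
# Hypothetical coding: villages numbered 1-3 within each of 4 districts
design <- expand.grid(Village = c("1", "2", "3"),
                      District = c("A", "B", "C", "D"))

# Village alone: only 3 levels, so village "1" of A is pooled with "1" of B, ...
nlevels(design$Village)                                   # 3

# district:village interaction: one level per actual village
design$DistrictVillage <- interaction(design$District, design$Village, drop = TRUE)
nlevels(design$DistrictVillage)                           # 12
```

In lme4-style formula syntax this corresponds to `District + (1 | District:Village)`; if villages already have unique labels such as AX and BX, `(1 | Village)` gives the same grouping.
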

  http://glmm.wikidot.com/faq may be useful.

  I would suggest that you send follow-ups to the
r-sig-mixed-models at r-project.org mailing list.
