[R] glm and percentage data with many zero values

Christian Kamenik Thu, 20 Jan 2005 08:30:14 -0800

Dear all,

I am interested in correctly testing effects of continuous environmental variables and ordered factors on bacterial abundance. Bacterial abundance is derived from counts and expressed as a percentage. My problem is that the abundance data contain many zero values:

Bacteria <- c(2.23,0,0.03,0.71,2.34,0,0.2,0.2,0.02,2.07,0.85,0.12,0,0.59,0.02,2.3,0.29,0.39,1.32,0.07,0.52,1.2,0,0.85,1.09,0,0.5,1.4,0.08,0.11,0.05,0.17,0.31,0,0.12,0,0.99,1.11,1.78,0,0,0,2.33,0.07,0.66,1.03,0.15,0.15,0.59,0,0.03,0.16,2.86,0.2,1.66,0.12,0.09,0.01,0,0.82,0.31,0.2,0.48,0.15)

First I tried transforming the data (e.g., logit), but because of the zeros I was not satisfied. Next I converted the percentages into integer values by round(Bacteria*10) or ceiling(Bacteria*10) and fitted a glm with a Poisson error structure; however, I am not very happy with this approach because it changes the original percentage data substantially (e.g., 0.03 becomes either 0 or 1). The same is true for converting the percentages into factors and fitting a multinomial or proportional-odds model (in any case, I do not know whether this would be a meaningful approach). I searched the web, and the best answer I could find was http://www.biostat.wustl.edu/archives/html/s-news/1998-12/msg00010.html in which several people suggested quasi-likelihood. Would it be reasonable to use a glm with quasipoisson? If yes, how can I find the appropriate variance function? Any other suggestions?
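
To make the options above concrete, here is a minimal sketch in R. It assumes intercept-only models (`~ 1`) purely as a placeholder, because the environmental variables and ordered factors are not included in this post; the object names `nonzero`, `prop`, `fit_qp` and `fit_qb` are illustrative. It first shows what the integer conversions do to small nonzero percentages, then fits two candidate quasi-likelihood models directly to the unrounded data: quasipoisson (variance proportional to mu) and quasibinomial on the proportions (variance proportional to mu(1 - mu)). It is not meant as a recommendation of either variance function.

```r
# Abundance data copied from the post (percentages derived from counts).
Bacteria <- c(2.23,0,0.03,0.71,2.34,0,0.2,0.2,0.02,2.07,0.85,0.12,0,0.59,
              0.02,2.3,0.29,0.39,1.32,0.07,0.52,1.2,0,0.85,1.09,0,0.5,1.4,
              0.08,0.11,0.05,0.17,0.31,0,0.12,0,0.99,1.11,1.78,0,0,0,2.33,
              0.07,0.66,1.03,0.15,0.15,0.59,0,0.03,0.16,2.86,0.2,1.66,0.12,
              0.09,0.01,0,0.82,0.31,0.2,0.48,0.15)

# Share of exact zeros in the response.
mean(Bacteria == 0)

# Effect of the integer conversions on the nonzero percentages.
nonzero <- Bacteria > 0
sum(round(Bacteria * 10)[nonzero] == 0)    # nonzero values collapsed to 0 by round()
sum(ceiling(Bacteria * 10)[nonzero] == 1)  # nonzero values pushed up to 1 by ceiling()

# Quasi-Poisson fit on the unrounded percentages: Var(y) = phi * mu.
# "~ 1" is a placeholder; replace it with the real predictors.
fit_qp <- glm(Bacteria ~ 1, family = quasipoisson(link = "log"))

# Quasi-binomial fit on proportions in [0, 1]: Var(y) = phi * mu * (1 - mu).
prop <- Bacteria / 100
fit_qb <- glm(prop ~ 1, family = quasibinomial(link = "logit"))

# Estimated dispersion parameters (phi) for the two variance functions.
summary(fit_qp)$dispersion
summary(fit_qb)$dispersion
```

Once the real predictors are in the formula, plotting Pearson residuals against fitted values, e.g. `plot(fitted(fit_qp), residuals(fit_qp, type = "pearson"))`, is one way to judge whether the assumed variance function is adequate: a roughly constant spread suggests it is, while a clear trend suggests trying another.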

Many thanks in advance, Christian


================================


Christian Kamenik
Institute of Plant Sciences
University of Bern
Altenbergrain 21
3013 Bern
Switzerland

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
