[R-sig-eco] adonis and negative F-values
Dear all,

I used adonis to test the pairwise site dissimilarity indices proposed by Baselga (2010, 2012) in the package betapart. I am concerned about my results because I get some negative F-values. I read in another post that this may happen because of negative eigenvalues. However, I was wondering whether this invalidates the results or whether they are still interpretable in some way. Moreover, if the results are still valid, do you think that a results table containing negative F-values would be accepted for publication, or would it be grounds for rejection?

I may use distance-based RDA with the Cailliez correction instead. Would that be a good alternative to adonis for testing the effect of a three-level factor on the dissimilarity measures?

Best wishes,
Valerie Coudrain

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
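The negative eigenvalues mentioned in the question can be inspected directly in base R. What follows is a minimal sketch, not the poster's analysis: the 4 x 4 dissimilarity matrix is made up and deliberately violates the triangle inequality, and `cmdscale(..., add = TRUE)` is used because its documentation describes the additive constant as the one from Cailliez (1983). Whether the same correction suits the betapart indices is left open here.

```r
# Hypothetical dissimilarities among four sites; d(A,C) = 3 exceeds
# d(A,B) + d(B,C) = 2, so no Euclidean configuration can reproduce them.
m <- matrix(c(0, 1, 3, 2,
              1, 0, 1, 2,
              3, 1, 0, 2,
              2, 2, 2, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))
d <- as.dist(m)

# Principal coordinates analysis; eig = TRUE returns all eigenvalues.
# k = 1 because only the eigenvalues are of interest here.
raw <- cmdscale(d, k = 1, eig = TRUE)
raw$eig      # at least one eigenvalue is negative

# Same analysis with an additive constant added to the off-diagonal
# dissimilarities so that they become Euclidean.
fixed <- cmdscale(d, k = 1, eig = TRUE, add = TRUE)
fixed$ac     # the constant that was added
fixed$eig    # no eigenvalue below zero, up to rounding error
```

Comparing `raw$eig` with `fixed$eig` shows how much of the structure sits on negative axes before the correction.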
Re: [R-sig-eco] proportion data with many zeros
Thank you Liz,

I don't know Tweedie; I'll have a look at it, but I do indeed have some high values. I know about the problems linked to the arcsine transformation, and I won't consider it anyway. I'd like to use either the raw pollen grain counts or a logistic quasibinomial model.

Best,
Valérie

Message of 02/02/13 at 20:47
From: Liz Pryde
To: v_coudr...@voila.fr
Cc: Cade Brian, r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] proportion data with many zeros

Have you plotted the raw data to have a look at the distribution? You could try another exponential family distribution like Tweedie, which has a mass at zero but is otherwise similar to Poisson/gamma, so you're directly modeling the zeroes. It won't work if you have a lot of high values though. Proportions are tricky. Have a read of the Warton paper (2012/11?), "The arcsine is asinine".

Liz

On 02/02/2013, at 6:34 PM, v_coudr...@voila.fr wrote:

Thank you very much for this suggestion. In fact I reconsidered my question and I am not sure that a zero-inflated model is what I need. If I understood it properly, a zero-inflated model is best suited when we don't know whether zero values are true or false absences (right?). In my case all zero values are assumed to be real absences and are therefore informative. However, fitting quasipoisson on raw counts or quasibinomial on proportions gives me awful distributions of residuals and meaningless results.

Valérie

Message of 01/02/13 at 17:22
From: Cade, Brian
To: v_coudr...@voila.fr
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] proportion data with many zeros

For a fully parametric approach, you might want to use a zero-inflated beta distribution (e.g., as available in the gamlss package), which is designed for zero-inflated proportions. Or, for a semi-parametric approach, you could estimate a sequence of quantile regressions (e.g., in package quantreg), where some interval (hopefully not too large) of the quantiles will be uninformative because they are massed at the zero values.

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: brian_c...@usgs.gov
tel: 970 226-9326

On Fri, Feb 1, 2013 at 1:30 AM, wrote:

Dear all,

I am trying to test how the proportion of pollen of different plants found in the brood cells of a wild bee changes over time. I conducted 4 sampling sessions (thus time is a factor with 4 levels) and collected several pollen samples for each time point (300 pollen grains counted for each sample). I thought about applying a quasi-binomial glm:

    y = cbind(total pollen - pollen of plant X, pollen of plant X)
    glm(y ~ time, family = quasibinomial)

The problem is that I have a lot of zero values, because the pollen of some plants occurred only rarely or was very clumped in time. I thought about applying a zero-inflated model, but I have never used one and I am not sure whether it is suitable for proportion data. Additionally, I wondered whether I have to consider the fact that I don't have the same number of pollen samples for each date, which makes my design unbalanced.

Thank you in advance for advice.

Best wishes,
Valérie
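The quasi-binomial model described in the original question can be sketched on simulated data. Everything below is an assumption made for illustration: the numbers of samples per session, the mean proportions, and the amount of clumping are invented, and `plant_x` is a hypothetical variable name. Note that `glm()` treats the first column of the `cbind()` response as successes, so the sketch puts the counts of plant X first.

```r
set.seed(1)

# Hypothetical unbalanced design: different numbers of pollen samples per session.
n_per_time <- c(T1 = 6, T2 = 9, T3 = 12, T4 = 8)
time <- factor(rep(names(n_per_time), times = n_per_time))

# Assumed mean proportion of plant X per session, with extra sample-to-sample
# variation on the logit scale to mimic clumping (overdispersion).
p_mean <- c(T1 = 0.01, T2 = 0.05, T3 = 0.30, T4 = 0.10)
p_i <- plogis(qlogis(p_mean[as.character(time)]) + rnorm(length(time), sd = 1))

total <- 300                                  # grains counted per sample
plant_x <- rbinom(length(time), size = total, prob = p_i)

# First column = successes (plant X), second column = failures (all other pollen).
fit <- glm(cbind(plant_x, total - plant_x) ~ time, family = quasibinomial)

summary(fit)$dispersion    # values well above 1 indicate overdispersion
anova(fit, test = "F")     # F test, since the dispersion is estimated
```

With a single factor such as `time`, unequal numbers of samples per session need no special handling in `glm()`; the estimates for sessions with fewer samples simply come with wider standard errors.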
[R-sig-eco] Very large dispersion parameter in a negative binomial model
Dear list members,

I am fitting a negative binomial model but I get a very large dispersion parameter. Why is that?

    quine.nb2 <- glm.nb(Pt ~ Agua + Dist10, data = Abundancia)
    summary(quine.nb2)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -2.20911  -0.67157   0.04411   0.35695   1.65524

    Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
    (Intercept)   -0.88354    0.43982  -2.009   0.0445 *
    Agua[T.negra]  0.89319    0.43727   2.043   0.0411 *
    Dist10         0.39416    0.04387   8.984   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for Negative Binomial(67552.88) family taken to be 1)

        Null deviance: 128.567  on 15  degrees of freedom
    Residual deviance:  17.005  on 13  degrees of freedom
    AIC: 67.058

    Number of Fisher Scoring iterations: 1

                  Theta:  67553
              Std. Err.:  1428117
    Warning while fitting theta: iteration limit reached

     2 x log-likelihood:  -59.058

Best,
Manuel

--
Manuel Spínola, Ph.D.
Instituto Internacional en Conservación y Manejo de Vida Silvestre
Universidad Nacional
Apartado 1350-3000 Heredia
COSTA RICA
mspin...@una.ac.cr
mspinol...@gmail.com
Teléfono: (506) 2277-3598
Fax: (506) 2237-7036
Personal website: Lobito de río https://sites.google.com/site/lobitoderio/
Institutional website: ICOMVIS http://www.icomvis.una.ac.cr/
Re: [R-sig-eco] Very large dispersion parameter in a negative binomial model
Hi Manuel,

This means that your data are close to Poisson: as theta grows, the negative binomial variance (mu + mu^2/theta) approaches the Poisson variance (mu). Here is an example where I simulate Poisson data and try to fit the NB distribution. I get behavior from glm.nb that is similar to your results (a large dispersion parameter and a warning that the iteration limit was reached).

    library(MASS)                                   # provides glm.nb()
    x <- rep(1:5, 10)
    y <- rpois(n = length(x), lambda = exp(x * 0.5 + 1))
    dat <- data.frame(x = x, y = y)
    m1 <- glm.nb(y ~ x, data = dat)                 # NB fit: theta becomes very large
    m2 <- glm(y ~ x, data = dat, family = poisson)  # Poisson fit

Try using glm instead of glm.nb.

Cheers,
Mollie

Mollie Brooks
Postdoctoral Researcher, Ponciano Lab
Biology Department, University of Florida
http://people.biology.ufl.edu/mbrooks
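One way to see the difference between Poisson-like and genuinely overdispersed counts is to compare a simple dispersion statistic on both. The sketch below is an illustration under assumptions, not part of the reply above: the sample size, the value theta = 2, and the helper name `pearson_dispersion` are all invented for the example.

```r
library(MASS)   # glm.nb(); MASS is a recommended package distributed with R
set.seed(123)

x <- rep(1:5, 100)
mu <- exp(0.5 * x + 1)

# Pearson chi-square divided by the residual degrees of freedom:
# close to 1 when a Poisson model is adequate.
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

# Case 1: counts that really are Poisson.
y_pois <- rpois(length(x), lambda = mu)
disp_pois <- pearson_dispersion(glm(y_pois ~ x, family = poisson))

# Case 2: overdispersed counts, variance = mu + mu^2 / theta with theta = 2.
y_nb <- rnbinom(length(x), mu = mu, size = 2)
disp_nb <- pearson_dispersion(glm(y_nb ~ x, family = poisson))
fit_nb <- glm.nb(y_nb ~ x)

c(poisson_data = disp_pois, nb_data = disp_nb)
fit_nb$theta    # finite, in the neighbourhood of the simulated theta
```

When the statistic from the Poisson fit is near 1, as in the first case, there is little extra variation for theta to describe, which is the situation in which `glm.nb` tends to report an enormous theta.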
Re: [R-sig-eco] proportion data with many zeros
Hi Valerie,

The best advice I was ever given about choosing a distribution was to pick the one with the best fit, i.e. no pattern in the residuals. There are two things to think about when fitting a GLM. The first is the type of data you've collected (binomial, counts, etc.), so that you can get an idea of which link will linearise your model correctly and return realistic results (non-negative, etc.). The second is the mean-variance relationship. This is what will generally show up in the residuals. Gaussian assumes no relationship (constant variance), but most proportion/abundance measures will have a variance that varies in some way with the mean. Try plotting your means against your variances and have a look at the shape of the distribution of your raw data. Then experiment with some suitable exponential family distributions and see which residuals have no pattern.

I think you're correct in not modeling the zeroes as a hurdle, as they are not 'unknowns'.

Proportion data is very tricky; I've been grappling with percent cover data for a while. Tweedie worked well for me for measures where cover values were mid to low, but not well when they were close to 100%.

If I were you, I'd consider changing the way you use the data to make it simpler. Perhaps just analyse each type of pollen individually over the time periods. I assume each time period is the same for the samples, and I think n = 300 for each of the samples taken? So why not just try a quasi-Poisson (or negative binomial) and a Tweedie GLM for each type of pollen separately vs. time and see which has better residuals. It's much easier to treat these as counts, and there is no need to use proportions if n is the same for all. Then you can get a significance value for the abundance of each pollen type with each bee at each time period. It is really the same as finding out the relative proportions.

The tweedie package in R works pretty much the same as any GLM. You just need a little bit of code (in the help files) to estimate the Tweedie power (index) parameter for each set of values. It should lie between 1 and 2. If not, your data are probably not suited.

Let me know if you need any more help.

Liz
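The suggestion in the reply above, comparing means with variances and then looking at the residuals of a count model, can be sketched as follows. The data are simulated and purely hypothetical: the session means, the negative binomial clumping (`size = 1`), and the variable name `grains` are assumptions chosen only to produce many zeros and a few high counts.

```r
set.seed(7)

# Hypothetical grain counts of one pollen type (out of 300 per sample),
# ten samples in each of four sampling sessions.
time <- factor(rep(paste0("T", 1:4), each = 10))
mu <- c(T1 = 2, T2 = 10, T3 = 60, T4 = 5)[as.character(time)]
grains <- pmin(rnbinom(length(time), mu = mu, size = 1), 300)

# Mean and variance per session; under a Poisson model they would be similar.
mv <- data.frame(mean = sapply(split(grains, time), mean),
                 var  = sapply(split(grains, time), var))
mv
# plot(mv$mean, mv$var, log = "xy") shows the same relationship graphically.

# Very rough slope of log(variance) on log(mean): about 1 points to
# Poisson-like scaling, about 2 to gamma-like scaling.
coef(lm(log(var) ~ log(mean), data = mv))[2]

# Quasi-Poisson fit on the raw counts, with residuals kept for inspection.
fit <- glm(grains ~ time, family = quasipoisson)
res <- data.frame(fitted = fitted(fit),
                  pearson = residuals(fit, type = "pearson"))
summary(fit)$dispersion
```

With only four sessions the slope is a crude guide at best, but it gives a first idea of whether a variance power between 1 and 2 is plausible before any Tweedie fitting is attempted.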
Re: [R-sig-eco] Very large dispersion parameter in a negative binomial model
Thank you very much Mollie.

Best,
Manuel
[R-sig-eco] Adonis and Random Effects
Hello List,

Is adonis capable of modeling random effects? I'm analyzing the impact of a treatment on the microbial community in a split-plot design (2 treatments per plot, 4 plots per grassland, 3 grasslands total). I would like to quantify how much of the variance is due to the Treatment versus the Grassland. It seems like Grassland should be a random effect, since there are thousands of grasslands and I'm only looking at 3.

I have tried to use the notation that works with lme4, and it's not working for me (see below for the formula and error messages). If adonis can't do random effects, are there any alternatives? Or, considering my goal, are there any other programs I should look into? Any suggestions would be highly appreciated!

Thanks for your help,
Erin

Here's what I think I should run:

    adonis(formula = community_distance_matrix ~ Treatment + (1|Grassland) + (1|GrasslandPlot), strata = GrasslandPlot)

Here are my factors:

    'data.frame': 24 obs. of 4 variables:
     $ Treatment    : Factor w/ 2 levels "T1","T2": 1 1 1 1 1 2 2 2 1 1 ...
     $ Grassland    : Factor w/ 3 levels "G1","G2","G3": 3 3 1 1 1 2 2 1 2 2 ...
     $ Plot         : Factor w/ 4 levels "P1","P2","P3","P4": 1 2 2 3 4 1 3 2 1 2 ...
     $ GrasslandPlot: Factor w/ 12 levels "G1:P1","G1:P2","G1:P3",..: 9 10 2 3 4 5 7 2 5 6 ...

And here's the error message:

    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
      contrasts can be applied only to factors with 2 or more levels
    In addition: Warning messages:
    1: In Ops.factor(1, Grassland) : '|' not meaningful for factors
    2: In Ops.factor(1, GrasslandPlot) : '|' not meaningful for factors
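As the error in the post suggests, the lme4-style `(1|Grassland)` terms are not parsed as random effects here. The `strata` argument does something different: it restricts the permutations to within groups. The base R sketch below illustrates that idea only. It is not vegan's implementation and it does not fit random effects; the simulated community table, the Euclidean distance, and the helper names `pseudo_F` and `permute_within` are all assumptions made for the example.

```r
set.seed(42)

# Hypothetical design mirroring the post: 3 grasslands x 4 plots x 2 treatments.
design <- expand.grid(Treatment = c("T1", "T2"),
                      Plot = paste0("P", 1:4),
                      Grassland = paste0("G", 1:3))
design$GrasslandPlot <- interaction(design$Grassland, design$Plot,
                                    sep = ":", drop = TRUE)

# Simulated community table (10 taxa): plot-level noise plus a treatment shift.
n_taxa <- 10
plot_effect <- matrix(rnorm(nlevels(design$GrasslandPlot) * n_taxa), ncol = n_taxa)
comm <- plot_effect[as.integer(design$GrasslandPlot), ] +
  (design$Treatment == "T2") * 1 +
  matrix(rnorm(nrow(design) * n_taxa, sd = 0.5), ncol = n_taxa)
d <- dist(comm)   # Euclidean distance, chosen only to keep the sketch in base R

# Pseudo-F for a single factor, computed from squared distances.
pseudo_F <- function(d, group) {
  d2 <- as.matrix(d)^2
  n <- nrow(d2)
  group <- factor(group)
  a <- nlevels(group)
  ss_total <- sum(d2[upper.tri(d2)]) / n
  ss_within <- sum(sapply(levels(group), function(g) {
    idx <- which(group == g)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)
  }))
  ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))
}

# Shuffle labels only among observations that share a stratum (here: a plot).
permute_within <- function(x, strata) {
  for (idx in split(seq_along(x), strata)) {
    x[idx] <- x[idx][sample.int(length(idx))]
  }
  x
}

f_obs <- pseudo_F(d, design$Treatment)
f_perm <- replicate(999, pseudo_F(d, permute_within(design$Treatment,
                                                    design$GrasslandPlot)))
p_value <- (sum(f_perm >= f_obs) + 1) / (length(f_perm) + 1)
c(F = f_obs, p = p_value)
```

Because each plot holds one sample of each treatment, shuffling within plots keeps plot-to-plot differences out of the null distribution for the Treatment test. It does not, however, estimate how much variance is attributable to Grassland.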