[R-sig-eco] adonis and negative F-values

2013-02-03 Thread v_coudrain
Dear all,
I used adonis to test the pairwise site dissimilarity indices proposed by
Baselga (2010, 2012) in the package betapart. I am concerned about my results
because I get some negative F-values. I read in another post that this may
happen because of the presence of negative eigenvalues. However, I was
wondering whether this invalidates the results, or whether they are still
interpretable in some way. Moreover, if the results are still valid, do you
think that a result table containing negative F-values would be accepted for
publication, or would it be grounds for rejection? I may use distance-based RDA
with the Cailliez correction instead. Would that be a good alternative to
adonis for testing the effect of a three-level factor on the dissimilarity
measures?

Best wishes,
Valerie Coudrain
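
A minimal sketch of where negative eigenvalues come from and what an additive-constant correction does, assuming a hypothetical three-site dissimilarity matrix (not the betapart indices from the question). It uses only base R: cmdscale() with eig = TRUE returns the eigenvalues of the principal coordinates analysis, and add = TRUE applies the additive constant described by Cailliez. Whether this is appropriate for a given adonis or db-RDA analysis is a separate question.

```r
# Hypothetical dissimilarities among three sites. Because 3 > 1 + 1, the
# triangle inequality is violated and the matrix cannot be Euclidean.
sites <- c("s1", "s2", "s3")
d <- as.dist(matrix(c(0, 1, 3,
                      1, 0, 1,
                      3, 1, 0),
                    nrow = 3, dimnames = list(sites, sites)))

# Principal coordinates analysis on the raw dissimilarities
raw <- cmdscale(d, k = 1, eig = TRUE)
eig_raw <- raw$eig

# Same analysis after adding a constant to the off-diagonal dissimilarities
corrected <- cmdscale(d, k = 1, eig = TRUE, add = TRUE)
eig_corrected <- corrected$eig
add_constant <- corrected$ac

print(eig_raw)        # contains at least one negative eigenvalue
print(eig_corrected)  # no meaningfully negative eigenvalues remain
print(add_constant)   # the constant that was added
```

Inspecting the eigenvalues of the real dissimilarity matrix in the same way shows how large the negative part is relative to the positive part before deciding on a correction.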
___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] proportion data with many zeros

2013-02-03 Thread v_coudrain
Thank you Liz,
I don't know Tweedie; I'll have a look at it, but I do indeed have some high
values. I know about the problems linked to the arcsine transformation, so I
won't consider it anyway. I'd like to use either the raw pollen grain counts or
a logistic quasibinomial model.
Best,
Valérie


 Message of 02/02/13 at 20:47
 From: Liz Pryde
 To: v_coudr...@voila.fr
 Cc: Cade Brian, r-sig-ecology@r-project.org
 Subject: Re: [R-sig-eco] proportion data with many zeros
 
 Have you plotted the raw data to have a look at the distribution?
 You could try another exponential family distribution like Tweedie, which has
 a mass at zero but is otherwise similar to Poisson/gamma, so you're directly
 modeling the zeroes. It won't work if you have a lot of high values, though.
 Proportions are tricky. Have a read of the Warton paper (2012/11?), "The
 arcsine is asinine".
 
 Liz
 
 
 
 On 02/02/2013, at 6:34 PM, v_coudr...@voila.fr wrote:
 
  Thank you very much for this suggestion. In fact, I reconsidered my question
  and I am not sure that a zero-inflated model is what I need. If I understood
  it properly, a zero-inflated model is best suited when we don't know whether
  zero values are true or false absences (right?). In my case all zero values
  are assumed to be real absences and are therefore informative. However,
  fitting quasipoisson on raw counts or quasibinomial on proportions gives me
  awful distributions of residuals and meaningless results.
  
  Valérie
  
  
  Message of 01/02/13 at 17:22
  From: Cade, Brian
  To: v_coudr...@voila.fr
  Cc: r-sig-ecology@r-project.org
  Subject: Re: [R-sig-eco] proportion data with many zeros
  
  For a fully parametric approach, you might want to use a zero-inflated
  beta distribution (e.g., as available in the gamlss package), which is
  designed for zero-inflated proportions. Or, for a semi-parametric approach,
  you could estimate a sequence of quantile regressions (e.g., in package
  quantreg), where some interval (hopefully not too large) of the quantiles
  will be uninformative because they are massed at the zero values.
  
  Brian
  
  Brian S. Cade, PhD
  
  U. S. Geological Survey
  Fort Collins Science Center
  2150 Centre Ave., Bldg. C
  Fort Collins, CO 80526-8818
  
  email: brian_c...@usgs.gov
  tel: 970 226-9326
  
  
  
  On Fri, Feb 1, 2013 at 1:30 AM, wrote:
  
  Dear all, I am trying to test how the proportion of pollen of different
  plants found in the brood cells of a wild bee changes over time. I
  conducted 4 sampling sessions (thus time is a factor with 4 levels) and
  collected several pollen samples for each time point (300 pollen grains
  counted for each sample). I thought about applying a quasi-binomial glm:

  y = cbind(total pollen - pollen of plant X, pollen of plant X)

  glm(y~time, family=quasibinomial)

  The problem is that I have a lot of zero values, because the pollen of some
  plants occurred only rarely or was very clumped in time. I thought about
  applying a zero-inflated model, but I have never used one and I am not sure
  whether it is suitable for proportion data. Additionally, I wondered whether
  I have to consider the fact that I don't have the same number of pollen
  samples for each date, which makes my design unbalanced.
  Thank you in advance for advice.
  
  Best wishes
  Valérie
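
A minimal runnable sketch of the quasi-binomial model described in the question, using simulated data. The data frame, the column name x_count, the sample sizes per date and the true proportions are all hypothetical. Note that glm() treats the first column of a cbind() response as the "successes", so the counts of plant X are placed first here to model the proportion of plant X directly.

```r
set.seed(1)  # assumed seed, for reproducibility only

n_grains <- 300  # pollen grains counted per sample, as in the question

# Hypothetical unbalanced design: 5, 7, 6 and 4 samples on the four dates
dat <- data.frame(time = factor(rep(paste0("t", 1:4), times = c(5, 7, 6, 4))))

# Hypothetical true proportions of plant X on each date
p_true <- c(t1 = 0.02, t2 = 0.10, t3 = 0.30, t4 = 0.05)
dat$x_count <- rbinom(nrow(dat), size = n_grains,
                      prob = p_true[as.character(dat$time)])

# Response is cbind(successes, failures); unequal numbers of samples per
# date are handled by the likelihood and need no special treatment here
fit <- glm(cbind(x_count, n_grains - x_count) ~ time,
           family = quasibinomial, data = dat)

dispersion <- summary(fit)$dispersion  # values far above 1 indicate overdispersion
print(dispersion)
print(anova(fit, test = "F"))          # F test is the usual choice for quasi families
```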


[R-sig-eco] Very large dispersion parameter in a negative binomial model

2013-02-03 Thread Manuel Spínola
Dear list members,

I am fitting a negative binomial model but I get a very large dispersion
parameter.  Why is that?

library(MASS)  # glm.nb comes from MASS
quine.nb2 <- glm.nb(Pt ~ Agua + Dist10, data = Abundancia)

summary(quine.nb2)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.20911  -0.67157   0.04411   0.35695   1.65524

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.88354    0.43982  -2.009   0.0445 *
Agua[T.negra]  0.89319    0.43727   2.043   0.0411 *
Dist10         0.39416    0.04387   8.984   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(67552.88) family taken to be 1)

    Null deviance: 128.567  on 15  degrees of freedom
Residual deviance:  17.005  on 13  degrees of freedom
AIC: 67.058

Number of Fisher Scoring iterations: 1


              Theta:  67553
          Std. Err.:  1428117
Warning while fitting theta: iteration limit reached

 2 x log-likelihood:  -59.058

Best,

Manuel

-- 
*Manuel Spínola, Ph.D.*
Instituto Internacional en Conservación y Manejo de Vida Silvestre
Universidad Nacional
Apartado 1350-3000
Heredia
COSTA RICA
mspin...@una.ac.cr
mspinol...@gmail.com
Teléfono: (506) 2277-3598
Fax: (506) 2237-7036
Personal website: Lobito de río https://sites.google.com/site/lobitoderio/
Institutional website: ICOMVIS http://www.icomvis.una.ac.cr/



Re: [R-sig-eco] Very large dispersion parameter in a negative binomial model

2013-02-03 Thread Mollie Brooks
Hi Manuel,
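A very large theta means that your data are close to Poisson: as theta grows,
the negative binomial variance (mu + mu^2/theta) approaches the Poisson
variance (mu).
Here is an example where I simulate Poisson data and try to fit the NB
distribution. I get behavior from glm.nb that is similar to your results (large
dispersion parameter and warning about iteration limit reached).

library(MASS)                                         # provides glm.nb
x <- rep(1:5, 10)
y <- rpois(n = length(x), lambda = exp(x * 0.5 + 1))  # Poisson counts, no overdispersion
dat <- data.frame(x = x, y = y)
m1 <- glm.nb(y ~ x, data = dat)                       # NB fit: theta tends to be very large
m2 <- glm(y ~ x, data = dat, family = poisson)        # Poisson fit to the same data

Try using glm with family = poisson instead of glm.nb.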
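
One way to check whether the Poisson fit is adequate before dropping the negative binomial is to look at the Pearson dispersion statistic and compare the two fits. The sketch below repeats the simulation above under an assumed seed; the object names are illustrative, and the thresholds for "close to 1" are a judgment call rather than a rule.

```r
library(MASS)  # provides glm.nb

set.seed(1)  # assumed seed, for reproducibility only
x <- rep(1:5, 10)
y <- rpois(n = length(x), lambda = exp(x * 0.5 + 1))
dat <- data.frame(x = x, y = y)

m_pois <- glm(y ~ x, data = dat, family = poisson)

# Pearson dispersion statistic: sum of squared Pearson residuals divided by
# the residual degrees of freedom; values near 1 suggest no overdispersion
dispersion <- sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
print(dispersion)

# glm.nb may warn about the iteration limit on data like these, which is the
# behaviour discussed in this thread, so the warnings are silenced here
m_nb <- suppressWarnings(glm.nb(y ~ x, data = dat))
print(m_nb$theta)          # estimated theta
print(AIC(m_pois, m_nb))   # the extra NB parameter should buy little or nothing
```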

cheers,
Mollie

Mollie Brooks
Postdoctoral Researcher, Ponciano Lab
Biology Department, University of Florida
http://people.biology.ufl.edu/mbrooks




Re: [R-sig-eco] proportion data with many zeros

2013-02-03 Thread Liz Pryde
Hi Valerie,
The best advice I was ever given with regard to distributions was to choose the
one with the best fit, i.e. no pattern in the residuals.
There are two things to think about when fitting a GLM. The first is the type
of data you've collected (binomial, counts, etc.), so that you can get an idea
of which link will linearise your model correctly and return realistic results
(non-negative, etc.).
The second is the mean-variance relationship. This is what will generally show
up in the residuals. Gaussian assumes no relationship (constant variance), but
most proportion/abundance measures will have a variance that varies in some way
with the mean. Try plotting your means against your variances and have a look
at the shape of the distribution of your raw data. Then experiment with some
suitable exponential family distributions and see which residuals have no
pattern.
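
The mean-variance check described above can be sketched as follows. The data are simulated and the column names (time, count) are hypothetical; with real data, replace the simulated data frame with the pollen counts for one plant.

```r
set.seed(2)  # assumed seed, for reproducibility only

# Hypothetical overdispersed counts for one pollen type at four dates
dat <- data.frame(
  time  = factor(rep(paste0("t", 1:4), each = 8)),
  count = rnbinom(32, mu = rep(c(2, 10, 40, 90), each = 8), size = 2)
)

# Mean and variance of the counts within each date
grp_mean <- tapply(dat$count, dat$time, mean)
grp_var  <- tapply(dat$count, dat$time, var)
mv <- data.frame(time = names(grp_mean),
                 mean = as.numeric(grp_mean),
                 var  = as.numeric(grp_var))
print(mv)

# Points along the dashed line are consistent with Poisson (variance = mean);
# points curving upward away from it suggest quasi-Poisson, NB or Tweedie
plot(mv$mean, mv$var, xlab = "group mean", ylab = "group variance")
abline(0, 1, lty = 2)
```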

I think you're correct in not modeling the zeroes as a hurdle, as they are not
'unknowns'.
Proportion data are very tricky; I've been grappling with percent cover data
for a while. Tweedie worked well for me for measures where cover values were
low to mid, but not well when they were close to 100%.
If I were you, I'd consider changing the way you use the data to make it
simpler. Perhaps just analyse each type of pollen individually over the time
periods. I assume each time period is the same for the samples, and I think
n=300 for each of the samples taken?

So why not just try a quasi-Poisson (or negative binomial) and a Tweedie GLM
for each type of pollen separately vs time and see which has better residuals?
It's much easier to treat these as counts, and there is no need to use
proportions if n is the same for all samples.
Then you can get a significance value for the abundance of each pollen type
with each bee at each time period. It is really the same as finding out the
relative proportions.
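
A minimal sketch of the per-pollen-type approach, using the quasi-Poisson family only (a Tweedie fit would need an extra package and is left out). The plant names, the number of samples per date and the expected counts are hypothetical.

```r
set.seed(3)  # assumed seed, for reproducibility only

time <- factor(rep(paste0("t", 1:4), each = 6))

# Hypothetical counts (out of 300 grains per sample) for two pollen types
counts <- data.frame(
  plantA = rpois(24, lambda = rep(c(5, 40, 120, 20), each = 6)),
  plantB = rpois(24, lambda = rep(c(1, 2, 30, 60), each = 6))
)

# One quasi-Poisson GLM per pollen type, each against time
fits <- lapply(counts, function(y) glm(y ~ time, family = quasipoisson))

# Estimated dispersion for each pollen type
dispersions <- sapply(fits, function(f) summary(f)$dispersion)
print(dispersions)

# Residuals against fitted values for one pollen type; look for the absence
# of any fan shape or trend
plot(fitted(fits$plantA), residuals(fits$plantA, type = "pearson"),
     xlab = "fitted count", ylab = "Pearson residual")
abline(h = 0, lty = 2)
```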

The tweedie package in R works pretty much the same as any GLM. You just need a
little bit of code (in the help files) to estimate an alpha (shape) parameter
for each set of values. It should lie between 1 and 2. If not, your data are
probably not suited.

Let me know if you need any more help.
Liz





Re: [R-sig-eco] Very large dispersion parameter in a negative binomial model

2013-02-03 Thread Manuel Spínola
Thank you very much Mollie.

Best,

Manuel




[R-sig-eco] Adonis and Random Effects

2013-02-03 Thread Erin Nuccio
Hello List,

Is adonis capable of modeling random effects?  I'm analyzing the impact of a 
treatment on the microbial community in a split-plot design (2 treatments per 
plot, 4 plots per grassland, 3 grasslands total). I would like to quantify how 
much of the variance is due to the Treatment versus the Grassland.  It seems 
like Grassland should be a random effect, since there are thousands of 
grasslands, and I'm only looking at 3.

I have tried to use the notation that works with lme4, and it's not working for 
me (see below for formula and error messages).  If adonis can't do random 
effects, are there any alternatives?  Or, considering my goal, are there any 
other programs I should look into?  Any suggestions would be highly appreciated!

Thanks for your help,
Erin



Here's what I think I should run:
adonis(formula = community_distance_matrix ~ Treatment + (1|Grassland) + (1|GrasslandPlot), strata = GrasslandPlot)

Here are my factors:
'data.frame':   24 obs. of  4 variables:
 $ Treatment    : Factor w/ 2 levels T1,T2: 1 1 1 1 1 2 2 2 1 1 ...
 $ Grassland    : Factor w/ 3 levels G1,G2,G3: 3 3 1 1 1 2 2 1 2 2 ...
 $ Plot         : Factor w/ 4 levels P1,P2,P3,P4: 1 2 2 3 4 1 3 2 1 2 ...
 $ GrasslandPlot: Factor w/ 12 levels G1:P1,G1:P2,G1:P3..: 9 10 2 3 4 5 7 2 5 6 ...

And here's the error message:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
In addition: Warning messages:
1: In Ops.factor(1, Grassland) : '|' not meaningful for factors
2: In Ops.factor(1, GrasslandPlot) : '|' not meaningful for factors
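
The warnings above suggest that the formula is evaluated as an ordinary fixed-effects formula, so (1|Grassland) is read as a logical "or" rather than as an lme4-style random term. The strata argument does something different: it restricts permutations to within groups. A minimal base-R sketch of that kind of restricted permutation, assuming a hypothetical design with the same layout as the one described (2 treatments per plot, 4 plots per grassland, 3 grasslands); the helper permute_within is illustrative and is not part of vegan.

```r
set.seed(42)  # assumed seed, for reproducibility only

# Hypothetical 24-sample split-plot design
design <- expand.grid(Treatment = c("T1", "T2"),
                      Plot      = paste0("P", 1:4),
                      Grassland = paste0("G", 1:3))
design$GrasslandPlot <- interaction(design$Grassland, design$Plot,
                                    sep = ":", drop = TRUE)

# Shuffle labels only among samples that share the same stratum
permute_within <- function(labels, strata) {
  ave(as.character(labels), strata,
      FUN = function(z) z[sample.int(length(z))])
}

perm <- permute_within(design$Treatment, design$GrasslandPlot)

# Every plot still holds exactly one T1 and one T2 after permutation, so
# differences among plots and grasslands never enter the null distribution
print(table(design$GrasslandPlot, perm))
```

Restricting permutations in this way accounts for the blocking when testing Treatment, but it does not estimate how much variance is attributable to Grassland.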
