[R-sig-eco] vegdist: binomial, is it the scale-invariant version?

2014-03-26 Thread Pierre THIRIET

Dear all,

Please, could someone confirm that vegdist(X, method = "binomial") is the
*scaled* binomial deviance (i.e. the scale-invariant version of the
binomial deviance)?


I want to use the *scaled* binomial deviance dissimilarity described by
Anderson and Millar (2004) [formula image not shown], where "scaled" means
it is the scale-invariant version of the binomial deviance (obtained by
dividing by n_k), also described by Anderson and Millar (2004).



vegdist implements the binomial index as

d[jk] = sum((x[ij]*log(x[ij]/n[i]) + x[ik]*log(x[ik]/n[i]) - n[i]*log(1/2))/n[i]), where n[i] = x[ij] + x[ik]


It therefore seems that this is the scaled binomial deviance, right?
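As a numeric sanity check of the formula above, here is a small stand-alone sketch (my own Python transcription of the formula as stated in vegdist's documentation, not vegan's actual implementation; the 0*log(0) = 0 convention is an assumption):

```python
import math

def binomial_dissimilarity(x, y):
    # Binomial deviance between two count vectors; per species i:
    #   (x_i*log(x_i/n_i) + y_i*log(y_i/n_i) - n_i*log(1/2)) / n_i
    # with n_i = x_i + y_i; species absent from both samples are skipped.
    def xlogx(v, n):
        return v * math.log(v / n) if v > 0 else 0.0  # 0*log(0) := 0
    d = 0.0
    for xi, yi in zip(x, y):
        n = xi + yi
        if n == 0:
            continue
        d += (xlogx(xi, n) + xlogx(yi, n) - n * math.log(0.5)) / n
    return d

print(binomial_dissimilarity([2, 3, 5], [2, 3, 5]))  # identical samples -> 0.0
print(binomial_dissimilarity([1, 0], [0, 1]))        # fully distinct -> 2*log(2)
```

Identical samples give 0, and each species occurring in only one of the two samples contributes exactly log(2) regardless of its abundance, which is the scale-invariance the scaled form is meant to provide.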

Thank you in advance,

cheers,
Pierre



Anderson, M.J. and Millar, R.B. (2004). Spatial variation and effects of 
habitat on temperate reef fish assemblages in northeastern New Zealand. 
/Journal of Experimental Marine Biology and Ecology/ 305, 191--221.

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Vegan-Adonis-NMDS-SIMPER

2014-03-26 Thread Brandon Gerig
Thanks for the words of caution on simper.

Am I completely off base in thinking that the betadiver function (analogous to
Levene's test) could be used to examine variation between levels within
main effects?

Cheers


On Mon, Mar 24, 2014 at 5:08 PM, Brandon Gerig bge...@nd.edu wrote:

 I am assessing the level of similarity between PCB congener profiles in
 spawning salmon and resident stream fish in stream reaches with and without
 salmon to determine if salmon are a significant vector for PCBs in
 tributary foodwebs of the Great Lakes.

 My data set is arranged in a matrix where the columns represent the
 congener of interest and the rows represent either a salmon (migratory) or
 resident fish (non migratory) from different sites.  You can think of this
 in a manner analogous to columns representing species composition and rows
 representing site.

 Currently, I am using the function adonis() to test for dissimilarity
 between fish species, stream reaches (with and without salmon) and lake
 basin (Superior, Huron, Michigan).
 The model statement is:

 m1 <- adonis(congener ~ FISH*REACH*BASIN, data = pcbcov, method = "bray", permutations = 999)

 The output indicates significant main effects of FISH, REACH, and BASIN
 and significant interactions between FISH and BASIN, and BASIN and REACH.

 Is it best to then interpret this output via an NMDS ordination plot or
 use something like the betadiver function to examine variances between main
 effect levels or both?

 Also, can anyone recommend a procedure to identify the congeners that
 contribute most to the dissimilarity between fish, reaches, and basins? I
 was thinking of the SIMPER procedure but am not yet sold.

 Any advice is appreciated!
 --
 Brandon Gerig
 PhD Student
 Department of Biological Sciences
 University of Notre Dame




-- 
Brandon Gerig
PhD Student
Department of Biological Sciences
University of Notre Dame




Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-26 Thread Gavin Simpson
1) Visually - unless it actually matters exactly on which day in the
year the peak is observed? If visually is OK, just do `plot(mod, pages
= 1)` to see the fitted splines on a single page. See `?plot.gam` for
more details on the plot method.

2) You could generate some new data to predict upon as follows:

newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))

Then predict for those new data but collect the individual
contributions of the spline terms to the predicted value rather than
just the final prediction

pred <- predict(mod, newdata = newdat, type = "terms")

Then find the maximal value of the DoY contribution

take <- which(pred$DoY == max(pred$DoY))
newdat[take, , drop = FALSE]

You could use

take <- which.max(pred$DoY)

instead of the `which()` I used, but only if there is a single maximal value.
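The distinction between `which(x == max(x))` and `which.max(x)` is purely about ties; a hypothetical plain-Python analogue (not R, just illustrating the same logic, and 0-based rather than R's 1-based indexing):

```python
contrib = [0.1, 0.8, 0.3, 0.8, 0.2]  # e.g. per-day term contributions

# like which.max(): index of the FIRST maximal value only
first_peak = max(range(len(contrib)), key=contrib.__getitem__)

# like which(x == max(x)): ALL indices achieving the maximum
peak = max(contrib)
all_peaks = [i for i, v in enumerate(contrib) if v == peak]

print(first_peak)  # 1
print(all_peaks)   # [1, 3]
```

With a tie at days 1 and 3, the first form silently discards the second peak, which is exactly why the single-maximum caveat matters.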

This works because the spline terms in the additive model are just
that; additive. Hence, because you haven't let the DoY and time splines
interact (in the simple model I mentioned; it is more complex if you
allow them to interact, as you then need to predict DoY for each year's
worth of time points), you can separate DoY from the other terms.

None of the above code has been tested; it was written off the top of my
head, but it should work, or at least get you pretty close to something
that works.

HTH

G

On 26 March 2014 10:02, Jacob Cram cramj...@gmail.com wrote:
 Thanks Gavin,
  This seems like a promising approach and a first pass suggests it works
 with this data. I can't quite figure out how I would go about interrogating
 the fitted spline to determine when the peak value happens with respect to
 DoY.  Any suggestions?
 -Jacob


 On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson ucfa...@gmail.com wrote:

 I would probably attack this using a GAM modified to model the
 residuals as a stochastic time series process.

 For example

 require(mgcv)
 mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
             correlation = corCAR1(form = ~ time))

 where `foo` is your data frame, `DoY` is a variable in the data frame
 computed as `as.numeric(strftime(RDate, format = "%j"))`, and `time` is
 a variable for the passage of time. You could use `as.numeric(RDate)`,
 but the resulting numbers are probably large and we might then encounter
 problems fitting the model; instead you might use `as.numeric(RDate) /
 1000`, say, to produce values on a more manageable scale. The `bs =
 "cc"` bit specifies a cyclic spline, applicable to data measured
 throughout a year. You may want to fix the start and end knots to be
 day 1 and day 366 respectively, say via `knots = list(DoY =
 c(0, 366))` as an argument to `gamm()` [I think I have this right,
 specifying the boundary knots, but let me know if you get an error
 about the number of knots]. The residuals are said to follow a
 continuous-time AR(1), the irregularly-spaced counterpart of the AR(1),
 plus random noise.

 There may be identifiability issues as the `s(time)` and `corCAR1()`
 compete to explain the fine-scale variation. If you hit such a case,
 you can make an educated guess as to the wiggliness (degrees of
 freedom) for the smooth terms based on a plot of the data and fix the
 splines at those values via the arguments `k = x` and `fx = TRUE`, where
 `x` in `k = x` is some integer value. Both of these go in as arguments to
 the `s()` functions. If the trend is not very non-linear you can use a
 low value (1-3) here for `x`, and for the DoY term 3-4 might be
 appropriate.

 There are other ways to approach this problem of identifiability, but
 that would require more time/space here, which I can go into via a
 follow-up if needed.

 You can interrogate the fitted splines to see when the peak value of
 the `DoY` term is in the year.

 You can also allow the seasonal signal to vary in time with the trend
 by allowing the splines to interact in a 2-d tensor product spline,
 using `te(DoY, time, bs = c("cc", "cr"))` instead of the two `s()`
 terms (or using `ti()` terms for the two marginal splines and the
 2-d spline). Again you can add `k = c(x, y), fx = TRUE` to the
 `te()` term, where `x` and `y` are the dfs for each dimension in the
 `te()` term. It is a bit more complex to do this for `ti()` terms.

 Part of the reason to prefer a spline for DoY for the seasonal term is
 that one might not expect the seasonal cycle to be symmetric, as
 cos/sin terms would imply.

 A recent ecological paper describing a similar approach (though using
 a different package in R) is that of Claire Ferguson and colleagues in J
 Applied Ecology (2008) http://doi.org/10./j.1365-2664.2007.01428.x
 (freely available).

 HTH

 G

 On 25 March 2014 19:14, Jacob Cram cramj...@gmail.com wrote:
  Hello all,
   I am thinking about applying season::cosinor() analysis to some
  irregularly spaced time series data. The data are unevenly spaced, so
  the usual time series methods, as well as the nscosinor() function, are out.
  My
  data do however trend over time 

Re: [R-sig-eco] Vegan-Adonis-NMDS-SIMPER

2014-03-26 Thread Gavin Simpson
You mean `betadisper()`? This simply computes a multivariate
dispersion about the kth group centroid for k groups. If you can
express the levels within main effects as a factor variable defining
the groups then `betadisper()` could work with that, but I'm not quite
following what you want to do.

`adonis()` will test whether the group means (defined by the
combinations of the levels of the covariate factors) differ.
`betadisper()` can test whether the variances differ for the
same groups. If there are different variances, one might question the
results from `adonis()` if it indicated that the observed group means
were inconsistent with the hypothesis of equal group means: that
inconsistency may be due solely or in part to the heterogeneity of
dispersions (variances).
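`betadisper()` itself works on a dissimilarity matrix via principal coordinates, but the core idea (characterise each group by its members' distances to the group centroid, which a Levene-type test then compares) can be shown with a toy plain-Python sketch (my own hypothetical analogue, not vegan's algorithm):

```python
import math

def centroid(points):
    # coordinate-wise mean of a list of equal-length point vectors
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]

def dispersions(points):
    # distance of each point to its own group's centroid
    c = centroid(points)
    return [math.dist(p, c) for p in points]

tight = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
loose = [[0.0, 0.0], [6.0, 0.0], [3.0, 3.0]]

mean_tight = sum(dispersions(tight)) / len(tight)
mean_loose = sum(dispersions(loose)) / len(loose)
print(mean_tight < mean_loose)  # True: the second group is more dispersed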

Is that what you want to test/investigate?

G

On 26 March 2014 09:57, Brandon Gerig bge...@nd.edu wrote:
 Thanks for the words of caution on simper.

 Am I completely off base in thinking that the betadiver function (analogous to
 Levene's test) could be used to examine variation between levels within
 main effects?

 Cheers


 On Mon, Mar 24, 2014 at 5:08 PM, Brandon Gerig bge...@nd.edu wrote:

 I am assessing the level of similarity between PCB congener profiles in
 spawning salmon and resident stream fish in stream reaches with and without
 salmon to determine if salmon are a significant vector for PCBs in
 tributary foodwebs of the Great Lakes.

 My data set is arranged in a matrix where the columns represent the
 congener of interest and the rows represent either a salmon (migratory) or
 resident fish (non migratory) from different sites.  You can think of this
 in a manner analogous to columns representing species composition and rows
 representing site.

 Currently, I am using the function adonis() to test for dissimilarity
 between fish species, stream reaches (with and without salmon) and lake
 basin (Superior, Huron, Michigan).
 The model statement is:

 m1 <- adonis(congener ~ FISH*REACH*BASIN, data = pcbcov, method = "bray", permutations = 999)

 The output indicates significant main effects of FISH, REACH, and BASIN
 and significant interactions between FISH and BASIN, and BASIN and REACH.

 Is it best to then interpret this output via an NMDS ordination plot or
 use something like the betadiver function to examine variances between main
 effect levels or both?

 Also, can anyone recommend a procedure to identify the congeners that
 contribute most to the dissimilarity between fish, reaches, and basins? I
 was thinking of the SIMPER procedure but am not yet sold.

 Any advice is appreciated!
 --
 Brandon Gerig
 PhD Student
 Department of Biological Sciences
 University of Notre Dame




 --
 Brandon Gerig
 PhD Student
 Department of Biological Sciences
 University of Notre Dame





-- 
Gavin Simpson, PhD



Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-26 Thread Jacob Cram
Thanks again Gavin, this works.
gamm() also models the long-term trend with a spline, s(time), which is
great. I would still like, though, to be able to say whether the factor is
trending up or down over time. Would it be fair to query
summary(mod$lme)$tTable
and look at the p-value and Value corresponding to the Xs(time)Fx1 term to
identify such a trend?

Also, here are a few syntax corrections to the code provided in the last
email:
1) visual approach
plot(mod$gam, pages = 1)

2) quantitative approach
pred <- predict(mod$gam, newdata = newdat, type = "terms")
take <- which(pred[, "s(DoY)"] == max(pred[, "s(DoY)"]))
or
take <- as.numeric(which.max(pred[, "s(DoY)"]))

Cheers,
-Jacob


On Wed, Mar 26, 2014 at 9:46 AM, Gavin Simpson ucfa...@gmail.com wrote:

 1) Visually - unless it actually matters exactly on which day in the
 year the peak is observed? If visually is OK, just do `plot(mod, pages
 = 1)` to see the fitted splines on a single page. See `?plot.gam` for
 more details on the plot method.

 2) You could generate some new data to predict upon as follows:

 newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))

 Then predict for those new data but collect the individual
 contributions of the spline terms to the predicted value rather than
 just the final prediction

 pred <- predict(mod, newdata = newdat, type = "terms")

 Then find the maximal value of the DoY contribution

 take <- which(pred$DoY == max(pred$DoY))
 newdat[take, , drop = FALSE]

 You could use

 take <- which.max(pred$DoY)

 instead of the `which()` I used, but only if there is a single maximal
 value.

 This works because the spline terms in the additive model are just
 that; additive. Hence, because you haven't let the DoY and time splines
 interact (in the simple model I mentioned; it is more complex if you
 allow them to interact, as you then need to predict DoY for each year's
 worth of time points), you can separate DoY from the other terms.

 None of the above code has been tested; it was written off the top of my
 head, but it should work, or at least get you pretty close to something
 that works.

 HTH

 G

 On 26 March 2014 10:02, Jacob Cram cramj...@gmail.com wrote:
  Thanks Gavin,
   This seems like a promising approach and a first pass suggests it
 works
  with this data. I can't quite figure out how I would go about
 interrogating
  the fitted spline to determine when the peak value happens with respect to
  DoY.  Any suggestions?
  -Jacob
 
 
  On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson ucfa...@gmail.com
 wrote:
 
  I would probably attack this using a GAM modified to model the
  residuals as a stochastic time series process.
 
  For example
 
  require(mgcv)
  mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
              correlation = corCAR1(form = ~ time))
 
  where `foo` is your data frame, `DoY` is a variable in the data frame
  computed as `as.numeric(strftime(RDate, format = "%j"))`, and `time` is
  a variable for the passage of time. You could use `as.numeric(RDate)`,
  but the resulting numbers are probably large and we might then encounter
  problems fitting the model; instead you might use `as.numeric(RDate) /
  1000`, say, to produce values on a more manageable scale. The `bs =
  "cc"` bit specifies a cyclic spline, applicable to data measured
  throughout a year. You may want to fix the start and end knots to be
  day 1 and day 366 respectively, say via `knots = list(DoY =
  c(0, 366))` as an argument to `gamm()` [I think I have this right,
  specifying the boundary knots, but let me know if you get an error
  about the number of knots]. The residuals are said to follow a
  continuous-time AR(1), the irregularly-spaced counterpart of the AR(1),
  plus random noise.
 
  There may be identifiability issues as the `s(time)` and `corCAR1()`
  compete to explain the fine-scale variation. If you hit such a case,
  you can make an educated guess as to the wiggliness (degrees of
  freedom) for the smooth terms based on a plot of the data and fix the
  splines at those values via the arguments `k = x` and `fx = TRUE`, where
  `x` in `k = x` is some integer value. Both of these go in as arguments to
  the `s()` functions. If the trend is not very non-linear you can use a
  low value (1-3) here for `x`, and for the DoY term 3-4 might be
  appropriate.
 
  There are other ways to approach this problem of identifiability, but
  that would require more time/space here, which I can go into via a
  follow-up if needed.
 
  You can interrogate the fitted splines to see when the peak value of
  the `DoY` term is in the year.
 
  You can also allow the seasonal signal to vary in time with the trend
  by allowing the splines to interact in a 2-d tensor product spline,
  using `te(DoY, time, bs = c("cc", "cr"))` instead of the two `s()`
  terms (or using `ti()` terms for the two marginal splines and the
  2-d spline). Again you can add `k = c(x, y), fx = TRUE` to the
  `te()` term, where `x` and `y` are the dfs for each 

Re: [R-sig-eco] Vegan-Adonis-NMDS-SIMPER

2014-03-26 Thread Steve Brewer
Brandon,

Are you asking if you can use betadisper as a substitute for post-ANOVA
pairwise comparisons among levels? After using betadisper to obtain
dispersions, I believe you can plot the centroids for each level. In
addition to telling you whether the dispersions differ among levels, you could
see how the centroids differ from one another. Is this what you want to
know? If so, realize that it won't give you pairwise significance tests
for differences between levels. For that, you might want to run additional
PERMANOVAs on reduced datasets containing only the two levels you want to
compare. You could then adjust the p-values for multiple tests after the
fact.
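In R the final adjustment step is a one-liner, `p.adjust(pvals, method = "holm")`. As a language-agnostic illustration of what that step-down correction does (a hypothetical plain-Python sketch, assuming the pairwise PERMANOVA p-values are already collected):

```python
def holm_adjust(pvals):
    # Holm step-down: sort p-values ascending, multiply the i-th smallest
    # by (m - i), enforce monotonicity with a running max, cap at 1.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running)
    return adjusted

print(holm_adjust([0.01, 0.04, 0.03]))  # ~[0.03, 0.06, 0.06]
```

Holm is `p.adjust`'s default method; it controls the family-wise error rate while being uniformly less conservative than a plain Bonferroni correction.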

Hope this helps,

Steve


J. Stephen Brewer 
Professor 
Department of Biology
PO Box 1848
 University of Mississippi
University, Mississippi 38677-1848
 Brewer web page - http://home.olemiss.edu/~jbrewer/
FAX - 662-915-5144
Phone - 662-915-1077




On 3/26/14 10:57 AM, Brandon Gerig bge...@nd.edu wrote:

Thanks for the words of caution on simper.

Am I completely off base in thinking that the betadiver function (analogous to
Levene's test) could be used to examine variation between levels within
main effects?

Cheers


On Mon, Mar 24, 2014 at 5:08 PM, Brandon Gerig bge...@nd.edu wrote:

 I am assessing the level of similarity between PCB congener profiles in
 spawning salmon and resident stream fish in stream reaches with and without
 salmon to determine if salmon are a significant vector for PCBs in
 tributary foodwebs of the Great Lakes.

 My data set is arranged in a matrix where the columns represent the
 congener of interest and the rows represent either a salmon (migratory)
or
 resident fish (non migratory) from different sites.  You can think of
this
 in a manner analogous to columns representing species composition and
rows
 representing site.

 Currently, I am using the function adonis() to test for dissimilarity
 between fish species, stream reaches (with and without salmon) and lake
 basin (Superior, Huron, Michigan).
 The model statement is:

 m1 <- adonis(congener ~ FISH*REACH*BASIN, data = pcbcov, method = "bray", permutations = 999)

 The output indicates significant main effects of FISH, REACH, and BASIN
 and significant interactions between FISH and BASIN, and BASIN and
REACH.

 Is it best to then interpret this output via an NMDS ordination plot or
 use something like the betadiver function to examine variances between
main
 effect levels or both?

 Also, can anyone recommend a procedure to identify the congeners that
 contribute most to the dissimilarity between fish, reaches, and basins? I
 was thinking of the SIMPER procedure but am not yet sold.

 Any advice is appreciated!
 --
 Brandon Gerig
 PhD Student
 Department of Biological Sciences
 University of Notre Dame




-- 
Brandon Gerig
PhD Student
Department of Biological Sciences
University of Notre Dame




Re: [R-sig-eco] Cosinor with data that trend over time

2014-03-26 Thread Gavin Simpson
Sorry about the errors (typos, not syntax errors) - I was forgetting
that you'd need to use `gamm()` and hence access the `$gam` component.

I don't follow the point about a factor trending up or down. You
shouldn't try to use the `$lme` part of the model for this.
`summary(mod$gam)` should be sufficient, but as it relates to a
spline, this is more a test of whether the spline is different from a
horizontal, flat, null line. The problem with splines is that the
trend need not just be trending up or down. In the past, to convey
where change in the trend occurs I have used the first derivative of
the fitted spline and looked for where in time the 95% confidence
interval on the first derivative of the spline doesn't include zero;
that shows the regions in time where the trend is significantly
increasing or decreasing. I cover how to do this in a blog post I
wrote:

http://www.fromthebottomoftheheap.net/2011/06/12/additive-modelling-and-the-hadcrut3v-global-mean-temperature-series/

the post contains links to the R code used for the derivatives etc,
though it is a little more complex in the case of a model with a trend
spline and seasonal spline.

I've been meaning to update that code and the post, because several
people have asked me how I do the analysis for models with multiple
spline terms. If you can't get the code to work for your models, ping
me back and I'll try to move that to the top of my TO DO list.

Note that the `Xs(time)Fx1` entries in the `summary(mod$lme)` table refer
to the basis functions that represent the spline, or at least to some
part of those basis functions. You can't really make much practical
use of those values, as they relate specifically to the way the
penalised regression spline model has been converted into an
equivalent linear mixed effects form.

HTH

G

On 26 March 2014 12:10, Jacob Cram cramj...@gmail.com wrote:
 Thanks again Gavin, this works.
 gamm() also models the long-term trend with a spline, s(time), which is
 great. I would still like, though, to be able to say whether the factor is
 trending up or down over time. Would it be fair to query
 summary(mod$lme)$tTable
 and look at the p-value and Value corresponding to the Xs(time)Fx1 term to
 identify such a trend?

 Also, here are a few syntax corrections to the code provided in the last
 email:
 1) visual approach
 plot(mod$gam, pages = 1)

 2) quantitative approach
 pred <- predict(mod$gam, newdata = newdat, type = "terms")
 take <- which(pred[, "s(DoY)"] == max(pred[, "s(DoY)"]))
 or
 take <- as.numeric(which.max(pred[, "s(DoY)"]))

 Cheers,
 -Jacob


 On Wed, Mar 26, 2014 at 9:46 AM, Gavin Simpson ucfa...@gmail.com wrote:

 1) Visually - unless it actually matters exactly on which day in the
 year the peak is observed? If visually is OK, just do `plot(mod, pages
 = 1)` to see the fitted splines on a single page. See `?plot.gam` for
 more details on the plot method.

 2) You could generate some new data to predict upon as follows:

 newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))

 Then predict for those new data but collect the individual
 contributions of the spline terms to the predicted value rather than
 just the final prediction

 pred <- predict(mod, newdata = newdat, type = "terms")

 Then find the maximal value of the DoY contribution

 take <- which(pred$DoY == max(pred$DoY))
 newdat[take, , drop = FALSE]

 You could use

 take <- which.max(pred$DoY)

 instead of the `which()` I used, but only if there is a single maximal
 value.

 This works because the spline terms in the additive model are just
 that; additive. Hence, because you haven't let the DoY and time splines
 interact (in the simple model I mentioned; it is more complex if you
 allow them to interact, as you then need to predict DoY for each year's
 worth of time points), you can separate DoY from the other terms.

 None of the above code has been tested; it was written off the top of my
 head, but it should work, or at least get you pretty close to something
 that works.

 HTH

 G

 On 26 March 2014 10:02, Jacob Cram cramj...@gmail.com wrote:
  Thanks Gavin,
   This seems like a promising approach and a first pass suggests it
  works
  with this data. I can't quite figure out how I would go about
  interrogating
  the fitted spline to determine when the peak value happens with respect
  to
  DoY.  Any suggestions?
  -Jacob
 
 
  On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson ucfa...@gmail.com
  wrote:
 
  I would probably attack this using a GAM modified to model the
  residuals as a stochastic time series process.
 
  For example
 
  require(mgcv)
  mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo,
              correlation = corCAR1(form = ~ time))
 
  where `foo` is your data frame, `DoY` is a variable in the data frame
  computed as `as.numeric(strftime(RDate, format = "%j"))`, and `time` is
  a variable for the passage of time. You could use `as.numeric(RDate)`,
  but the number of days is probably large as we might encounter