[R-sig-eco] vegdist: binomial, is it the scale-invariant version?
Dear all,

Please, could someone confirm that vegdist(X, method = "binomial") is the binomial deviance *scaled*, i.e. the scale-invariant version of the binomial deviance? I want to use the dissimilarity "binomial deviance, scaled" described by Anderson and Millar (2004), called scaled since it is the scale-invariant version of the binomial deviance (obtained by dividing each species' term by n), also described by Anderson and Millar (2004).

vegdist implements the binomial index as

d[jk] = sum((x[ij]*log(x[ij]/n[i]) + x[ik]*log(x[ik]/n[i]) - n[i]*log(1/2))/n[i]), where n[i] = x[ij] + x[ik]

It seems therefore that this is the binomial deviance scaled, right?

I thank you in advance, cheers,
Pierre

Anderson, M.J. and Millar, R.B. (2004). Spatial variation and effects of habitat on temperate reef fish assemblages in northeastern New Zealand. Journal of Experimental Marine Biology and Ecology 305, 191-221.

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
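As a numerical check, the formula above can be computed by hand and compared against vegdist(). This is a sketch, not tested against any particular vegan version; the toy matrix `X` is made up, and the zero-handling conventions (0*log(0) = 0, species with n[i] = 0 omitted) follow the description in ?vegdist:

```r
library(vegan)

## toy data: 2 samples (rows) x 4 species (columns) -- hypothetical counts
X <- rbind(c(5, 10, 0, 3),
           c(2,  8, 4, 3))

## vegan's binomial dissimilarity
d_vegan <- vegdist(X, method = "binomial")

## manual computation of d[jk] following the formula in ?vegdist
x_j <- X[1, ]; x_k <- X[2, ]
n <- x_j + x_k
xlogx <- function(x, n) ifelse(x > 0, x * log(x / n), 0)  # 0*log(0) = 0
d_manual <- sum(((xlogx(x_j, n) + xlogx(x_k, n) - n * log(1/2)) / n)[n > 0])

## d_manual should agree with as.numeric(d_vegan); the division by n[i]
## is what makes this the scaled (scale-invariant) form
all.equal(d_manual, as.numeric(d_vegan))
```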
Re: [R-sig-eco] Vegan-Adonis-NMDS-SIMPER
Thanks for the words of caution on simper. Am I completely off base in thinking that the betadiver function (analogous to Levene's test) could be used to examine variation between levels within main effects?

Cheers

On Mon, Mar 24, 2014 at 5:08 PM, Brandon Gerig bge...@nd.edu wrote:

I am assessing the level of similarity between PCB congener profiles in spawning salmon and resident stream fish in stream reaches with and without salmon, to determine if salmon are a significant vector for PCBs in tributary food webs of the Great Lakes. My data set is arranged in a matrix where the columns represent the congener of interest and the rows represent either a salmon (migratory) or a resident fish (non-migratory) from different sites. You can think of this in a manner analogous to columns representing species composition and rows representing sites.

Currently, I am using the function adonis to test for dissimilarity between fish species, stream reaches (with and without salmon), and lake basin (Superior, Huron, Michigan). The model statement is:

m1 <- adonis(congener ~ FISH*REACH*BASIN, data = pcbcov, method = "bray", permutations = 999)

The output indicates significant main effects of FISH, REACH, and BASIN and significant interactions between FISH and BASIN, and BASIN and REACH. Is it best to then interpret this output via an NMDS ordination plot, or use something like the betadiver function to examine variances between main effect levels, or both? Also, can anyone recommend a procedure to identify the congeners that contribute most to the dissimilarity between fish, reaches, and basins? I was thinking the SIMPER procedure but am not yet sold. Any advice is appreciated!

--
Brandon Gerig
PhD Student
Department of Biological Sciences
University of Notre Dame
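A self-contained sketch of the kind of model statement described above (the matrix, factor levels, and sample sizes are made-up stand-ins for the real data):

```r
library(vegan)

## hypothetical stand-in for the congener matrix and covariates:
## 12 fish (rows) x 5 congeners (columns)
set.seed(1)
congener <- matrix(abs(rnorm(60)), nrow = 12)
pcbcov <- data.frame(
  FISH  = rep(c("salmon", "resident"), 6),
  REACH = rep(c("with", "without"), each = 6),
  BASIN = rep(c("Superior", "Huron", "Michigan"), 4)
)

## permutational MANOVA on Bray-Curtis dissimilarities
m1 <- adonis(congener ~ FISH * REACH * BASIN,
             data = pcbcov, method = "bray", permutations = 999)
m1$aov.tab  # main effects and interactions with permutation p-values
```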
Re: [R-sig-eco] Cosinor with data that trend over time
1) Visually - unless it actually matters exactly on which day in the year the peak is observed? If visually is OK, just do `plot(mod, pages = 1)` to see the fitted splines on a single page. See `?plot.gam` for more details on the plot method.

2) You could generate some new data to predict upon as follows:

newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))

Then predict for those new data, but collect the individual contributions of the spline terms to the predicted value rather than just the final prediction:

pred <- predict(mod, newdata = newdat, type = "terms")

Then find the maximal value of the DoY contribution:

take <- which(pred$DoY == max(pred$DoY))
newdat[take, , drop = FALSE]

You could use take <- which.max(pred$DoY) instead of the `which()` I used, but only if there is a single maximal value. This works because the spline terms in the additive model are just that: additive. Hence, because you haven't let the DoY and time splines interact (in the simple model I mentioned; it is more complex if you allow these to interact, as you then need to predict DoY for each year's worth of time points), you can separate DoY from the other terms. None of the above code has been tested but was written off the top of my head; it should work or at least get you pretty close to something that works.

HTH

G

On 26 March 2014 10:02, Jacob Cram cramj...@gmail.com wrote: Thanks Gavin, This seems like a promising approach and a first pass suggests it works with this data. I can't quite figure out how I would go about interrogating the fitted spline to determine when the peak value happens with respect to DoY. Any suggestions? -Jacob

On Tue, Mar 25, 2014 at 9:06 PM, Gavin Simpson ucfa...@gmail.com wrote: I would probably attack this using a GAM modified to model the residuals as a stochastic time series process.
For example:

require(mgcv)
mod <- gamm(y ~ s(DoY, bs = "cc") + s(time), data = foo, correlation = corCAR1(form = ~ time))

where `foo` is your data frame, `DoY` is a variable in the data frame computed as `as.numeric(strftime(RDate, format = "%j"))`, and `time` is a variable for the passage of time - you could do `as.numeric(RDate)`, but the number of days is probably large and we might encounter problems fitting the model. Instead you might do `as.numeric(RDate) / 1000`, say, to produce values on a more manageable scale. The `bs = "cc"` bit specifies a cyclic spline applicable to data measured throughout a year. You may want to fix the start and end knots to be day 1 and day 366 respectively, say via `knots = list(DoY = c(0, 366))` as an argument to `gamm()` [I think I have this right, specifying the boundary knots, but let me know if you get an error about the number of knots]. The residuals are said to follow a continuous-time AR(1), the irregularly-spaced counterpart to the AR(1), plus random noise.

There may be identifiability issues as the `s(time)` and `corCAR1()` compete to explain the fine-scale variation. If you hit such a case, you can make an educated guess as to the wiggliness (degrees of freedom) for the smooth terms based on a plot of the data, and fix the splines at those values via arguments `k = x` and `fx = TRUE`, where `x` in `k = x` is some integer value. Both of these go in as arguments to the `s()` functions. If the trend is not very non-linear you can use a low value, 1-3, here for x; for the DoY term, say, 3-4 might be applicable. There are other ways to approach this problem of identifiability, but that would require more time/space here, which I can go into via a follow-up if needed.

You can interrogate the fitted splines to see when the peak value of the `DoY` term is in the year. You can also allow the seasonal signal to vary in time with the trend by allowing the splines to interact in a 2-d tensor product spline.
Use `te(DoY, time, bs = c("cc", "cr"))` instead of the two `s()` terms (or use `ti()` terms for the two marginal splines and the 2-d spline). Again you can add in `k = c(x, y), fx = TRUE` to the `te()` term, where `x` and `y` are the dfs for each dimension in the `te()` term. It is a bit more complex to do this for `ti()` terms.

Part of the reason to prefer a spline for DoY for the seasonal term is that one might not expect the seasonal cycle to be symmetric, as cos/sin terms would imply. A recent ecological paper describing a similar approach (though using a different package in R) is that of Claire Ferguson and colleagues in J Applied Ecology (2008) http://doi.org/10./j.1365-2664.2007.01428.x (freely available).

HTH

G

On 25 March 2014 19:14, Jacob Cram cramj...@gmail.com wrote: Hello all, I am thinking about applying season::cosinor() analysis to some irregularly spaced time series data. The data are unevenly spaced, so usual time series methods, as well as the nscosinor() function, are out. My data do however trend over time
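The pieces above can be assembled into one runnable sketch. The simulated data, noise level, and rescaling are assumptions standing in for the real series; only the model structure follows the thread:

```r
library(mgcv)   # gamm()
library(nlme)   # corCAR1()

## simulated irregularly-spaced series: seasonal cycle + slow trend + noise
set.seed(42)
RDate <- sort(sample(seq(as.Date("2000-01-01"), as.Date("2009-12-31"),
                         by = "day"), 200))
foo <- data.frame(
  DoY  = as.numeric(strftime(RDate, format = "%j")),
  time = as.numeric(RDate) / 1000   # rescale for numerical stability
)
foo$y <- sin(2 * pi * foo$DoY / 366) + 0.5 * foo$time + rnorm(200, sd = 0.3)

## cyclic spline for day-of-year (boundary knots at days 0 and 366),
## spline for the trend, continuous-time AR(1) for the residuals
mod <- gamm(y ~ s(DoY, bs = "cc") + s(time),
            data = foo,
            knots = list(DoY = c(0, 366)),
            correlation = corCAR1(form = ~ time))

summary(mod$gam)        # note: access the $gam component of the gamm() fit
plot(mod$gam, pages = 1)
```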
Re: [R-sig-eco] Vegan-Adonis-NMDS-SIMPER
You mean `betadisper()`? This simply computes a multivariate dispersion about the kth group centroid for k groups. If you can express the levels within main effects as a factor variable defining the groups, then `betadisper()` could work with that, but I'm not quite following what you want to do.

`adonis()` will test whether the group means (defined by the combinations of the levels of the covariate factors) differ. `betadisper()` can test if there are different variances for the same groups. If there are different variances, one might question the results from `adonis()` if it indicated that the observed group means were inconsistent with the hypothesis of equal group means; this inconsistency may be due solely or in part to the heterogeneity of dispersions (variances). Is that what you want to test/investigate?

G

On 26 March 2014 09:57, Brandon Gerig bge...@nd.edu wrote:
[...]

--
Gavin Simpson, PhD
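A sketch of the `betadisper()` check described above, using hypothetical data of the same shape as the original posting (all object names and values are made up):

```r
library(vegan)

## hypothetical congener matrix and grouping factor (FISH x REACH combinations)
set.seed(2)
congener <- matrix(abs(rnorm(60)), nrow = 12)
groups <- interaction(rep(c("salmon", "resident"), 6),
                      rep(c("with", "without"), each = 6))

## multivariate dispersion about each group centroid
d <- vegdist(congener, method = "bray")
disp <- betadisper(d, groups)

## permutation test of homogeneity of dispersions
## (the multivariate analogue of Levene's test)
permutest(disp, permutations = 999)

plot(disp)  # ordination of samples with group centroids marked
```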
Re: [R-sig-eco] Cosinor with data that trend over time
Thanks again Gavin, this works. gamm() also models the long-term trend with a spline s(time), which is great. I would still like, though, to be able to say whether the factor is trending up or down over time. Would it be fair to query summary(mod$lme)$tTable and look at the p-value and Value corresponding to the Xs(time)Fx1 entry to identify such a trend?

Also, here are a few syntax corrections on the code provided in the last email:

1) visual approach: plot(mod$gam, pages = 1)
2) quantitative approach: pred <- predict(mod$gam, newdata = newdat, type = "terms"), then take <- which(pred[, "s(DoY)"] == max(pred[, "s(DoY)"])) or take <- as.numeric(which.max(pred[, "s(DoY)"]))

Cheers,
-Jacob

On Wed, Mar 26, 2014 at 9:46 AM, Gavin Simpson ucfa...@gmail.com wrote:
[...]
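With those corrections, the peak day-of-year can be recovered like this (a sketch assuming `mod` is the fitted gamm() object and `foo` the data frame from earlier in the thread):

```r
## new data spanning one year, holding the trend covariate at its mean;
## valid because the DoY and time splines are additive and do not interact
newdat <- data.frame(DoY = seq_len(366), time = mean(foo$time))

## per-term contributions to the prediction
## (a matrix with one column per smooth, e.g. "s(DoY)" and "s(time)")
pred <- predict(mod$gam, newdata = newdat, type = "terms")

## day of year at which the seasonal spline contribution peaks
take <- which.max(pred[, "s(DoY)"])
newdat[take, , drop = FALSE]
```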
Re: [R-sig-eco] Vegan-Adonis-NMDS-SIMPER
Brandon,

Are you asking if you can use betadisper as a substitute for post-ANOVA pairwise comparisons among levels? After using betadisper to obtain dispersions, I believe you can plot the centroids for each level. In addition to telling you if the dispersions differ among levels, you could see how the centroids differ from one another. Is this what you want to know? If so, realize that it won't give you pairwise significance tests for differences between levels. For that, you might want to do additional permanovas on reduced datasets containing only the two levels you want to compare. You could then adjust the p-values for multiple tests after the fact.

Hope this helps,
Steve

J. Stephen Brewer
Professor
Department of Biology
PO Box 1848
University of Mississippi
University, Mississippi 38677-1848
Brewer web page - http://home.olemiss.edu/~jbrewer/
FAX - 662-915-5144
Phone - 662-915-1077

On 3/26/14 10:57 AM, Brandon Gerig bge...@nd.edu wrote:
[...]
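The pairwise-permanova-with-correction idea can be sketched as follows. This is one hand-rolled way to do it, not an established vegan function, and `congener`/`pcbcov` are hypothetical stand-ins of the same shape as the original posting:

```r
library(vegan)

## hypothetical data of the same shape as the original posting
set.seed(3)
congener <- matrix(abs(rnorm(60)), nrow = 12)
pcbcov <- data.frame(BASIN = factor(rep(c("Superior", "Huron", "Michigan"), 4)))

## run a permanova on each pair of BASIN levels on the reduced data
pairs <- combn(levels(pcbcov$BASIN), 2, simplify = FALSE)
pvals <- sapply(pairs, function(pr) {
  keep <- pcbcov$BASIN %in% pr
  grp <- droplevels(pcbcov$BASIN[keep])
  fit <- adonis(congener[keep, ] ~ grp, method = "bray", permutations = 999)
  fit$aov.tab$`Pr(>F)`[1]
})

## adjust the p-values for multiple tests after the fact
data.frame(comparison = sapply(pairs, paste, collapse = " vs "),
           p.adjusted = p.adjust(pvals, method = "holm"))
```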
Re: [R-sig-eco] Cosinor with data that trend over time
Sorry about the errors (typos, not syntax errors) - I was forgetting that you'd need to use `gamm()` and hence access the `$gam` component.

I don't follow the point about a factor trending up or down. You shouldn't try to use the `$lme` part of the model for this. `summary(mod$gam)` should be sufficient, but as it relates to a spline, this is more a test of whether the spline is different from a horizontal, flat, null line. The problem with splines is that the trend need not just be trending up or down. In the past, to convey where change in the trend occurs, I have used the first derivative of the fitted spline and looked for where in time the 95% confidence interval on the first derivative doesn't include zero; that shows the regions in time where the trend is significantly increasing or decreasing. I cover how to do this in a blog post I wrote: http://www.fromthebottomoftheheap.net/2011/06/12/additive-modelling-and-the-hadcrut3v-global-mean-temperature-series/ The post contains links to the R code used for the derivatives etc., though it is a little more complex in the case of a model with a trend spline and a seasonal spline. I'm supposed to have updated those codes and the post because several people have asked me how I do the analysis for models with multiple spline terms. If you can't get the code to work for your models, ping me back and I'll try to move that to the top of my TO DO list.

Note the `Xs(time)Fx1` entries in the `summary(mod$lme)` table refer to the basis functions that represent the spline, or at least to some part of those basis functions. You can't really make much practical use of those values, as they relate specifically to the way the penalised regression spline model has been converted into an equivalent linear mixed effects form.

HTH

G

On 26 March 2014 12:10, Jacob Cram cramj...@gmail.com wrote: Thanks again Gavin, this works. gamm() also models the long term trend with a spline s(Time), which is great.
[...]
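The derivative-based idea from the last reply can be roughed out by finite differences on the trend term. This is only a sketch, assuming the gamm() fit `mod` and data frame `foo` from earlier in the thread; the step size and the 1.96 multiplier are choices, the standard error of the difference is approximated by treating the two predictions as independent, and a formal version would use the posterior covariance of the spline coefficients as in the linked blog post:

```r
## evaluate the trend spline contribution on a fine grid, holding DoY fixed
eps <- 1e-3
grid <- data.frame(time = seq(min(foo$time), max(foo$time), length.out = 200),
                   DoY = 183)
p0 <- predict(mod$gam, newdata = grid, type = "terms", se.fit = TRUE)
grid2 <- transform(grid, time = time + eps)
p1 <- predict(mod$gam, newdata = grid2, type = "terms", se.fit = TRUE)

## finite-difference first derivative of s(time), with a rough 95% band
deriv <- (p1$fit[, "s(time)"] - p0$fit[, "s(time)"]) / eps
se <- sqrt(p0$se.fit[, "s(time)"]^2 + p1$se.fit[, "s(time)"]^2) / eps
sig <- (deriv - 1.96 * se > 0) | (deriv + 1.96 * se < 0)

## highlight regions where the trend is (roughly) significantly changing
plot(grid$time, deriv, type = "l",
     xlab = "time", ylab = "d s(time) / d time")
points(grid$time[sig], deriv[sig], col = "red", pch = 16)
abline(h = 0, lty = 2)
```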