Re: [R-sig-eco] Multivariate ANOVA/repeated measures
On Mon, 2011-10-10 at 09:47 -0600, Dave Roberts wrote:
[snip]
> Nick,
>
> I would try something pretty direct. Any appeal to differences in
> dissimilarities confounds the effects with the particular
> dissimilarity/distance matrix you use. Assuming the samples and species
> are in the same order, and that the data.frames are the same size, you
> might try
>
>     actual <- sum((ST1-ST2)^2)
>
> and then permute one of the two matrices numerous times
>
>     res <- rep(NA,999)
>     for (i in 1:999) {
>         res[i] <- sum((ST1-ST2[sample(1:nrow(ST2),replace=FALSE),])^2)
>     }
>     final <- (sum(res <= actual) + 1)/1000
>
> and see what fraction of the permuted matrices are as similar.
> Hopefully Gavin will weigh in with a better randomization.

I guess it does depend on what is exchangeable under the Null Hypothesis.

I suppose that technically, we should condition the permutation on the 24
sampling locations. Under the Null, we are assuming that ST1 and ST2 are
equivalent and therefore drawn from the same population of samples that
doing lots of ST2 sampling would produce if it were repeated many times.
However, the samples from an individual location are not necessarily
exchangeable between locations; we need to respect the clustering inherent
in the data.

If the samples for ST1 were collected at the same time, and the ones for
ST2 were also collected together at another time point, we could be very
pedantic and say that all sampling locations experienced the same time
process and that we have to use the same permutation in each sampling
location. That would suck, as then you would only have two valid
permutations: the observed one and the samples in the other order. So we
could perhaps relax that assumption... or give up now :-)

Anyway - if you want to condition the permutation on the sampling
locations in adonis(), include that as a factor via the `strata` argument
and the permutation test will put the samples in random order within
sampling location but allow for different ordering (ST1 then ST2, or ST2
then ST1) within each location during each permutation.

To my mind this does capture the proposed Null; ST1 and ST2 are
exchangeable as we are testing the hypothesis that they are from the same
population, but samples are not exchangeable between sampling locations as
we have repeated measures.

> If you do go with a multivariate approach I might try a procrustes
> analysis of PCO ordinations.

That is a good suggestion, Dave. I would add that this will test the
similarity of the configuration of points in ordination space. adonis()
will test if the two sampling methods are drawn from populations with the
same mean species composition. These are two different aspects of the
problem.

I would probably try both i) adonis() and ii) separate ordinations of the
ST1 and ST2 species matrices followed by a procrustes rotation of the two
configurations - see ?procrustes for the latter. Both analyses need the
`strata` argument supplying to get the correct Null.

> Dave

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
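
Note: a minimal base-R sketch of the restricted permutation described above, where ST1 and ST2 are swapped only within a sampling location and never moved between locations. The simulated matrices, the test statistic (squared length of the mean difference vector) and the names `paired_stat`, `n_perm` and `p_value` are assumptions made for illustration; they are not taken from the thread, and this is not what adonis() computes internally.

```r
set.seed(42)

# Hypothetical data: 24 locations x 10 species, each location sampled twice
n_loc <- 24
n_spp <- 10
base <- matrix(rpois(n_loc * n_spp, lambda = 5), nrow = n_loc)
ST1 <- base + matrix(rpois(n_loc * n_spp, lambda = 1), nrow = n_loc)
ST2 <- base + matrix(rpois(n_loc * n_spp, lambda = 1), nrow = n_loc)

# Assumed statistic: squared length of the mean within-location difference
paired_stat <- function(d) sum(colMeans(d)^2)

D <- ST1 - ST2                 # one row of differences per location
observed <- paired_stat(D)

# Swapping ST1 and ST2 inside a location flips the sign of that row of D,
# so each permutation is a random vector of +1/-1, one entry per location.
n_perm <- 999
perm <- numeric(n_perm)
for (i in seq_len(n_perm)) {
  flip <- sample(c(-1, 1), n_loc, replace = TRUE)
  perm[i] <- paired_stat(D * flip)   # flip[i] multiplies row i of D
}

# Fraction of permutations at least as extreme as the observed statistic
p_value <- (sum(perm >= observed) + 1) / (n_perm + 1)
p_value
```

With 24 locations there are 2^24 possible within-location orderings, so the small number of valid permutations mentioned above only arises if the same ordering is forced on every location.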
Re: [R-sig-eco] Multivariate ANOVA/repeated measures
On 10/07/2011 08:51 AM, Dr N.A. Cutler wrote:
> Dear All,
>
> I have a query about multivariate analysis of community data. In my
> experiment, 24 microbial communities in different locations were sampled
> using Sampling Technique 1 (ST1). A site x species matrix was then
> derived by molecular analysis. The same 24 locations were then sampled
> again using a different sampling technique (ST2) and a second site x
> species matrix was derived. It is assumed that community structure
> remains intact after sampling by Technique 1, i.e. the two techniques can
> sample from the same pool of organisms.
>
> I want to compare the results of the two sampling exercises in order to
> test the performance of the two sampling techniques. My research question
> is: does Technique 1 produce a similar signal to Technique 2? Or do the
> different techniques give significantly different pictures of community
> structure? The null hypothesis is that there is no significant difference
> between the two sampling techniques, i.e. they both capture community
> structure with the same degree of accuracy.
>
> It occurred to me that I could use a multivariate ANOVA technique (e.g.
> Adonis) to distinguish between the results of the two sampling exercises,
> using sampling technique as a factor. But I am not sure how to deal with
> the obvious correlation between sample pairs. Should this situation be
> addressed as a repeated measures experiment with two time steps? If so,
> what is the best technique to use (a mixed model, perhaps?)
>
> Any advice would be gratefully received.
>
> Best wishes,
>
> Nick Cutler

Nick,

I would try something pretty direct. Any appeal to differences in
dissimilarities confounds the effects with the particular
dissimilarity/distance matrix you use. Assuming the samples and species are
in the same order, and that the data.frames are the same size, you might
try

    # observed sum of squared differences between the paired matrices
    actual <- sum((ST1-ST2)^2)

and then permute one of the two matrices numerous times

    res <- rep(NA,999)
    for (i in 1:999) {
        # shuffle the rows of ST2 so the pairing with ST1 is broken
        res[i] <- sum((ST1-ST2[sample(1:nrow(ST2),replace=FALSE),])^2)
    }
    # share of matrices (permuted plus observed) at least as similar
    final <- (sum(res <= actual) + 1)/1000

and see what fraction of the permuted matrices are as similar. Hopefully
Gavin will weigh in with a better randomization.

If you do go with a multivariate approach I might try a procrustes analysis
of PCO ordinations.

Dave

-- 
David W. Roberts                      office 406-994-4548
Professor and Head                    FAX    406-994-3190
Department of Ecology                 email  drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460
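
Note: a rough base-R sketch of the "procrustes analysis of PCO ordinations" idea. In practice vegan's procrustes() and protest() would be the usual tools; the hand-rolled `procrustes_cor` function, the simulated data, the Euclidean distances and the two-axis PCO are all assumptions made here so that the example runs without extra packages.

```r
set.seed(7)

# Hypothetical data: 24 locations x 12 species for each technique
n_loc <- 24
n_spp <- 12
base <- matrix(rpois(n_loc * n_spp, lambda = 5), nrow = n_loc)
ST1 <- base + matrix(rpois(n_loc * n_spp, lambda = 1), nrow = n_loc)
ST2 <- base + matrix(rpois(n_loc * n_spp, lambda = 1), nrow = n_loc)

# PCO (classical scaling) of each matrix; Euclidean distance is an assumption
pco1 <- cmdscale(dist(ST1), k = 2)
pco2 <- cmdscale(dist(ST2), k = 2)

# Symmetric Procrustes correlation: centre both configurations, scale them
# to unit sum of squares, then sum the singular values of their cross-product
procrustes_cor <- function(X, Y) {
  X <- scale(X, center = TRUE, scale = FALSE)
  Y <- scale(Y, center = TRUE, scale = FALSE)
  X <- X / sqrt(sum(X^2))
  Y <- Y / sqrt(sum(Y^2))
  sum(svd(crossprod(X, Y))$d)
}

observed <- procrustes_cor(pco1, pco2)

# Null of no association: shuffle which location each ST2 point belongs to
n_perm <- 999
perm <- replicate(n_perm, procrustes_cor(pco1, pco2[sample(n_loc), ]))
p_value <- (sum(perm >= observed) + 1) / (n_perm + 1)

c(correlation = observed, p = p_value)
```

A high correlation with a small p-value would suggest that the two techniques arrange the 24 locations similarly in ordination space, which is a different question from whether their mean compositions differ.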
Re: [R-sig-eco] Multivariate ANOVA/repeated measures
On Mon, 10 Oct 2011, Dave Roberts wrote:

>> I want to compare the results of the two sampling exercises in order to
>> test the performance of the two sampling techniques.
>
> I would try something pretty direct. Any appeal to differences in
> dissimilarities confounds the effects with the particular
> dissimilarity/distance matrix you use. Assuming the samples and species
> are in the same order, and that the data.frames are the same size, you
> might try

I did not read the original message, so I hope you'll allow me to join the
thread.

My recommendation is to use univariate tree models, particularly a
classification tree (for ordinal explanatory variables; i.e., ST1 and ST2).
This is fully, carefully, and non-technically explained in Chapter 9
(particularly Sections 9.3 and 9.4) in Zuur, Ieno, and Smith, "Analysing
Ecological Data". For that matter, I highly recommend reading the whole
book.

Rich
Re: [R-sig-eco] Multivariate ANOVA/repeated measures
On Mon, 2011-10-10 at 09:11 -0700, Rich Shepard wrote:
> On Mon, 10 Oct 2011, Dave Roberts wrote:
>
>>> I want to compare the results of the two sampling exercises in order to
>>> test the performance of the two sampling techniques.
>>
>> I would try something pretty direct. Any appeal to differences in
>> dissimilarities confounds the effects with the particular
>> dissimilarity/distance matrix you use. Assuming the samples and species
>> are in the same order, and that the data.frames are the same size, you
>> might try
>
> I did not read the original message, so I hope you'll allow me to join
> the thread.
>
> My recommendation is to use univariate tree models, particularly a
> classification tree (for ordinal explanatory variables; i.e., ST1 and
> ST2).

But the response here is *multivariate* - of course, one could use Glenn
De'ath's multivariate regression trees (despite the name it is really a
constrained clustering/classification) - but I think there are better ways
of solving this particular problem.

And unless one has many hundreds of observations, the model will need some
sort of variance reduction applied (via bagging, or some such) as the one
fitted model is potentially highly unstable.

G

> This is fully, carefully, and non-technically explained in Chapter 9
> (particularly Sections 9.3 and 9.4) in Zuur, Ieno, and Smith Analysing
> Ecological Data. For that matter, I highly recommend reading the whole
> book.
>
> Rich

-- 
Dr. Gavin Simpson
Re: [R-sig-eco] Multivariate ANOVA/repeated measures
On 10/10/2011 02:15 PM, Gavin Simpson wrote:
> On Mon, 2011-10-10 at 09:11 -0700, Rich Shepard wrote:
>> On Mon, 10 Oct 2011, Dave Roberts wrote:
>> [...]
>> My recommendation is to use univariate tree models, particularly a
>> classification tree (for ordinal explanatory variables; i.e., ST1 and
>> ST2).
>
> But the response here is *multivariate* - of course, one could use Glenn
> De'ath's multivariate regression trees (despite the name it is really a
> constrained clustering/classification) - but I think there are better
> ways of solving this particular problem.
>
> And unless one has many hundreds of observations, the model will need
> some sort of variance reduction applied (via bagging, or some such) as
> the one fitted model is potentially highly unstable.
>
> G
> [...]

It would be fairly simple to boil down to a univariate question. You could
do something as simple as a paired t-test of plot-level species richness or
the number of individuals sampled (to compare sampling efficiency), but I
still don't see an independent and a dependent variable.

Dave

-- 
David W. Roberts
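
Note: a small base-R sketch of the paired t-test on plot-level species richness suggested above. The simulated abundance matrices and the variable names are assumptions for illustration only; richness is taken here as the number of species with a non-zero count at each location.

```r
set.seed(3)

# Hypothetical abundance data: 24 locations x 15 species per technique
n_loc <- 24
n_spp <- 15
base <- matrix(rpois(n_loc * n_spp, lambda = 1), nrow = n_loc)
ST1 <- base + matrix(rpois(n_loc * n_spp, lambda = 0.3), nrow = n_loc)
ST2 <- base + matrix(rpois(n_loc * n_spp, lambda = 0.3), nrow = n_loc)

# Species richness per location: number of species with a non-zero count
richness1 <- rowSums(ST1 > 0)
richness2 <- rowSums(ST2 > 0)

# Paired test, because row i of ST1 and row i of ST2 are the same location
richness_test <- t.test(richness1, richness2, paired = TRUE)
print(richness_test)

# The same comparison for the number of individuals sampled
abundance_test <- t.test(rowSums(ST1), rowSums(ST2), paired = TRUE)
print(abundance_test)
```

This reduces each community to one number per location, so it compares sampling efficiency rather than species composition.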
Re: [R-sig-eco] Multivariate ANOVA/repeated measures
On Mon, 2011-10-10 at 15:32 -0600, Dave Roberts wrote:
> On 10/10/2011 02:15 PM, Gavin Simpson wrote:
> [...]
>
> It would be fairly simple to boil down to a univariate question. You
> could do something as simple as a paired t-test of plot-level species
> richness or the number of individuals sampled (to compare sampling
> efficiency), but I still don't see an independent and a dependent
> variable.
>
> Dave

Indeed, but species richness is quite a different comparison of the two
sampling strategies. The original Q was quite broad, so that might be what
the OP wants.

The predictor would be the sampling strategy I guess - can you
explain/cluster the data on the basis of sampling strategy?

Compositionally, adonis() would seem to be an appropriate technique here -
it would be effectively a multivariate t-test - and would test if the
multivariate centroids of the species compositions in the two samples are
similar or not (difference of centroids is 0). Of course, if that Null is
rejected, you must test if the difference in centroids is due to a
difference in multivariate dispersions (the spread of the points about the
centroid), which can be done via betadisper().

The issue I'm still thinking about is the permutation - at first look, your
randomisation seems appropriate, especially if the two sampling strategies
can be considered random sampling methods - i.e. all sites have equal
chance of being selected, /and/ that there is no underlying clustering in
the population that should be respected. The Null hypothesis seems to be
that ST1 and ST2 are just two random samples of some population of species
composition/samples that would arise if you did lots of sampling using ST2.

G

-- 
Dr. Gavin Simpson
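
Note: a hedged sketch of how the adonis() and betadisper() steps might look with the vegan package. It uses adonis2(), which, as far as I know, is the current replacement for adonis() in recent vegan releases. The simulated data, the Bray-Curtis dissimilarity, and restricting permutations within location through `blocks` (for the dispersion test as well) are assumptions for illustration, not choices made in the thread. The analysis only runs if vegan and permute are installed.

```r
set.seed(1)

# Hypothetical data: 24 locations x 12 species, sampled by two techniques
n_loc <- 24
n_spp <- 12
base <- matrix(rpois(n_loc * n_spp, lambda = 5), nrow = n_loc)
ST1 <- base + matrix(rpois(n_loc * n_spp, lambda = 1), nrow = n_loc)
ST2 <- base + matrix(rpois(n_loc * n_spp, lambda = 1), nrow = n_loc)

# Stack the two matrices and record technique and location for every row
comm <- rbind(ST1, ST2)
design <- data.frame(
  technique = factor(rep(c("ST1", "ST2"), each = n_loc)),
  location  = factor(rep(seq_len(n_loc), times = 2))
)

if (requireNamespace("vegan", quietly = TRUE) &&
    requireNamespace("permute", quietly = TRUE)) {
  # Permute only within location, so pairs are never split across locations
  perms <- permute::how(nperm = 999, blocks = design$location)

  # Do the two techniques differ in mean (centroid) composition?
  fit <- vegan::adonis2(comm ~ technique, data = design,
                        method = "bray", permutations = perms)
  print(fit)

  # Do the two techniques differ in multivariate dispersion?
  d <- vegan::vegdist(comm, method = "bray")
  disp <- vegan::betadisper(d, design$technique)
  print(vegan::permutest(disp, permutations = perms))
}
```

If the adonis2() test is significant and the dispersion test is not, a difference in centroids is the more plausible reading; if both are significant, the two effects cannot be cleanly separated.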