Re: [R-sig-eco] Multivariate ANOVA/repeated measures

2011-10-11 Thread Gavin Simpson
On Mon, 2011-10-10 at 09:47 -0600, Dave Roberts wrote:
 <snip />
 
 Nick,
 
 I would try something pretty direct.  Any appeal to differences in 
 dissimilarities confounds the effects with the particular 
 dissimilarity/distance matrix you use.  Assuming the samples and species 
 are in the same order, and that the data.frames are the same size, you 
 might try
 
   actual <- sum((ST1-ST2)^2)
 
 and then permute one of the two matrices numerous times
 
 res <- rep(NA,999)
 for (i in 1:999) {
   res[i] <- sum((ST1-ST2[sample(1:nrow(ST2),replace=FALSE),])^2)
 }
 final <- (sum(res <= actual) + 1)/1000
 
 and see what fraction of the permuted matrices are as similar.
 
 Hopefully Gavin will weigh in with a better randomization.

I guess it depends on what is exchangeable under the Null Hypothesis. I
suppose that, technically, we should condition the permutation on the 24
sampling locations. Under the Null, we are assuming that ST1 and ST2 are
equivalent, and are therefore drawn from the same population of samples
that repeating the ST2 sampling many times would produce. However, the
samples from an individual location are not necessarily exchangeable
between locations; we need to respect the clustering inherent in the
data.
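
One way to picture a permutation that respects this clustering is to swap
the ST1 and ST2 rows at random within each location and never move a row
between locations. The sketch below does that in base R. The test
statistic (squared distance between the two technique centroids) and the
simulated matrices are assumptions made for illustration, not something
proposed in this thread. Swapping a pair is the same as flipping the sign
of that location's row of differences, which is what the code does.

```r
set.seed(42)

# Hypothetical stand-ins for the two site x species matrices (24 locations).
n_loc <- 24
n_spp <- 10
ST1 <- matrix(rpois(n_loc * n_spp, lambda = 5), nrow = n_loc)
ST2 <- matrix(rpois(n_loc * n_spp, lambda = 5), nrow = n_loc)

paired_centroid_test <- function(a, b, nperm = 999) {
  stopifnot(identical(dim(a), dim(b)))
  d <- a - b                               # per-location differences
  stat <- function(m) sum(colMeans(m)^2)   # squared distance between centroids
  observed <- stat(d)
  perm <- replicate(nperm, {
    # Swap ST1/ST2 within a location == flip the sign of that row of d.
    flip <- sample(c(-1, 1), nrow(d), replace = TRUE)
    stat(d * flip)
  })
  list(statistic = observed,
       p.value = (sum(perm >= observed) + 1) / (nperm + 1))
}

result <- paired_centroid_test(ST1, ST2)
result$p.value
```

A small p-value would suggest the two techniques differ in mean
composition; rows are only ever exchanged within a location.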

If the samples for ST1 were collected at the same time, and the ones for
ST2 were also collected together at another time point, we could be very
pedantic and say that all sampling locations experienced the same
time process and that we have to use the same permutation in each
sampling location. That would suck as then you would only have two valid
permutations; the observed one and the samples in the other order. So we
could perhaps relax that assumption... or give up now :-)

Anyway - if you want to condition the permutation on the sampling
locations in adonis(), include that as a factor via the `strata`
argument and the permutation test will put the samples in random order
within sampling location but allow for different ordering (ST1 then ST2,
or ST2 then ST1) within each location during each permutation. To my
mind this does capture the proposed Null; ST1 and ST2 are exchangeable
as we are testing the hypothesis that they are from the same population,
but samples are not exchangeable between sampling locations as we have
repeated measures.
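
A minimal sketch of that restricted test, under some assumptions: the
data are simulated, the names `comm`, `meta`, `technique` and `location`
are made up here, and the call uses adonis2() with a permute `how()`
design rather than adonis() with `strata`, because recent vegan releases
steer users toward adonis2(). Check the argument names against your
installed version.

```r
library(vegan)  # also attaches permute, which provides how()

set.seed(1)
n_loc <- 24
n_spp <- 12

# Hypothetical data: both techniques sample the same underlying abundances.
base <- matrix(rgamma(n_loc * n_spp, shape = 2, rate = 0.2), nrow = n_loc)
ST1 <- matrix(rpois(n_loc * n_spp, base), nrow = n_loc)
ST2 <- matrix(rpois(n_loc * n_spp, base), nrow = n_loc)

# Stack the two matrices and record technique and location for each row.
comm <- rbind(ST1, ST2)
meta <- data.frame(
  technique = factor(rep(c("ST1", "ST2"), each = n_loc)),
  location  = factor(rep(seq_len(n_loc), times = 2))
)

# Permute only within location: each block holds one ST1 and one ST2 row.
perm <- how(nperm = 999, blocks = meta$location)

fit <- adonis2(comm ~ technique, data = meta,
               permutations = perm, method = "bray")
fit
```

With blocks of two rows each, every permutation either keeps or swaps the
ST1/ST2 order at a location, which is the scheme described above.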

 If you do go with a multivariate approach I might try a procrustes 
 analysis of PCO ordinations.

That is a good suggestion, Dave. I would add that this will test the
similarity of the configuration of points in ordination space. adonis()
will test if the two sampling methods are drawn from populations with
the same mean species composition. These are two different aspects of
the problem. I would probably try both i) adonis() and ii) separate
ordinations of the ST1 and ST2 species matrices followed by a procrustes
rotation of the two configurations - see ?procrustes for the latter.
Both need the `strata` argument supplying to get the correct Null.
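
A rough sketch of option ii), assuming simulated data and Bray-Curtis
dissimilarities (neither is specified in the thread). It uses cmdscale()
for the PCO, procrustes() for the rotation and protest() for a
permutation test of the fit. The permutation here is protest()'s
unrestricted one; restricting it is not shown.

```r
library(vegan)

set.seed(3)
n_loc <- 24
n_spp <- 12

# Hypothetical data: both techniques sample the same underlying abundances.
base <- matrix(rgamma(n_loc * n_spp, shape = 2, rate = 0.2), nrow = n_loc)
ST1 <- matrix(rpois(n_loc * n_spp, base), nrow = n_loc)
ST2 <- matrix(rpois(n_loc * n_spp, base), nrow = n_loc)

# Separate PCO (classical scaling) of each matrix, two axes each.
pco1 <- cmdscale(vegdist(ST1, method = "bray"), k = 2)
pco2 <- cmdscale(vegdist(ST2, method = "bray"), k = 2)

# Rotate the ST2 configuration onto the ST1 configuration.
rot <- procrustes(pco1, pco2, symmetric = TRUE)

# Permutation test of the Procrustes correlation.
pt <- protest(pco1, pco2, permutations = 999)
pt$t0      # Procrustes correlation
pt$signif  # permutation p-value
```

plot(rot) draws arrows between matched locations in the two
configurations, which helps spot locations where the techniques disagree.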

 Dave

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Multivariate ANOVA/repeated measures

2011-10-10 Thread Dave Roberts



On 10/07/2011 08:51 AM, Dr N.A. Cutler wrote:

Dear All,

I have a query about multivariate analysis of community data.

In my experiment, 24 microbial communities in different locations were
sampled using Sampling Technique 1 (ST1). A site X species matrix was
then derived by molecular analysis.

The same 24 locations were then sampled again using a different sampling
technique (ST2) and a second site X species matrix was derived. It is
assumed that community structure remains intact after sampling by
Technique 1 i.e. the two techniques can sample from the same pool of
organisms.

I want to compare the results of the two sampling exercises in order to
test the performance of the two sampling techniques. My research
question is: does Technique 1 produce a similar signal to Technique 2?
Or do the different techniques give significantly different pictures of
community structure? The null hypothesis is that there is no significant
difference between the two sampling techniques i.e. they both capture
community structure with the same degree of accuracy.

It occurred to me that I could use a multivariate ANOVA technique (e.g.
Adonis) to distinguish between the results of the two sampling
exercises, using sampling technique as a factor. But I am not sure how
to deal with the obvious correlation between sample pairs. Should this
situation be addressed as a repeated measures experiment with two time
steps? If so, what is the best technique to use (a mixed model, perhaps?)

Any advice would be gratefully received.

Best wishes,

Nick Cutler



Nick,

   I would try something pretty direct.  Any appeal to differences in 
dissimilarities confounds the effects with the particular 
dissimilarity/distance matrix you use.  Assuming the samples and species 
are in the same order, and that the data.frames are the same size, you 
might try


 actual <- sum((ST1-ST2)^2)

and then permute one of the two matrices numerous times

res <- rep(NA,999)
for (i in 1:999) {
 res[i] <- sum((ST1-ST2[sample(1:nrow(ST2),replace=FALSE),])^2)
}
final <- (sum(res <= actual) + 1)/1000

and see what fraction of the permuted matrices are as similar.
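
For anyone wanting to try this without real data, here is the same idea
wrapped in a function and run on simulated matrices. The simulation, the
function name `row_match_test` and the choice of 999 permutations as a
default are assumptions for illustration. The p-value is the fraction of
row-shuffled versions of the second matrix that match the first at least
as closely as the observed pairing does.

```r
set.seed(1)
n_loc <- 24
n_spp <- 12

# Hypothetical data: both techniques sample the same underlying abundances,
# so matched rows should be more alike than mismatched rows.
base <- matrix(rgamma(n_loc * n_spp, shape = 2, rate = 0.2), nrow = n_loc)
ST1 <- matrix(rpois(n_loc * n_spp, base), nrow = n_loc)
ST2 <- matrix(rpois(n_loc * n_spp, base), nrow = n_loc)

row_match_test <- function(a, b, nperm = 999) {
  stopifnot(identical(dim(a), dim(b)))
  actual <- sum((a - b)^2)
  # Shuffle the rows of b and recompute the summed squared difference.
  res <- replicate(nperm, sum((a - b[sample(nrow(b)), , drop = FALSE])^2))
  (sum(res <= actual) + 1) / (nperm + 1)
}

p_matched <- row_match_test(ST1, ST2)
p_matched
```

A small value says the observed location-by-location pairing is closer
than chance pairings, i.e. the two techniques agree about which location
is which.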

Hopefully Gavin will weigh in with a better randomization.

   If you do go with a multivariate approach I might try a procrustes 
analysis of PCO ordinations.


Dave
--

David W. Roberts office 406-994-4548
Professor and Head  FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460



Re: [R-sig-eco] Multivariate ANOVA/repeated measures

2011-10-10 Thread Rich Shepard

On Mon, 10 Oct 2011, Dave Roberts wrote:


I want to compare the results of the two sampling exercises in order to
test the performance of the two sampling techniques.



  I would try something pretty direct. Any appeal to differences in
dissimilarities confounds the effects with the particular
dissimilarity/distance matrix you use. Assuming the samples and species
are in the same order, and that the data.frames are the same size, you
might try


  I did not read the original message, so I hope you'll allow me to join the
thread. My recommendation is to use univariate tree models, particularly a
classification tree (for ordinal explanatory variables; i.e., ST1 and ST2).

  This is fully, carefully, and non-technically explained in Chapter 9
(particularly Sections 9.3 and 9.4) in Zuur, Ieno, and Smith, Analysing
Ecological Data. For that matter, I highly recommend reading the whole
book.

Rich



Re: [R-sig-eco] Multivariate ANOVA/repeated measures

2011-10-10 Thread Gavin Simpson
On Mon, 2011-10-10 at 09:11 -0700, Rich Shepard wrote:
 On Mon, 10 Oct 2011, Dave Roberts wrote:
 
  I want to compare the results of the two sampling exercises in order to
  test the performance of the two sampling techniques.
 
I would try something pretty direct. Any appeal to differences in
  dissimilarities confounds the effects with the particular
  dissimilarity/distance matrix you use. Assuming the samples and species
  are in the same order, and that the data.frames are the same size, you
  might try
 
I did not read the original message, so I hope you'll allow me to join the
 thread. My recommendation is to use univariate tree models, particularly a
 classification tree (for ordinal explanatory variables; i.e., ST1 and ST2).

But the response here is *multivariate* - of course, one could use Glen
De'Ath's multivariate regression trees (despite the name it is really a
constrained clustering/classification) - but I think there are better
ways of solving this particular problem. And unless one has many 100s of
observations, the model will need some sort of variance reduction
applied (via bagging, or some such) as the one fitted model is
potentially highly unstable.

G

This is fully, carefully, and non-technically explained in Chapter 9
 (particularly Sections 9.3 and 9.4) in Zuur, Ieno, and Smith Analysing
 Ecological Data. For that matter, I highly recommend reading the whole
 book.
 
 Rich
 



Re: [R-sig-eco] Multivariate ANOVA/repeated measures

2011-10-10 Thread Dave Roberts



On 10/10/2011 02:15 PM, Gavin Simpson wrote:

On Mon, 2011-10-10 at 09:11 -0700, Rich Shepard wrote:

On Mon, 10 Oct 2011, Dave Roberts wrote:


I want to compare the results of the two sampling exercises in order to
test the performance of the two sampling techniques.



   I would try something pretty direct. Any appeal to differences in
dissimilarities confounds the effects with the particular
dissimilarity/distance matrix you use. Assuming the samples and species
are in the same order, and that the data.frames are the same size, you
might try


I did not read the original message, so I hope you'll allow me to join the
thread. My recommendation is to use univariate tree models, particularly a
classification tree (for ordinal explanatory variables; i.e., ST1 and ST2).


But the response here is *multivariate* - of course, one could use Glen
De'Ath's multivariate regression trees (despite the name it is really a
constrained clustering/classification) - but I think there are better
ways of solving this particular problem. And unless one has many 100s of
observations, the model will need some sort of variance reduction
applied (via bagging, or some such) as the one fitted model is
potentially highly unstable.

G


This is fully, carefully, and non-technically explained in Chapter 9
(particularly Sections 9.3 and 9.4) in Zuur, Ieno, and Smith Analysing
Ecological Data. For that matter, I highly recommend reading the whole
book.

Rich





It would be fairly simple to boil down to a univariate question.  You 
could do something as simple as a paired t-test of plot-level species 
richness or the number of individuals sampled (to compare sampling 
efficiency), but I still don't see an independent and a dependent variable.
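
A quick base-R sketch of the paired t-test on richness, using simulated
matrices as stand-ins (the data and object names are hypothetical).
Richness is taken as the number of species with non-zero abundance in
each row, and the pairing is by location.

```r
set.seed(7)
n_loc <- 24
n_spp <- 12

# Hypothetical data with plenty of zeros so that richness varies by location.
base <- matrix(rgamma(n_loc * n_spp, shape = 1, rate = 1), nrow = n_loc)
ST1 <- matrix(rpois(n_loc * n_spp, base), nrow = n_loc)
ST2 <- matrix(rpois(n_loc * n_spp, base), nrow = n_loc)

# Plot-level species richness under each technique.
rich1 <- rowSums(ST1 > 0)
rich2 <- rowSums(ST2 > 0)

# Paired by location: row i of ST1 and row i of ST2 are the same site.
tt <- t.test(rich1, rich2, paired = TRUE)
tt
```

Swapping rowSums(ST1 > 0) for rowSums(ST1) gives the same comparison for
the number of individuals sampled.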


Dave


Re: [R-sig-eco] Multivariate ANOVA/repeated measures

2011-10-10 Thread Gavin Simpson
On Mon, 2011-10-10 at 15:32 -0600, Dave Roberts wrote:
 
 On 10/10/2011 02:15 PM, Gavin Simpson wrote:
  On Mon, 2011-10-10 at 09:11 -0700, Rich Shepard wrote:
  On Mon, 10 Oct 2011, Dave Roberts wrote:
 
  I want to compare the results of the two sampling exercises in order to
  test the performance of the two sampling techniques.
 
 I would try something pretty direct. Any appeal to differences in
  dissimilarities confounds the effects with the particular
  dissimilarity/distance matrix you use. Assuming the samples and species
  are in the same order, and that the data.frames are the same size, you
  might try
 
  I did not read the original message, so I hope you'll allow me to join 
  the
  thread. My recommendation is to use univariate tree models, particularly a
  classification tree (for ordinal explanatory variables; i.e., ST1 and ST2).
 
  But the response here is *multivariate* - of course, one could use Glen
  De'Ath's multivariate regression trees (despite the name it is really a
  constrained clustering/classification) - but I think there are better
  ways of solving this particular problem. And unless one has many 100s of
  observations, the model will need some sort of variance reduction
  applied (via bagging, or some such) as the one fitted model is
  potentially highly unstable.
 
  G
 
  This is fully, carefully, and non-technically explained in Chapter 9
  (particularly Sections 9.3 and 9.4) in Zuur, Ieno, and Smith Analysing
  Ecological Data. For that matter, I highly recommend reading the whole
  book.
 
  Rich
 
 
 
 It would be fairly simple to boil down to a univariate question.  You 
 could do something as simple as a paired t-test of plot-level species 
 richness or the number of individuals sampled (to compare sampling 
 efficiency), but I still don't see an independent and a dependent variable.
 
 Dave

Indeed, but species richness is quite a different comparison of the
two sampling strategies. The original Q was quite broad, so that might be
what the OP wants.

The predictor would be the sampling strategy I guess - can you
explain/cluster the data on the basis of sampling strategy?

Compositionally, adonis() would seem to be an appropriate technique here
- it would be effectively a multivariate t-test - and would test if the
multivariate centroids of the species compositions in the two samples
are similar or not (difference of centroids is 0). Of course, if that
Null is rejected, you must test if the difference in centroids is due to
a difference in multivariate dispersions (the spread of the points about
the centroid), which can be done via betadisper().
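
A sketch of the dispersion check, assuming simulated data and Bray-Curtis
dissimilarities (both are illustrative choices). betadisper() computes
each sample's distance to its group centroid and permutest() tests
whether those distances differ between the two techniques; the default
unrestricted permutation is used here.

```r
library(vegan)

set.seed(2)
n_loc <- 24
n_spp <- 12

# Hypothetical data: both techniques sample the same underlying abundances.
base <- matrix(rgamma(n_loc * n_spp, shape = 2, rate = 0.2), nrow = n_loc)
ST1 <- matrix(rpois(n_loc * n_spp, base), nrow = n_loc)
ST2 <- matrix(rpois(n_loc * n_spp, base), nrow = n_loc)

comm <- rbind(ST1, ST2)
technique <- factor(rep(c("ST1", "ST2"), each = n_loc))

# Distances of samples to their technique centroid in dissimilarity space.
d <- vegdist(comm, method = "bray")
disp <- betadisper(d, group = technique)

# Permutation test for a difference in dispersion between techniques.
disp_test <- permutest(disp, permutations = 999)
disp_test
```

If this test is clearly non-significant, a significant adonis() result is
easier to read as a shift in centroids rather than a difference in
spread.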

The issue I'm still thinking about is the permutation - at first look,
your randomisation seems appropriate, especially if the two sampling
strategies can be considered random sampling methods - i.e. all sites
have equal chance of being selected, /and/ that there is no underlying
clustering in the population that should be respected.

The Null hypothesis seems to be that ST1 and ST2 are just two random
samples of some population of species composition/samples that would
arise if you did lots of sampling using ST2.

G
