Hello All: This post was motivated by the earlier posts this week regarding CCA/NMDS/RDA, etc., and dissimilarity measures. I have often thought that the usual thinking on double zeros in species abundance/composition comparisons across sites confuses several issues. It seems driven by an unrealistic expectation that a single dissimilarity/similarity measure, applied across multiple species at multiple sites, can simultaneously provide useful information on absolute abundance, composition (relative abundance), and presence/absence. I developed the examples below several years ago to demonstrate to myself that Euclidean distance can always be made to perform usefully as a dissimilarity/similarity measure with double-zero data, and that the only issue is whether you want a measure that is sensitive to absolute abundance, relative abundance (composition), or presence/absence. I believe it is unrealistic to expect one dissimilarity/similarity measure to compare two, or all three, of these attributes adequately at the same time. I post these examples just to see what others think.
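The idea that only the data representation, not the distance measure, needs to change can be sketched in R as a small helper that transforms the site-by-species matrix and then calls base R's `dist()`. The function `dist_by_view()` and its `view` argument are hypothetical names introduced only for illustration. The "relative" view divides each site's row by its total, which assumes no site has all-zero abundances.

```r
# Hypothetical helper (not from the original post): transform the
# site-by-species matrix, then compute ordinary Euclidean distances.
dist_by_view <- function(x, view = c("absolute", "relative", "presence")) {
  view <- match.arg(view)
  y <- switch(view,
    absolute = x,               # raw abundances, unchanged
    relative = x / rowSums(x),  # each row rescaled to proportions summing to 1
                                # (assumes every row total is > 0)
    presence = (x > 0) * 1      # 1 if the species is present at the site, else 0
  )
  dist(y, method = "euclidean")
}

# 3 sites (rows) x 3 species (columns): the Legendre and Legendre (1998) matrix
x <- matrix(c(0, 1, 1,
              1, 0, 0,
              0, 4, 4), nrow = 3, byrow = TRUE)

dist_by_view(x, "absolute")  # d(1,2), d(1,3), d(2,3) = 1.732051, 4.242641, 5.744563
dist_by_view(x, "relative")  # d(1,2), d(1,3), d(2,3) = 1.224745, 0.000000, 1.224745
dist_by_view(x, "presence")
```

For this matrix the relative and presence views both put sites 1 and 3 at distance 0, because those two sites contain the same two species in the same proportions.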
Example problem with the interpretation of double zeros from Legendre and Legendre (1998: 278): 3 sites (rows) with 3 species (columns), analyzed first as absolute abundances and then as relative abundances (composition), both using Euclidean distances.

> x
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    1    0    0
[3,]    0    4    4
>
> dist(x)
         1        2
2 1.732051
3 4.242641 5.744563

Legendre and Legendre (1998) and others argue that site 1 should be closer to site 3 than to site 2 because sites 1 and 3 share two species (2 and 3), whereas sites 1 and 2 share no species. Hidden in this statement about "sharing" is really a compositional interpretation, i.e., one based on relative abundance. So convert the species abundances to relative proportions (compositions); Euclidean distances then show that sites 1 and 3 are identical (distance 0) and that sites 1 and 2, and sites 2 and 3, are equally dissimilar.

> xc
     [,1] [,2] [,3]
[1,]    0  0.5  0.5
[2,]    1  0.0  0.0
[3,]    0  0.5  0.5
> dist(xc)
         1        2
2 1.224745
3 0.000000 1.224745

Notice that no change in distance measure (Euclidean) was required, just a change from absolute to relative abundances. If you were interested in absolute abundances, I would argue that the Euclidean distances in the first computation above correctly recognize that sites 1 and 2 are closer than sites 1 and 3, which in turn are closer than sites 2 and 3. So I don't believe Euclidean distances are inappropriate for abundance data; rather, if you want a compositional interpretation (proportion of species shared), you ought to convert abundances to relative proportions before computing Euclidean distances.

Similar arguments could be made for the data below from McCune and Grace (2002, box 6.2), where the differences between sites 1 and 2 and between sites 1 and 3 were compared using Euclidean versus city-block distance. City-block distance says the differences between sites 1 and 2 and between sites 1 and 3 are the same (12), whereas the Euclidean distances are 9.165 and 6.0, respectively.
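The city-block versus Euclidean comparison can be checked with base R's `dist()`, whose `method` argument accepts `"manhattan"` (city-block) as well as `"euclidean"`. Sites 1 and 2 differ by 1, 1, 1 and 9 individuals across the four species, while sites 1 and 3 differ by 3 in every species; both sets of differences sum to 12, but squaring lets the single large difference for species 4 dominate (84 versus 36 before taking the square root). The sketch below assumes the McCune and Grace matrix exactly as printed as `x2` in this post.

```r
# Site-by-species matrix from McCune and Grace (2002), box 6.2 (printed as x2 in this post)
x2 <- matrix(c(4, 2, 0,  1,
               5, 1, 1, 10,
               7, 5, 3,  4), nrow = 3, byrow = TRUE)

d_city <- dist(x2, method = "manhattan")  # city-block: sum of absolute differences
d_euc  <- dist(x2, method = "euclidean")  # square root of summed squared differences

# as.matrix() makes it easy to pull out individual site pairs
as.matrix(d_city)[1, 2]; as.matrix(d_city)[1, 3]  # 12 and 12
as.matrix(d_euc)[1, 2];  as.matrix(d_euc)[1, 3]   # 9.165151 and 6
```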
Note that converting to relative abundances yields Euclidean distances of 0.575 and 0.267, which seem reasonable under a compositional interpretation. The city-block distance seems to emphasize a compositional interpretation based on presence/absence, i.e., the proportion of species with any abundance > 0.

> x2
     [,1] [,2] [,3] [,4]
[1,]    4    2    0    1
[2,]    5    1    1   10
[3,]    7    5    3    4
> dist(x2)
         1        2
2 9.165151
3 6.000000 7.745967
> ### Convert to relative proportions
> ### (equivalently, without typing the fractions: x2c <- x2 / rowSums(x2))
> x2c <- matrix(c(4/7, 5/17, 7/19, 2/7, 1/17, 5/19, 0, 1/17, 3/19, 1/7, 10/17, 4/19), nrow = 3, ncol = 4)
> x2c
          [,1]       [,2]       [,3]      [,4]
[1,] 0.5714286 0.28571429 0.00000000 0.1428571
[2,] 0.2941176 0.05882353 0.05882353 0.5882353
[3,] 0.3684211 0.26315789 0.15789474 0.2105263
> dist(x2c)
          1         2
2 0.5746326
3 0.2668908 0.4469370

We can also convert the species abundances to presence/absence (1/0) form and use Euclidean distances. This now has an interpretation comparable to the city-block distance, in that sites 1 and 2 and sites 1 and 3 differ by the same amount; in addition, sites 2 and 3 do not differ at all.

> x2d
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    1
[2,]    1    1    1    1
[3,]    1    1    1    1
> dist(x2d)
  1 2
2 1
3 1 0

So there is really no need to switch to a distance measure other than Euclidean; we just need to express the species measures as absolute abundances, relative abundances (compositions), or presence/absence, depending on the desired interpretation. Can other distance measures really be expected to provide interpretations in terms of absolute abundance, relative abundance, and presence/absence simultaneously? I don't think so.

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: ca...@usgs.gov <brian_c...@usgs.gov>
tel: 970 226-9326

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology